//===-- DexTests.cpp ---------------------------------*- C++ -*-----------===//
[clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol. Examples of
search token types (Token Namespaces) are:
* Trigrams - these are essential for unqualified symbol name fuzzy
  search
* Scopes - for filtering the symbols by the namespace
* Paths - e.g. these can be used to uprank symbols defined close to the
  edited file
This patch outlines the generic structure for such token namespaces, but
only implements trigram generation.
The intuition behind the trigram generation algorithm is that each
extracted trigram is a valid sequence for Fuzzy Matcher jumps. The
proposed implementation utilizes the existing FuzzyMatcher API for
segmentation and trigram extraction.
However, the trigram generation algorithm for the query string is
different from the previous one: it simply yields sequences of 3
consecutive lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukov
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
2018-07-25 18:34:57 +08:00
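The query-side rule in the commit message above can be illustrated with a minimal, standalone sketch. This is not the clangd implementation: `queryTrigrams` is a hypothetical helper, and it assumes that characters other than letters and digits are dropped before the 3-character windows are taken.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not part of clangd): returns every window of three
// consecutive letters/digits in Query, lowercased.
// Assumption: characters that are neither letters nor digits are skipped.
std::set<std::string> queryTrigrams(const std::string &Query) {
  std::string Valid;
  for (unsigned char C : Query)
    if (std::isalnum(C))
      Valid.push_back(static_cast<char>(std::tolower(C)));

  std::set<std::string> Result;
  for (std::size_t I = 0; I + 3 <= Valid.size(); ++I)
    Result.insert(Valid.substr(I, 3));
  return Result;
}
```

Under these assumptions a query shorter than three valid characters yields no trigrams at all, so a real index would need a separate rule for very short queries.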
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "FuzzyMatch.h"
#include "TestFS.h"
#include "TestIndex.h"
#include "index/Index.h"
#include "index/Merge.h"
#include "index/SymbolID.h"
#include "index/dex/Dex.h"
#include "index/dex/Iterator.h"
#include "index/dex/Token.h"
#include "index/dex/Trigram.h"
#include "llvm/Support/ScopedPrinter.h"
#include "llvm/Support/raw_ostream.h"
#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include <string>
#include <vector>

using ::testing::AnyOf;
using ::testing::ElementsAre;
using ::testing::IsEmpty;
using ::testing::UnorderedElementsAre;

namespace clang {
namespace clangd {
namespace dex {
namespace {
[clangd] Proof-of-concept query iterators for Dex symbol index
This patch introduces three essential types of query iterators:
`DocumentIterator`, `AndIterator`, `OrIterator`. It provides a
convenient API for query tree generation and serves as a building block
for the next generation symbol index - Dex. Currently, many
optimizations are deliberately omitted to keep the code readable and to
let it serve as a reference implementation. Potential improvements are
briefly mentioned in `FIXME`s and will be addressed in the following
patches.
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
Iterators, their applications and potential extensions are explained in
detail in the design proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: ioeric, sammccall, ilya-biryukov
Subscribers: cfe-commits, klimek, jfb, mgrang, mgorny, MaskRay, jkorous,
arphaman
Differential Revision: https://reviews.llvm.org/D49546
llvm-svn: 338017
2018-07-26 18:42:31 +08:00
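As a rough illustration of what AND and OR iterators compute over sorted posting lists, the sketch below merges two sorted ID vectors eagerly. It is a simplification and not the Dex API: the real iterators are lazy and expose `peek()`, `advance()` and `advanceTo()`, whereas `intersectSorted`, `uniteSorted` and the 32-bit `DocID` alias here are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using DocID = std::uint32_t; // Assumption: document IDs are 32-bit integers.

// AND: keep only IDs present in both sorted lists by always advancing the
// cursor that points at the smaller ID.
std::vector<DocID> intersectSorted(const std::vector<DocID> &A,
                                   const std::vector<DocID> &B) {
  std::vector<DocID> Out;
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I] < B[J]) {
      ++I;
    } else if (B[J] < A[I]) {
      ++J;
    } else {
      Out.push_back(A[I]);
      ++I;
      ++J;
    }
  }
  return Out;
}

// OR: emit every ID present in either sorted list exactly once.
std::vector<DocID> uniteSorted(const std::vector<DocID> &A,
                               const std::vector<DocID> &B) {
  std::vector<DocID> Out;
  std::size_t I = 0, J = 0;
  while (I < A.size() || J < B.size()) {
    if (J == B.size() || (I < A.size() && A[I] < B[J])) {
      Out.push_back(A[I++]);
    } else if (I == A.size() || B[J] < A[I]) {
      Out.push_back(B[J++]);
    } else { // Same ID in both lists: emit once, advance both cursors.
      Out.push_back(A[I]);
      ++I;
      ++J;
    }
  }
  return Out;
}
```

Fed with the posting lists `{0, 5, 7, 10, 42, 320, 9000}` and `{0, 4, 7, 10, 30, 60, 320, 9000}` used by the tests in this file, these helpers produce the same ID sequences that the `AndTwoLists` and `OrTwoLists` tests expect.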
//===----------------------------------------------------------------------===//
// Query iterator tests.
//===----------------------------------------------------------------------===//

std::vector<DocID> consumeIDs(Iterator &It) {
  auto IDAndScore = consume(It);
  std::vector<DocID> IDs(IDAndScore.size());
  for (size_t I = 0; I < IDAndScore.size(); ++I)
    IDs[I] = IDAndScore[I].first;
  return IDs;
}

TEST(DexIterators, DocumentIterator) {
  const PostingList L({4, 7, 8, 20, 42, 100});
  auto DocIterator = L.iterator();

  EXPECT_EQ(DocIterator->peek(), 4U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advance();
  EXPECT_EQ(DocIterator->peek(), 7U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advanceTo(20);
  EXPECT_EQ(DocIterator->peek(), 20U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advanceTo(65);
  EXPECT_EQ(DocIterator->peek(), 100U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advanceTo(420);
  EXPECT_TRUE(DocIterator->reachedEnd());
}

TEST(DexIterators, AndTwoLists) {
  Corpus C{10000};
  const PostingList L0({0, 5, 7, 10, 42, 320, 9000});
  const PostingList L1({0, 4, 7, 10, 30, 60, 320, 9000});

  auto And = C.intersect(L1.iterator(), L0.iterator());

  EXPECT_FALSE(And->reachedEnd());
  EXPECT_THAT(consumeIDs(*And), ElementsAre(0U, 7U, 10U, 320U, 9000U));

  And = C.intersect(L0.iterator(), L1.iterator());

  And->advanceTo(0);
  EXPECT_EQ(And->peek(), 0U);
  And->advanceTo(5);
  EXPECT_EQ(And->peek(), 7U);
  And->advanceTo(10);
  EXPECT_EQ(And->peek(), 10U);
  And->advanceTo(42);
  EXPECT_EQ(And->peek(), 320U);
  And->advanceTo(8999);
  EXPECT_EQ(And->peek(), 9000U);
  And->advanceTo(9001);
}

TEST(DexIterators, AndThreeLists) {
  Corpus C{10000};
  const PostingList L0({0, 5, 7, 10, 42, 320, 9000});
  const PostingList L1({0, 4, 7, 10, 30, 60, 320, 9000});
  const PostingList L2({1, 4, 7, 11, 30, 60, 320, 9000});

  auto And = C.intersect(L0.iterator(), L1.iterator(), L2.iterator());
  EXPECT_EQ(And->peek(), 7U);
  And->advanceTo(300);
  EXPECT_EQ(And->peek(), 320U);
  And->advanceTo(100000);

  EXPECT_TRUE(And->reachedEnd());
}

[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so this patch adds the following optimizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
2018-10-04 21:12:23 +08:00
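The three peephole rules listed in the commit message above can be sketched for the AND case on a toy expression tree. This is only an illustration under stated assumptions, not the Dex implementation: `Node`, `leaf`, `trueNode`, `falseNode`, `makeAnd` and `toString` are hypothetical names, and the printed form merely imitates the `(& ...)` strings that the `Optimizations` test in this file checks.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy query tree, used only to illustrate the rewrite rules.
struct Node {
  enum Kind { True, False, Leaf, And };
  Kind K;
  std::string Name;           // Printed form of a Leaf, e.g. "[1]".
  std::vector<Node> Children; // Operands of an And node.
};

Node trueNode() { return Node{Node::True, "", {}}; }
Node falseNode() { return Node{Node::False, "", {}}; }
Node leaf(std::string Name) { return Node{Node::Leaf, std::move(Name), {}}; }

// Builds an AND node while applying the three peephole rules.
// Assumption: nested And operands were themselves built by makeAnd, so they
// are already free of True/False/And children.
Node makeAnd(std::vector<Node> Children) {
  std::vector<Node> Flat;
  for (Node &C : Children) {
    switch (C.K) {
    case Node::True: // TRUE is the identity of AND: drop it.
      break;
    case Node::False: // Any FALSE operand makes the whole AND false.
      return falseNode();
    case Node::And: // Flatten nested ANDs into the parent.
      for (Node &G : C.Children)
        Flat.push_back(std::move(G));
      break;
    case Node::Leaf:
      Flat.push_back(std::move(C));
      break;
    }
  }
  if (Flat.empty()) // An effectively empty AND matches everything.
    return trueNode();
  if (Flat.size() == 1) // A single operand needs no AND wrapper.
    return std::move(Flat.front());
  return Node{Node::And, "", std::move(Flat)};
}

std::string toString(const Node &N) {
  switch (N.K) {
  case Node::True:
    return "true";
  case Node::False:
    return "false";
  case Node::Leaf:
    return N.Name;
  case Node::And:
    break;
  }
  std::string Out = "(&";
  for (const Node &C : N.Children)
    Out += " " + toString(C);
  return Out + ")";
}
```

An OR builder would mirror this with FALSE as the identity element. Note that the tests in this file deliberately keep `(| [1] true)` unsimplified so that boosts are not lost, which this toy tree does not model.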
TEST(DexIterators, AndEmpty) {
  Corpus C{10000};
  const PostingList L1{1};
  const PostingList L2{2};
  // These iterators are empty, but the optimizer can't tell.
  auto Empty1 = C.intersect(L1.iterator(), L2.iterator());
  auto Empty2 = C.intersect(L1.iterator(), L2.iterator());
  // And syncs iterators on construction, and used to fail on empty children.
  auto And = C.intersect(std::move(Empty1), std::move(Empty2));
  EXPECT_TRUE(And->reachedEnd());
}

TEST(DexIterators, OrTwoLists) {
  Corpus C{10000};
  const PostingList L0({0, 5, 7, 10, 42, 320, 9000});
  const PostingList L1({0, 4, 7, 10, 30, 60, 320, 9000});

  auto Or = C.unionOf(L0.iterator(), L1.iterator());

  EXPECT_FALSE(Or->reachedEnd());
  EXPECT_EQ(Or->peek(), 0U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 4U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 5U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 7U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 10U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 30U);
  Or->advanceTo(42);
  EXPECT_EQ(Or->peek(), 42U);
  Or->advanceTo(300);
  EXPECT_EQ(Or->peek(), 320U);
  Or->advanceTo(9000);
  EXPECT_EQ(Or->peek(), 9000U);
  Or->advanceTo(9001);
  EXPECT_TRUE(Or->reachedEnd());

  Or = C.unionOf(L0.iterator(), L1.iterator());

  EXPECT_THAT(consumeIDs(*Or),
              ElementsAre(0U, 4U, 5U, 7U, 10U, 30U, 42U, 60U, 320U, 9000U));
}

TEST(DexIterators, OrThreeLists) {
  Corpus C{10000};
  const PostingList L0({0, 5, 7, 10, 42, 320, 9000});
  const PostingList L1({0, 4, 7, 10, 30, 60, 320, 9000});
  const PostingList L2({1, 4, 7, 11, 30, 60, 320, 9000});

  auto Or = C.unionOf(L0.iterator(), L1.iterator(), L2.iterator());

  EXPECT_FALSE(Or->reachedEnd());
  EXPECT_EQ(Or->peek(), 0U);

  Or->advance();
  EXPECT_EQ(Or->peek(), 1U);

  Or->advance();
  EXPECT_EQ(Or->peek(), 4U);

  Or->advanceTo(7);

  Or->advanceTo(59);
  EXPECT_EQ(Or->peek(), 60U);

  Or->advanceTo(9001);
  EXPECT_TRUE(Or->reachedEnd());
}

// FIXME(kbobyrev): The testcase below is similar to what is expected in real
// queries. It should be updated once new iterators (such as boosting, limiting,
// etc iterators) appear. However, it is not exhaustive and it would be
// beneficial to implement automatic generation (e.g. fuzzing) of query trees
// for more comprehensive testing.
TEST(DexIterators, QueryTree) {
  //
  //                      +-----------------+
  //                      |And Iterator:1, 5|
  //                      +--------+--------+
  //                               |
  //                               |
  //                 +-------------+----------------------+
  //                 |                                    |
  //                 |                                    |
  //      +----------v----------+              +----------v------------+
  //      |And Iterator: 1, 5, 9|              |Or Iterator: 0, 1, 3, 5|
  //      +----------+----------+              +----------+------------+
  //                 |                                    |
  //          +------+-----+                        ------------+
  //          |            |                        |           |
  //  +-------v-----+ +----+---+                +---v----+ +----v---+
  //  |1, 3, 5, 8, 9| |Boost: 2|                |Boost: 3| |Boost: 4|
  //  +-------------+ +----+---+                +---+----+ +----+---+
  //                       |                        |           |
  //                  +----v-----+                +-v--+    +---v---+
  //                  |1, 5, 7, 9|                |1, 5|    |0, 3, 5|
  //                  +----------+                +----+    +-------+
  //
  Corpus C{10};
  const PostingList L0({1, 3, 5, 8, 9});
  const PostingList L1({1, 5, 7, 9});
  const PostingList L2({1, 5});
  const PostingList L3({0, 3, 5});

  // Root of the query tree: [1, 5]
  auto Root = C.intersect(
      // Lower And Iterator: [1, 5, 9]
      C.intersect(L0.iterator(), C.boost(L1.iterator(), 2U)),
      // Lower Or Iterator: [0, 1, 3, 5]
      C.unionOf(C.boost(L2.iterator(), 3U), C.boost(L3.iterator(), 4U)));

  EXPECT_FALSE(Root->reachedEnd());
  EXPECT_EQ(Root->peek(), 1U);
  Root->advanceTo(0);
  // Advance multiple times. Shouldn't do anything.
  Root->advanceTo(1);
  Root->advanceTo(0);
  EXPECT_EQ(Root->peek(), 1U);
  auto ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 6);
  Root->advance();
  EXPECT_EQ(Root->peek(), 5U);
  Root->advanceTo(5);
  EXPECT_EQ(Root->peek(), 5U);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 8);
  Root->advanceTo(9000);
  EXPECT_TRUE(Root->reachedEnd());
}

TEST(DexIterators, StringRepresentation) {
  Corpus C{10};
  const PostingList L1({1, 3, 5});
  const PostingList L2({1, 7, 9});

  // No token given, prints full posting list.
  auto I1 = L1.iterator();
  EXPECT_EQ(llvm::to_string(*I1), "[1 3 5]");

  // Token given, uses token's string representation.
  Token Tok(Token::Kind::Trigram, "L2");
  auto I2 = L1.iterator(&Tok);
  EXPECT_EQ(llvm::to_string(*I2), "T=L2");

  auto Tree = C.limit(C.intersect(std::move(I1), std::move(I2)), 10);
  // AND reorders its children, we don't care which order it prints.
  EXPECT_THAT(llvm::to_string(*Tree), AnyOf("(LIMIT 10 (& [1 3 5] T=L2))",
                                            "(LIMIT 10 (& T=L2 [1 3 5]))"));
}

TEST(DexIterators, Limit) {
  Corpus C{10000};
  const PostingList L0({3, 6, 7, 20, 42, 100});
  const PostingList L1({1, 3, 5, 6, 7, 30, 100});
  const PostingList L2({0, 3, 5, 7, 8, 100});

  auto DocIterator = C.limit(L0.iterator(), 42);
  EXPECT_THAT(consumeIDs(*DocIterator), ElementsAre(3, 6, 7, 20, 42, 100));

  DocIterator = C.limit(L0.iterator(), 3);
  EXPECT_THAT(consumeIDs(*DocIterator), ElementsAre(3, 6, 7));

  DocIterator = C.limit(L0.iterator(), 0);
  EXPECT_THAT(consumeIDs(*DocIterator), ElementsAre());

  auto AndIterator =
      C.intersect(C.limit(C.all(), 343), C.limit(L0.iterator(), 2),
                  C.limit(L1.iterator(), 3), C.limit(L2.iterator(), 42));
  EXPECT_THAT(consumeIDs(*AndIterator), ElementsAre(3, 7));
}

TEST(DexIterators, True) {
  EXPECT_TRUE(Corpus{0}.all()->reachedEnd());
  EXPECT_THAT(consumeIDs(*Corpus{4}.all()), ElementsAre(0, 1, 2, 3));
}

TEST(DexIterators, Boost) {
  Corpus C{5};
  auto BoostIterator = C.boost(C.all(), 42U);
  EXPECT_FALSE(BoostIterator->reachedEnd());
  auto ElementBoost = BoostIterator->consume();
  EXPECT_THAT(ElementBoost, 42U);

  const PostingList L0({2, 4});
  const PostingList L1({1, 4});
  auto Root = C.unionOf(C.all(), C.boost(L0.iterator(), 2U),
                        C.boost(L1.iterator(), 3U));

  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 1);
  Root->advance();
  EXPECT_THAT(Root->peek(), 1U);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 3);

  Root->advance();
  EXPECT_THAT(Root->peek(), 2U);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 2);

  Root->advanceTo(4);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 3);
}

[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
2018-10-04 21:12:23 +08:00
|
|
|
TEST(DexIterators, Optimizations) {
|
|
|
|
Corpus C{5};
|
2018-10-05 00:29:58 +08:00
|
|
|
const PostingList L1{1};
|
|
|
|
const PostingList L2{2};
|
|
|
|
const PostingList L3{3};
|
[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
2018-10-04 21:12:23 +08:00
|
|
|
|
|
|
|
// empty and/or yield true/false
|
2019-01-07 23:45:19 +08:00
|
|
|
EXPECT_EQ(llvm::to_string(*C.intersect()), "true");
|
|
|
|
EXPECT_EQ(llvm::to_string(*C.unionOf()), "false");
|
[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
2018-10-04 21:12:23 +08:00
|
|
|
|
|
|
|
// true/false inside and/or short-circuit
|
2019-01-07 23:45:19 +08:00
|
|
|
EXPECT_EQ(llvm::to_string(*C.intersect(L1.iterator(), C.all())), "[1]");
|
|
|
|
EXPECT_EQ(llvm::to_string(*C.intersect(L1.iterator(), C.none())), "false");
|
[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
2018-10-04 21:12:23 +08:00
|
|
|
// Not optimized to avoid breaking boosts.
|
2019-01-07 23:45:19 +08:00
|
|
|
EXPECT_EQ(llvm::to_string(*C.unionOf(L1.iterator(), C.all())),
|
|
|
|
"(| [1] true)");
|
|
|
|
EXPECT_EQ(llvm::to_string(*C.unionOf(L1.iterator(), C.none())), "[1]");
|
[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
2018-10-04 21:12:23 +08:00
|
|
|
|
|
|
|
// and/or nested inside and/or are flattened
|
2019-01-07 23:45:19 +08:00
|
|
|
EXPECT_EQ(llvm::to_string(*C.intersect(
|
|
|
|
L1.iterator(), C.intersect(L1.iterator(), L1.iterator()))),
|
2018-10-04 21:12:23 +08:00
|
|
|
"(& [1] [1] [1])");
|
2019-01-07 23:45:19 +08:00
|
|
|
EXPECT_EQ(llvm::to_string(*C.unionOf(
|
|
|
|
L1.iterator(), C.unionOf(L2.iterator(), L3.iterator()))),
|
2018-10-04 21:12:23 +08:00
|
|
|
"(| [1] [2] [3])");
|
|
|
|
|
|
|
|
// optimizations combine over multiple levels
|
2019-01-07 23:45:19 +08:00
|
|
|
EXPECT_EQ(llvm::to_string(*C.intersect(
|
|
|
|
C.intersect(L1.iterator(), C.intersect()), C.unionOf(C.all()))),
|
2018-10-04 21:12:23 +08:00
|
|
|
"[1]");
|
|
|
|
}
|
|
|
|
|
2018-09-06 20:54:43 +08:00
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
// Search token tests.
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2019-05-06 18:08:47 +08:00
|
|
|
::testing::Matcher<std::vector<Token>>
|
2018-09-06 20:54:43 +08:00
|
|
|
tokensAre(std::initializer_list<std::string> Strings, Token::Kind Kind) {
|
2018-07-25 18:34:57 +08:00
|
|
|
std::vector<Token> Tokens;
|
2018-09-06 20:54:43 +08:00
|
|
|
for (const auto &TokenData : Strings) {
|
|
|
|
Tokens.push_back(Token(Kind, TokenData));
|
2018-07-25 18:34:57 +08:00
|
|
|
}
|
2019-05-06 18:08:47 +08:00
|
|
|
return ::testing::UnorderedElementsAreArray(Tokens);
|
2018-07-25 18:34:57 +08:00
|
|
|
}
|
|
|
|
|
2019-05-06 18:08:47 +08:00
|
|
|
::testing::Matcher<std::vector<Token>>
|
2018-09-06 20:54:43 +08:00
|
|
|
trigramsAre(std::initializer_list<std::string> Trigrams) {
|
|
|
|
return tokensAre(Trigrams, Token::Kind::Trigram);
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTrigrams, IdentifierTrigrams) {
|
2018-08-13 16:57:06 +08:00
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("X86"),
|
2018-10-04 22:01:55 +08:00
|
|
|
trigramsAre({"x86", "x", "x8"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
2018-10-04 22:01:55 +08:00
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("nl"), trigramsAre({"nl", "n"}));
|
2018-08-13 16:57:06 +08:00
|
|
|
|
2018-10-04 22:01:55 +08:00
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("n"), trigramsAre({"n"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("clangd"),
|
2018-10-04 22:01:55 +08:00
|
|
|
trigramsAre({"c", "cl", "cla", "lan", "ang", "ngd"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("abc_def"),
|
2018-10-04 22:01:55 +08:00
|
|
|
trigramsAre({"a", "ab", "ad", "abc", "abd", "ade", "bcd", "bde",
|
|
|
|
"cde", "def"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
2018-08-13 16:57:06 +08:00
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("a_b_c_d_e_"),
|
2018-10-04 22:08:11 +08:00
|
|
|
trigramsAre({"a", "a_", "ab", "abc", "bcd", "cde"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
2018-08-13 16:57:06 +08:00
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("unique_ptr"),
|
2018-10-04 22:01:55 +08:00
|
|
|
trigramsAre({"u", "un", "up", "uni", "unp", "upt", "niq", "nip",
|
|
|
|
"npt", "iqu", "iqp", "ipt", "que", "qup", "qpt",
|
|
|
|
"uep", "ept", "ptr"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
2018-08-28 01:26:43 +08:00
|
|
|
EXPECT_THAT(
|
|
|
|
generateIdentifierTrigrams("TUDecl"),
|
2018-10-04 22:01:55 +08:00
|
|
|
trigramsAre({"t", "tu", "td", "tud", "tde", "ude", "dec", "ecl"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("IsOK"),
|
2018-10-04 22:01:55 +08:00
|
|
|
trigramsAre({"i", "is", "io", "iso", "iok", "sok"}));
|
2018-08-13 16:57:06 +08:00
|
|
|
|
|
|
|
EXPECT_THAT(
|
|
|
|
generateIdentifierTrigrams("abc_defGhij__klm"),
|
2018-10-04 22:08:11 +08:00
|
|
|
trigramsAre({"a", "ab", "ad", "abc", "abd", "ade", "adg", "bcd",
|
|
|
|
"bde", "bdg", "cde", "cdg", "def", "deg", "dgh", "dgk",
|
|
|
|
"efg", "egh", "egk", "fgh", "fgk", "ghi", "ghk", "gkl",
|
2018-10-04 22:01:55 +08:00
|
|
|
"hij", "hik", "hkl", "ijk", "ikl", "jkl", "klm"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTrigrams, QueryTrigrams) {
|
2018-10-04 22:01:55 +08:00
|
|
|
EXPECT_THAT(generateQueryTrigrams("c"), trigramsAre({"c"}));
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("cl"), trigramsAre({"cl"}));
|
2018-08-13 16:57:06 +08:00
|
|
|
EXPECT_THAT(generateQueryTrigrams("cla"), trigramsAre({"cla"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
[clangd] Simplify Dex query tree logic and fix missing-posting-list bug
Summary:
The bug being fixed: when a posting list doesn't exist in the index, it
was previously just dropped from the query rather than being treated as
empty. Now that we have the FALSE iterator, we can use it instead.
The query tree logic previously had a bunch of special cases to detect whether
subtrees are empty. Now we just naively build the whole tree, and rely
on the query optimizations to drop the trivial parts.
Finally, there was a bug in trigram generation: the empty query would
generate a single trigram "$$$" instead of no trigrams.
This had no effect (there was no posting list, so the other bug
cancelled it out). But we now have to fix this bug too.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52796
llvm-svn: 343802
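The query-trigram behaviour this commit message refers to (an empty query yields no trigrams) can be sketched as follows. This is a simplified stand-in, not the clangd implementation: `queryTrigramsSketch` is a hypothetical name, and it assumes that every non-alphanumeric character acts as a separator, whereas the real code relies on the FuzzyMatcher segmentation. It is written to agree with the `QueryTrigrams` expectations in this file.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical sketch of query trigram generation.
std::set<std::string> queryTrigramsSketch(const std::string &Query) {
  std::set<std::string> Result;
  // The empty query produces no tokens at all.
  if (Query.empty())
    return Result;

  std::string Lower;
  for (unsigned char C : Query)
    Lower.push_back(static_cast<char>(std::tolower(C)));

  // Queries shorter than three characters are emitted as a single short token.
  if (Lower.size() < 3) {
    Result.insert(Lower);
    return Result;
  }

  // Slide a window of three over the valid characters, skipping separators.
  std::string Window;
  for (char C : Lower) {
    if (!std::isalnum(static_cast<unsigned char>(C)))
      continue; // assumption: anything that is not a letter or digit is a separator
    Window.push_back(C);
    if (Window.size() > 3)
      Window.erase(Window.begin());
    if (Window.size() == 3)
      Result.insert(Window);
  }
  return Result;
}
```

Under these assumptions `"___"` produces no tokens, because it is long enough to skip the short-query path yet contains no valid characters, while `"_"` and `"__"` are returned unchanged.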
2018-10-05 01:18:55 +08:00
|
|
|
EXPECT_THAT(generateQueryTrigrams(""), trigramsAre({}));
|
2018-10-04 22:01:55 +08:00
|
|
|
EXPECT_THAT(generateQueryTrigrams("_"), trigramsAre({"_"}));
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("__"), trigramsAre({"__"}));
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("___"), trigramsAre({}));
|
2018-08-13 16:57:06 +08:00
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("X86"), trigramsAre({"x86"}));
|
2018-07-25 18:34:57 +08:00
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("clangd"),
|
|
|
|
trigramsAre({"cla", "lan", "ang", "ngd"}));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("abc_def"),
|
|
|
|
trigramsAre({"abc", "bcd", "cde", "def"}));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("a_b_c_d_e_"),
|
|
|
|
trigramsAre({"abc", "bcd", "cde"}));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("unique_ptr"),
|
|
|
|
trigramsAre({"uni", "niq", "iqu", "que", "uep", "ept", "ptr"}));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("TUDecl"),
|
|
|
|
trigramsAre({"tud", "ude", "dec", "ecl"}));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("IsOK"), trigramsAre({"iso", "sok"}));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateQueryTrigrams("abc_defGhij__klm"),
|
|
|
|
trigramsAre({"abc", "bcd", "cde", "def", "efg", "fgh", "ghi",
|
|
|
|
"hij", "ijk", "jkl", "klm"}));
|
|
|
|
}
|
|
|
|
|
2018-09-06 20:54:43 +08:00
|
|
|
TEST(DexSearchTokens, SymbolPath) {
|
|
|
|
EXPECT_THAT(generateProximityURIs(
|
|
|
|
"unittest:///clang-tools-extra/clangd/index/Token.h"),
|
|
|
|
ElementsAre("unittest:///clang-tools-extra/clangd/index/Token.h",
|
|
|
|
"unittest:///clang-tools-extra/clangd/index",
|
|
|
|
"unittest:///clang-tools-extra/clangd",
|
|
|
|
"unittest:///clang-tools-extra", "unittest:///"));
|
|
|
|
|
|
|
|
EXPECT_THAT(generateProximityURIs("unittest:///a/b/c.h"),
|
|
|
|
ElementsAre("unittest:///a/b/c.h", "unittest:///a/b",
|
|
|
|
"unittest:///a", "unittest:///"));
|
|
|
|
}
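The expectations in the `SymbolPath` test suggest that proximity URIs are the file URI followed by each of its parent directories, ending at the scheme root. A minimal sketch of that idea is shown here. `proximityURIsSketch` is a hypothetical name, the function uses plain string operations rather than clangd's URI class, and it places no cap on the number of parents, which the real `generateProximityURIs` may do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: returns the URI itself and every parent directory,
// from the most specific entry down to the root (for example "unittest:///").
std::vector<std::string> proximityURIsSketch(const std::string &URI) {
  std::vector<std::string> Result;
  const std::size_t SchemeEnd = URI.find("://");
  if (SchemeEnd == std::string::npos)
    return Result; // not a URI in the "scheme://authority/path" shape
  // The path starts at the first '/' after the (possibly empty) authority.
  const std::size_t PathStart = URI.find('/', SchemeEnd + 3);
  if (PathStart == std::string::npos) {
    Result.push_back(URI);
    return Result;
  }
  const std::string Root = URI.substr(0, PathStart + 1);
  std::string Current = URI;
  // Strip one trailing path component per step until only the root is left.
  while (Current.size() > Root.size()) {
    Result.push_back(Current);
    Current = Current.substr(0, Current.rfind('/'));
  }
  Result.push_back(Root);
  return Result;
}
```

For `"unittest:///a/b/c.h"` this yields the same four entries, in the same order, as the second expectation in the test above.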
|
|
|
|
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
// Index tests.
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(Dex, Lookup) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"ns::abc", "ns::xyz"}), RefSlab(),
|
|
|
|
RelationSlab());
|
[clangd] Factor out the data-swapping functionality from MemIndex/DexIndex.
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
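The "new hotness" arrangement described in this commit message, where an immutable index sits behind a wrapper that can be swapped, can be sketched like this. Every name ending in `Sketch` is hypothetical, and the details are assumptions rather than the actual `SwapIndex` code: in particular, the wrapper here stores a `std::shared_ptr` internally so that readers holding a snapshot keep the old index alive across a `reset()`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Stand-in for the index interface (hypothetical).
struct IndexSketch {
  virtual ~IndexSketch() = default;
  virtual std::string name() const = 0;
};

// An immutable index: all state is fixed at construction.
class ImmutableIndexSketch : public IndexSketch {
public:
  explicit ImmutableIndexSketch(std::string N) : Name(std::move(N)) {}
  std::string name() const override { return Name; }

private:
  const std::string Name;
};

// Wrapper that owns the current index and allows replacing it atomically.
class SwapIndexSketch {
public:
  explicit SwapIndexSketch(std::unique_ptr<IndexSketch> Initial)
      : Current(std::move(Initial)) {}

  // Installs a freshly built index. The old one is released outside the lock
  // and is destroyed once the last outstanding snapshot goes away.
  void reset(std::unique_ptr<IndexSketch> Replacement) {
    std::shared_ptr<IndexSketch> Old;
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      Old = std::move(Current);
      Current = std::move(Replacement);
    }
  }

  // Returns a reference-counted handle that stays valid across reset().
  std::shared_ptr<IndexSketch> snapshot() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Current;
  }

  // Forwards a query to whichever index is current at the time of the call.
  std::string name() const { return snapshot()->name(); }

private:
  mutable std::mutex Mutex;
  std::shared_ptr<IndexSketch> Current;
};
```

An update then follows the pattern in the message: build a new immutable index and pass it to `reset()`, leaving in-flight readers on their earlier snapshot.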
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(lookup(*I, SymbolID("ns::abc")), UnorderedElementsAre("ns::abc"));
|
|
|
|
EXPECT_THAT(lookup(*I, {SymbolID("ns::abc"), SymbolID("ns::xyz")}),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("ns::abc", "ns::xyz"));
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(lookup(*I, {SymbolID("ns::nonono"), SymbolID("ns::xyz")}),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("ns::xyz"));
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(lookup(*I, SymbolID("ns::nonono")), UnorderedElementsAre());
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(Dex, FuzzyFind) {
|
2018-09-14 01:11:03 +08:00
|
|
|
auto Index =
|
|
|
|
Dex::build(generateSymbols({"ns::ABC", "ns::BCD", "::ABC",
|
|
|
|
"ns::nested::ABC", "other::ABC", "other::A"}),
|
2019-06-15 10:26:47 +08:00
|
|
|
RefSlab(), RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "ABC";
|
|
|
|
Req.Scopes = {"ns::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*Index, Req), UnorderedElementsAre("ns::ABC"));
|
2018-08-20 22:39:32 +08:00
|
|
|
Req.Scopes = {"ns::", "ns::nested::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*Index, Req),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("ns::ABC", "ns::nested::ABC"));
|
|
|
|
Req.Query = "A";
|
|
|
|
Req.Scopes = {"other::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*Index, Req),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("other::A", "other::ABC"));
|
|
|
|
Req.Query = "";
|
|
|
|
Req.Scopes = {};
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*Index, Req),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("ns::ABC", "ns::BCD", "::ABC",
|
|
|
|
"ns::nested::ABC", "other::ABC",
|
|
|
|
"other::A"));
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, DexLimitedNumMatches) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateNumSymbols(0, 100), RefSlab(), RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "5";
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-09-13 22:27:03 +08:00
|
|
|
Req.Limit = 3;
|
2018-08-20 22:39:32 +08:00
|
|
|
bool Incomplete;
|
2018-09-03 22:37:43 +08:00
|
|
|
auto Matches = match(*I, Req, &Incomplete);
|
2018-09-13 22:27:03 +08:00
|
|
|
EXPECT_TRUE(Req.Limit);
|
|
|
|
EXPECT_EQ(Matches.size(), *Req.Limit);
|
2018-08-20 22:39:32 +08:00
|
|
|
EXPECT_TRUE(Incomplete);
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, FuzzyMatch) {
|
|
|
|
auto I = Dex::build(
|
2018-09-06 20:54:43 +08:00
|
|
|
generateSymbols({"LaughingOutLoud", "LionPopulation", "LittleOldLady"}),
|
2019-06-15 10:26:47 +08:00
|
|
|
RefSlab(), RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "lol";
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-09-13 22:27:03 +08:00
|
|
|
Req.Limit = 2;
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("LaughingOutLoud", "LittleOldLady"));
|
|
|
|
}
|
|
|
|
|
2018-10-04 22:01:55 +08:00
|
|
|
TEST(DexTest, ShortQuery) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"OneTwoThreeFour"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-10-04 22:01:55 +08:00
|
|
|
FuzzyFindRequest Req;
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-10-04 22:01:55 +08:00
|
|
|
bool Incomplete;
|
|
|
|
|
|
|
|
EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre("OneTwoThreeFour"));
|
|
|
|
EXPECT_FALSE(Incomplete) << "Empty string is not a short query";
|
|
|
|
|
|
|
|
Req.Query = "t";
|
|
|
|
EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre());
|
|
|
|
EXPECT_TRUE(Incomplete) << "Short queries have different semantics";
|
|
|
|
|
|
|
|
Req.Query = "tt";
|
|
|
|
EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre());
|
|
|
|
EXPECT_TRUE(Incomplete) << "Short queries have different semantics";
|
|
|
|
|
|
|
|
Req.Query = "ttf";
|
|
|
|
EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre("OneTwoThreeFour"));
|
|
|
|
EXPECT_FALSE(Incomplete) << "3-char string is not a short query";
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, MatchQualifiedNamesWithoutSpecificScope) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"a::y1", "b::y2", "y3"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-08-20 22:39:32 +08:00
|
|
|
Req.Query = "y";
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "b::y2", "y3"));
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, MatchQualifiedNamesWithGlobalScope) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"a::y1", "b::y2", "y3"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {""};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("y3"));
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, MatchQualifiedNamesWithOneScope) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I =
|
|
|
|
Dex::build(generateSymbols({"a::y1", "a::y2", "a::x", "b::y2", "y3"}),
|
|
|
|
RefSlab(), RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "a::y2"));
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, MatchQualifiedNamesWithMultipleScopes) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I =
|
|
|
|
Dex::build(generateSymbols({"a::y1", "a::y2", "a::x", "b::y3", "y3"}),
|
|
|
|
RefSlab(), RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::", "b::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "a::y2", "b::y3"));
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, NoMatchNestedScopes) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"a::y1", "a::b::y2"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1"));
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-28 02:46:00 +08:00
|
|
|
TEST(DexTest, WildcardScope) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"a::y1", "a::b::y2", "c::y3"}),
|
|
|
|
RefSlab(), RelationSlab());
|
2018-09-28 02:46:00 +08:00
|
|
|
FuzzyFindRequest Req;
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-09-28 02:46:00 +08:00
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::"};
|
|
|
|
EXPECT_THAT(match(*I, Req),
|
|
|
|
UnorderedElementsAre("a::y1", "a::b::y2", "c::y3"));
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, IgnoreCases) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"ns::ABC", "ns::abc"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-08-20 22:39:32 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "AB";
|
|
|
|
Req.Scopes = {"ns::"};
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("ns::ABC", "ns::abc"));
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
[clangd] Simplify Dex query tree logic and fix missing-posting-list bug
Summary:
The bug being fixed: when a posting list doesn't exist in the index, it
was previously just dropped from the query rather than being treated as
empty. Now that we have the FALSE iterator, we can use it instead.
The query tree logic previously had a bunch of special cases to detect whether
subtrees are empty. Now we just naively build the whole tree, and rely
on the query optimizations to drop the trivial parts.
Finally, there was a bug in trigram generation: the empty query would
generate a single trigram "$$$" instead of no trigrams.
This had no effect (there was no posting list, so the other bug
cancelled it out). But we now have to fix this bug too.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52796
llvm-svn: 343802
2018-10-05 01:18:55 +08:00
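The missing-posting-list bug described in the commit message above can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: `ToyIndex`, `queryAll` and the `TreatMissingAsEmpty` flag are hypothetical names invented for this example and are not part of clangd's Dex API, and a plain sorted-vector intersection stands in for Dex's iterator tree.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only; not clangd's Dex classes.
using DocID = unsigned;
using PostingList = std::vector<DocID>; // sorted, unique document IDs

struct ToyIndex {
  std::map<std::string, PostingList> Postings; // token -> posting list
  std::vector<DocID> AllDocs;                  // sorted IDs of all documents

  // Returns the documents that contain every token in Tokens (an AND query).
  // TreatMissingAsEmpty == false mimics the old behaviour: a token without a
  // posting list is silently dropped from the query.
  // TreatMissingAsEmpty == true mimics the fix: a token without a posting
  // list behaves like an empty list, so the whole conjunction is empty.
  std::vector<DocID> queryAll(const std::vector<std::string> &Tokens,
                              bool TreatMissingAsEmpty) const {
    assert(std::is_sorted(AllDocs.begin(), AllDocs.end()));
    std::vector<DocID> Result = AllDocs;
    for (const auto &Token : Tokens) {
      auto It = Postings.find(Token);
      if (It == Postings.end()) {
        if (TreatMissingAsEmpty)
          return {}; // Missing list acts as "FALSE": nothing can match.
        continue;    // Old behaviour: constraint is ignored.
      }
      std::vector<DocID> Next;
      std::set_intersection(Result.begin(), Result.end(), It->second.begin(),
                            It->second.end(), std::back_inserter(Next));
      Result = std::move(Next);
    }
    return Result;
  }
};
```

With two documents indexed under the token `ns::`, querying for the unknown token `ns2::` returns both documents when the constraint is dropped and no documents when the missing list is treated as empty, which is the outcome the UnknownPostingList test below expects from the real index.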
|
|
|
TEST(DexTest, UnknownPostingList) {
|
|
|
|
// Regression test: we used to ignore unknown scopes and accept any symbol.
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"ns::ABC", "ns::abc"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-10-05 01:18:55 +08:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Scopes = {"ns2::"};
|
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre());
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, Lookup) {
|
2019-06-15 10:26:47 +08:00
|
|
|
auto I = Dex::build(generateSymbols({"ns::abc", "ns::xyz"}), RefSlab(),
|
|
|
|
RelationSlab());
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(lookup(*I, SymbolID("ns::abc")), UnorderedElementsAre("ns::abc"));
|
|
|
|
EXPECT_THAT(lookup(*I, {SymbolID("ns::abc"), SymbolID("ns::xyz")}),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("ns::abc", "ns::xyz"));
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(lookup(*I, {SymbolID("ns::nonono"), SymbolID("ns::xyz")}),
|
2018-08-20 22:39:32 +08:00
|
|
|
UnorderedElementsAre("ns::xyz"));
|
2018-09-03 22:37:43 +08:00
|
|
|
EXPECT_THAT(lookup(*I, SymbolID("ns::nonono")), UnorderedElementsAre());
|
2018-08-20 22:39:32 +08:00
|
|
|
}
|
|
|
|
|
2018-09-24 16:45:18 +08:00
|
|
|
TEST(DexTest, SymbolIndexOptionsFilter) {
|
|
|
|
auto CodeCompletionSymbol = symbol("Completion");
|
|
|
|
auto NonCodeCompletionSymbol = symbol("NoCompletion");
|
|
|
|
CodeCompletionSymbol.Flags = Symbol::SymbolFlag::IndexedForCodeCompletion;
|
|
|
|
NonCodeCompletionSymbol.Flags = Symbol::SymbolFlag::None;
|
|
|
|
std::vector<Symbol> Symbols{CodeCompletionSymbol, NonCodeCompletionSymbol};
|
2019-06-15 10:26:47 +08:00
|
|
|
Dex I(Symbols, RefSlab(), RelationSlab());
|
2018-09-24 16:45:18 +08:00
|
|
|
FuzzyFindRequest Req;
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-09-24 16:45:18 +08:00
|
|
|
Req.RestrictForCodeCompletion = false;
|
|
|
|
EXPECT_THAT(match(I, Req), ElementsAre("Completion", "NoCompletion"));
|
|
|
|
Req.RestrictForCodeCompletion = true;
|
|
|
|
EXPECT_THAT(match(I, Req), ElementsAre("Completion"));
|
|
|
|
}
|
|
|
|
|
2018-09-10 16:23:53 +08:00
|
|
|
TEST(DexTest, ProximityPathsBoosting) {
|
2018-09-06 20:54:43 +08:00
|
|
|
auto RootSymbol = symbol("root::abc");
|
|
|
|
RootSymbol.CanonicalDeclaration.FileURI = "unittest:///file.h";
|
|
|
|
auto CloseSymbol = symbol("close::abc");
|
|
|
|
CloseSymbol.CanonicalDeclaration.FileURI = "unittest:///a/b/c/d/e/f/file.h";
|
|
|
|
|
|
|
|
std::vector<Symbol> Symbols{CloseSymbol, RootSymbol};
|
2019-06-15 10:26:47 +08:00
|
|
|
Dex I(Symbols, RefSlab(), RelationSlab());
|
2018-09-06 20:54:43 +08:00
|
|
|
|
|
|
|
FuzzyFindRequest Req;
|
2018-11-06 19:08:17 +08:00
|
|
|
Req.AnyScope = true;
|
2018-09-06 20:54:43 +08:00
|
|
|
Req.Query = "abc";
|
|
|
|
// The best candidate can change depending on the proximity paths.
|
2018-09-13 22:27:03 +08:00
|
|
|
Req.Limit = 1;
|
2018-09-06 20:54:43 +08:00
|
|
|
|
|
|
|
// FuzzyFind request comes from the file which is far from the root: expect
|
|
|
|
// CloseSymbol to come out.
|
|
|
|
Req.ProximityPaths = {testPath("a/b/c/d/e/f/file.h")};
|
|
|
|
EXPECT_THAT(match(I, Req), ElementsAre("close::abc"));
|
|
|
|
|
|
|
|
// FuzzyFind request comes from the file which is close to the root: expect
|
|
|
|
// RootSymbol to come out.
|
|
|
|
Req.ProximityPaths = {testPath("file.h")};
|
|
|
|
EXPECT_THAT(match(I, Req), ElementsAre("root::abc"));
|
|
|
|
}
|
|
|
|
|
2018-10-04 17:16:12 +08:00
|
|
|
TEST(DexTests, Refs) {
|
2019-01-07 23:45:19 +08:00
|
|
|
llvm::DenseMap<SymbolID, std::vector<Ref>> Refs;
|
2018-11-14 19:55:45 +08:00
|
|
|
auto AddRef = [&](const Symbol &Sym, const char *Filename, RefKind Kind) {
|
2019-01-03 21:28:05 +08:00
|
|
|
auto &SymbolRefs = Refs[Sym.ID];
|
2018-10-04 17:16:12 +08:00
|
|
|
SymbolRefs.emplace_back();
|
|
|
|
SymbolRefs.back().Kind = Kind;
|
|
|
|
SymbolRefs.back().Location.FileURI = Filename;
|
|
|
|
};
|
|
|
|
auto Foo = symbol("foo");
|
|
|
|
auto Bar = symbol("bar");
|
|
|
|
AddRef(Foo, "foo.h", RefKind::Declaration);
|
2019-01-15 02:11:09 +08:00
|
|
|
AddRef(Foo, "foo.cc", RefKind::Definition);
|
2018-10-04 17:16:12 +08:00
|
|
|
AddRef(Foo, "reffoo.h", RefKind::Reference);
|
|
|
|
AddRef(Bar, "bar.h", RefKind::Declaration);
|
|
|
|
|
|
|
|
RefsRequest Req;
|
|
|
|
Req.IDs.insert(Foo.ID);
|
|
|
|
Req.Filter = RefKind::Declaration | RefKind::Definition;
|
2019-01-15 02:11:09 +08:00
|
|
|
|
|
|
|
std::vector<std::string> Files;
|
2019-11-13 21:42:26 +08:00
|
|
|
EXPECT_FALSE(Dex(std::vector<Symbol>{Foo, Bar}, Refs, RelationSlab())
|
|
|
|
.refs(Req, [&](const Ref &R) {
|
|
|
|
Files.push_back(R.Location.FileURI);
|
|
|
|
}));
|
2019-01-15 02:11:09 +08:00
|
|
|
EXPECT_THAT(Files, UnorderedElementsAre("foo.h", "foo.cc"));
|
2018-10-04 17:16:12 +08:00
|
|
|
|
2019-01-15 02:11:09 +08:00
|
|
|
Req.Limit = 1;
|
|
|
|
Files.clear();
|
2019-11-13 21:42:26 +08:00
|
|
|
EXPECT_TRUE(Dex(std::vector<Symbol>{Foo, Bar}, Refs, RelationSlab())
|
|
|
|
.refs(Req, [&](const Ref &R) {
|
|
|
|
Files.push_back(R.Location.FileURI);
|
|
|
|
}));
|
2019-01-15 02:11:09 +08:00
|
|
|
EXPECT_THAT(Files, ElementsAre(AnyOf("foo.h", "foo.cc")));
|
2018-10-04 17:16:12 +08:00
|
|
|
}
|
|
|
|
|
2019-06-15 10:26:47 +08:00
|
|
|
TEST(DexTests, Relations) {
|
|
|
|
auto Parent = symbol("Parent");
|
|
|
|
auto Child1 = symbol("Child1");
|
|
|
|
auto Child2 = symbol("Child2");
|
|
|
|
|
|
|
|
std::vector<Symbol> Symbols{Parent, Child1, Child2};
|
|
|
|
|
2019-10-17 22:08:28 +08:00
|
|
|
std::vector<Relation> Relations{{Parent.ID, RelationKind::BaseOf, Child1.ID},
|
|
|
|
{Parent.ID, RelationKind::BaseOf, Child2.ID}};
|
2019-06-15 10:26:47 +08:00
|
|
|
|
|
|
|
Dex I{Symbols, RefSlab(), Relations};
|
|
|
|
|
|
|
|
std::vector<SymbolID> Results;
|
|
|
|
RelationsRequest Req;
|
|
|
|
Req.Subjects.insert(Parent.ID);
|
2019-10-17 22:08:28 +08:00
|
|
|
Req.Predicate = RelationKind::BaseOf;
|
2019-06-15 10:26:47 +08:00
|
|
|
I.relations(Req, [&](const SymbolID &Subject, const Symbol &Object) {
|
|
|
|
Results.push_back(Object.ID);
|
|
|
|
});
|
|
|
|
EXPECT_THAT(Results, UnorderedElementsAre(Child1.ID, Child2.ID));
|
|
|
|
}
|
|
|
|
|
2019-02-06 23:36:23 +08:00
|
|
|
TEST(DexTest, PreferredTypesBoosting) {
|
|
|
|
auto Sym1 = symbol("t1");
|
|
|
|
Sym1.Type = "T1";
|
|
|
|
auto Sym2 = symbol("t2");
|
|
|
|
Sym2.Type = "T2";
|
|
|
|
|
|
|
|
std::vector<Symbol> Symbols{Sym1, Sym2};
|
2019-06-15 10:26:47 +08:00
|
|
|
Dex I(Symbols, RefSlab(), RelationSlab());
|
2019-02-06 23:36:23 +08:00
|
|
|
|
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.AnyScope = true;
|
|
|
|
Req.Query = "t";
|
|
|
|
// The best candidate can change depending on the preferred type.
|
|
|
|
Req.Limit = 1;
|
|
|
|
|
2020-01-29 03:23:46 +08:00
|
|
|
Req.PreferredTypes = {std::string(Sym1.Type)};
|
2019-02-06 23:36:23 +08:00
|
|
|
EXPECT_THAT(match(I, Req), ElementsAre("t1"));
|
|
|
|
|
2020-01-29 03:23:46 +08:00
|
|
|
Req.PreferredTypes = {std::string(Sym2.Type)};
|
2019-02-06 23:36:23 +08:00
|
|
|
EXPECT_THAT(match(I, Req), ElementsAre("t2"));
|
|
|
|
}
|
|
|
|
|
[clangd] Store explicit template specializations in index for code navigation purposes
Summary:
This introduces ~4k new symbols, and ~10k refs for LLVM. We need that
information for providing better code navigation support:
- When references for a class template are requested, we should return these specializations as well.
- When children of a specialization are requested, we should be able to query for those symbols (instead of just the class template).
Number of symbols: 378574 -> 382784
Number of refs: 5098857 -> 5110689
Reviewers: hokein, gribozavr
Reviewed By: gribozavr
Subscribers: nridge, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59083
llvm-svn: 356125
2019-03-14 16:35:17 +08:00
|
|
|
TEST(DexTest, TemplateSpecialization) {
|
|
|
|
SymbolSlab::Builder B;
|
|
|
|
|
|
|
|
Symbol S = symbol("TempSpec");
|
2019-03-21 06:51:56 +08:00
|
|
|
S.ID = SymbolID("0");
|
2019-03-14 16:35:17 +08:00
|
|
|
B.insert(S);
|
|
|
|
|
2019-03-21 06:51:56 +08:00
|
|
|
S = symbol("TempSpec");
|
|
|
|
S.ID = SymbolID("1");
|
2019-04-12 18:09:37 +08:00
|
|
|
S.TemplateSpecializationArgs = "<int, bool>";
|
2019-03-14 16:35:17 +08:00
|
|
|
S.SymInfo.Properties = static_cast<index::SymbolPropertySet>(
|
|
|
|
index::SymbolProperty::TemplateSpecialization);
|
|
|
|
B.insert(S);
|
|
|
|
|
2019-03-21 06:51:56 +08:00
|
|
|
S = symbol("TempSpec");
|
|
|
|
S.ID = SymbolID("2");
|
2019-04-12 18:09:37 +08:00
|
|
|
S.TemplateSpecializationArgs = "<int, U>";
|
2019-03-14 16:35:17 +08:00
|
|
|
S.SymInfo.Properties = static_cast<index::SymbolPropertySet>(
    index::SymbolProperty::TemplatePartialSpecialization);
B.insert(S);
auto I = dex::Dex::build(std::move(B).build(), RefSlab(), RelationSlab());
FuzzyFindRequest Req;
Req.AnyScope = true;
Req.Query = "TempSpec";
EXPECT_THAT(match(*I, Req),
            UnorderedElementsAre("TempSpec", "TempSpec<int, bool>",
                                 "TempSpec<int, U>"));
// FIXME: Add filtering for template argument list.
Req.Query = "TempSpec<int";
EXPECT_THAT(match(*I, Req), IsEmpty());
}
} // namespace
} // namespace dex
} // namespace clangd
} // namespace clang
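
The template specialization test above expects a query for "TempSpec" to return the primary template and both specializations, while a query containing part of the argument list ("TempSpec<int") returns nothing. A minimal standalone sketch of that behavior follows. It assumes that matching is done against the unqualified name only and that the argument list is appended just for display. `ToySymbol`, `matchesName`, and `toyMatch` are hypothetical names for illustration, and the subsequence check is a simplified stand-in, not clangd's FuzzyMatcher or the Dex query path.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an indexed symbol; this is not clangd's Symbol.
struct ToySymbol {
  std::string Name;         // Unqualified name, e.g. "TempSpec".
  std::string TemplateArgs; // E.g. "<int, bool>"; empty for the primary template.
};

// Case-insensitive subsequence test: every character of Query must occur in
// Name, in order. Simplified stand-in for a real fuzzy matcher.
bool matchesName(const std::string &Query, const std::string &Name) {
  std::size_t QI = 0;
  for (char C : Name) {
    if (QI < Query.size() &&
        std::tolower(static_cast<unsigned char>(C)) ==
            std::tolower(static_cast<unsigned char>(Query[QI])))
      ++QI;
  }
  return QI == Query.size();
}

// Matches Query against the name only (assumption), then renders each hit as
// name + template argument list.
std::vector<std::string> toyMatch(const std::vector<ToySymbol> &Symbols,
                                  const std::string &Query) {
  std::vector<std::string> Results;
  for (const ToySymbol &S : Symbols)
    if (matchesName(Query, S.Name))
      Results.push_back(S.Name + S.TemplateArgs);
  return Results;
}
```

Because the characters `<int` never occur in the name "TempSpec", the second query cannot match under this scheme, which is the gap the FIXME in the test refers to.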