[clangd] Improve performance of dex by 45-60%

Take full advantage of the AND iterator's child size estimates: reset early
in sync() to avoid a large overhead. Children at the beginning of the list
are smaller and cheaper to advance, whereas very large children negate the
effect of this optimisation and hence should be advanced only when
absolutely necessary. Reducing the number of updates to large iterators
improves performance by a large margin.

This change was tested on a comprehensive query dataset. The performance
boost increases with the average query length: on short queries it is
close to 45%, and it approaches 60% and beyond as queries get longer.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D106528
Kirill Bobyrev 2021-07-23 15:28:31 +02:00
parent 1528a4d400
commit a0987e350c
1 changed files with 10 additions and 2 deletions

@@ -104,11 +104,19 @@ private:
         // In this case, just terminate the process.
         if (ReachedEnd)
           return;
+        // Cache the result so that peek() is not called again as it may be
+        // quite expensive in AND with large subtrees.
+        auto Candidate = Child->peek();
         // If any child goes beyond given ID (i.e. ID is not the common item),
         // all children should be advanced to the next common item.
-        if (Child->peek() > SyncID) {
-          SyncID = Child->peek();
+        if (Candidate > SyncID) {
+          SyncID = Candidate;
           NeedsAdvance = true;
+          // Reset and try to sync again. Sync starts with the first child as
+          // this is the cheapest (smallest size estimate). This way advanceTo
+          // is called on the large posting lists once the sync point is very
+          // likely.
+          break;
         }
       }
     } while (NeedsAdvance);
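
The early reset in the hunk above can be illustrated outside of clangd with a
small standalone sketch. It is not the dex implementation: PostingList,
AdvanceCalls and intersect() are hypothetical names made up for this example,
and it assumes each list holds strictly increasing IDs and that the children
are already ordered by size, smallest first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a posting-list iterator (not clangd's Iterator).
struct PostingList {
  std::vector<uint32_t> IDs; // Assumed sorted and strictly increasing.
  size_t Pos = 0;
  size_t AdvanceCalls = 0; // Counts advanceTo() calls, for illustration only.

  bool reachedEnd() const { return Pos == IDs.size(); }
  uint32_t peek() const { return IDs[Pos]; }
  // Moves to the first element that is >= ID.
  void advanceTo(uint32_t ID) {
    ++AdvanceCalls;
    while (Pos < IDs.size() && IDs[Pos] < ID)
      ++Pos;
  }
};

// Returns the IDs present in every child.
// Assumption: Children are ordered by size, smallest first.
std::vector<uint32_t> intersect(std::vector<PostingList> &Children) {
  assert(!Children.empty() && "need at least one child");
  std::vector<uint32_t> Result;
  PostingList &First = Children.front();
  if (First.reachedEnd())
    return Result;
  uint32_t SyncID = First.peek();
  while (true) {
    bool NeedsAdvance;
    do {
      NeedsAdvance = false;
      for (PostingList &Child : Children) {
        Child.advanceTo(SyncID);
        if (Child.reachedEnd())
          return Result; // No further common items are possible.
        // Cache peek() so it is evaluated once per child.
        uint32_t Candidate = Child.peek();
        if (Candidate > SyncID) {
          SyncID = Candidate;
          NeedsAdvance = true;
          // Early reset: restart from the cheapest child instead of
          // dragging the remaining (larger) children to a stale SyncID.
          break;
        }
      }
    } while (NeedsAdvance);
    Result.push_back(SyncID); // Every child now points at SyncID.
    ++First.Pos;              // Step the smallest child past the match.
    if (First.reachedEnd())
      return Result;
    SyncID = First.peek();
  }
}
```

With the break in place, a larger child is only advanced after every smaller
child before it has already agreed on the current SyncID, which is the effect
the commit message describes.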