llvm-project/clang-tools-extra/clangd/index/BackgroundRebuild.h

//===--- BackgroundIndexRebuild.h - when to rebuild the bg index--*- C++-*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains an implementation detail of the background indexer
// (Background.h), which is exposed for testing.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_BACKGROUND_INDEX_REBUILD_H
#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_BACKGROUND_INDEX_REBUILD_H

#include "index/FileIndex.h"
#include "index/Index.h"
#include "llvm/Support/Threading.h"
#include <cstddef>
#include <functional>
#include <mutex>

namespace clang {
namespace clangd {

// The BackgroundIndexRebuilder builds the serving data structures periodically
// in response to events in the background indexer. The goal is to ensure the
// served data stays fairly fresh, without wasting lots of CPU rebuilding it
// often.
//
// The index is always built after a set of shards is loaded from disk.
// This happens when clangd discovers a compilation database that we've
// previously built an index for. It's a fairly fast process that yields lots
// of data, so we wait to get all of it.
//
// The index is built after indexing a few translation units, if it wasn't built
// already. This ensures quick startup if there's no existing index.
// Waiting for a few random TUs yields coverage of the most common headers.
//
// The index is rebuilt every N TUs, to keep it fresh as files are indexed.
//
// The index is rebuilt every time the queue goes idle, if it's stale.
//
// All methods are threadsafe. They're called after FileSymbols is updated
// etc. Without external locking, the rebuilt index may include more updates
// than intended, which is fine.
//
// This class is exposed in the header so it can be tested.
class BackgroundIndexRebuilder {
public:
  BackgroundIndexRebuilder(SwapIndex *Target, FileSymbols *Source,
                           unsigned Threads)
      : TUsBeforeFirstBuild(llvm::heavyweight_hardware_concurrency(Threads)
                                .compute_thread_count()),
        Target(Target), Source(Source) {}

  // Called to indicate a TU has been indexed.
  // May rebuild, if enough TUs have been indexed.
  void indexedTU();
  // Called to indicate that all worker threads are idle.
  // May rebuild, if the index is not up to date.
  void idle();
  // Called to indicate we're going to load a batch of shards from disk.
  // startLoading() and doneLoading() must be paired, but multiple loading
  // sessions may happen concurrently.
  void startLoading();
  // Called to indicate some shards were actually loaded from disk.
  void loadedShard(size_t ShardCount);
  // Called to indicate we're finished loading shards from disk.
  // May rebuild (if any were loaded).
  void doneLoading();

  // Ensures we won't start any more rebuilds.
  void shutdown();

  // Thresholds for rebuilding as TUs get indexed.
  const unsigned TUsBeforeFirstBuild; // Typically one per worker thread.
  const unsigned TUsBeforeRebuild = 100;

private:
  // Run Check under the lock, and rebuild if it returns true.
  void maybeRebuild(const char *Reason, std::function<bool()> Check);
  bool enoughTUsToRebuild() const;

  // All transient state is guarded by the mutex.
  std::mutex Mu;
  bool ShouldStop = false;
  // Index builds are versioned. ActiveVersion chases StartedVersion.
  unsigned StartedVersion = 0;
  unsigned ActiveVersion = 0;
  // How many TUs have we indexed so far since startup?
  unsigned IndexedTUs = 0;
  unsigned IndexedTUsAtLastRebuild = 0;
  // Are we loading shards? May be multiple concurrent sessions.
  unsigned Loading = 0;
  unsigned LoadedShards = 0; // In the current loading session.

  SwapIndex *Target;
  FileSymbols *Source;
};

} // namespace clangd
} // namespace clang

#endif
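
// The rebuild policy described in the class comment above can be sketched as a
// small single-threaded model. This is an illustration only, not the clangd
// implementation: RebuildPolicySketch and rebuilds() are hypothetical names,
// the sketch counts rebuilds instead of building an index, and it omits the
// mutex, versioning and shutdown handling. Deferring TU-triggered rebuilds
// while shards are loading is an assumption; the header does not state it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single-threaded model of the rebuild policy.
class RebuildPolicySketch {
public:
  explicit RebuildPolicySketch(unsigned TUsBeforeFirstBuild,
                               unsigned TUsBeforeRebuild = 100)
      : TUsBeforeFirstBuild(TUsBeforeFirstBuild),
        TUsBeforeRebuild(TUsBeforeRebuild) {}

  // A TU was indexed: rebuild once the relevant threshold is reached.
  void indexedTU() {
    ++IndexedTUs;
    // Assumption: while shards are loading, wait for doneLoading() instead.
    if (Loading == 0 && enoughTUsToRebuild())
      rebuild();
  }

  // The queue went idle: rebuild only if the served index is stale.
  void idle() {
    if (IndexedTUs > IndexedTUsAtLastRebuild)
      rebuild();
  }

  // Loading sessions may nest; the shard count resets when the first starts.
  void startLoading() {
    if (Loading++ == 0)
      LoadedShards = 0;
  }
  void loadedShard(std::size_t ShardCount) { LoadedShards += ShardCount; }
  void doneLoading() {
    assert(Loading > 0 && "doneLoading() without startLoading()");
    // Rebuild only when the last session ends and shards were really loaded.
    if (--Loading == 0 && LoadedShards > 0)
      rebuild();
  }

  unsigned rebuilds() const { return Rebuilds; }

private:
  bool enoughTUsToRebuild() const {
    if (Rebuilds == 0) // Never built: use the low first-build threshold.
      return IndexedTUs >= TUsBeforeFirstBuild;
    return IndexedTUs >= IndexedTUsAtLastRebuild + TUsBeforeRebuild;
  }
  void rebuild() {
    ++Rebuilds;
    IndexedTUsAtLastRebuild = IndexedTUs;
  }

  const unsigned TUsBeforeFirstBuild;
  const unsigned TUsBeforeRebuild;
  unsigned IndexedTUs = 0;
  unsigned IndexedTUsAtLastRebuild = 0;
  unsigned Loading = 0;
  std::size_t LoadedShards = 0;
  unsigned Rebuilds = 0;
};
```

// With a first-build threshold of 4, the fourth indexedTU() call triggers the
// first rebuild and the next one follows 100 TUs later. A loading session that
// loads no shards triggers nothing.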