Commit Graph

9 Commits

Author SHA1 Message Date
Sam McCall 4d006520b8 [clangd] Clean up unused includes. NFCI
Add includes where needed to fix build.
Haven't systematically added used headers, so there is still accidental
dependency on transitive includes.
2022-02-26 12:00:16 +01:00
Christian Kühnel 5bd643d31d [clangd] cleanup of header guard names
Renaming header guards to match the LLVM convention.
This patch was created by automatically applying the fixes from
clang-tidy.

I've removed the [NFC]  tag from the title, as we're adding header guards in some files and thus might trigger behavior changes.

Differential Revision: https://reviews.llvm.org/D113896
2021-12-02 15:58:35 +00:00
Sam McCall 3c5745cb1f [clangd] Make background index thread count calculation clearer
Summary:
This confusion was inadvertently introduced in a change to the
heavyweight_hardware_concurrency API: 8404aeb56a

- don't indirect through the rebuilder policy when building the thread pool
- document that rebuilder thresholds are exposed for testing only
- don't use 0 as a sentinel value for "all threads", as we use it as a
  sentinel value for "synchronous" (though unsupported for BackgroundIndex)
- rather than pick some new sentinel value, just always use 4 threads for tests

Reviewers: kadircet, aganea

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D82352
2020-06-25 00:18:53 +02:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Kadir Cetinkaya 91e5f4b46b Revert "Revert r366458, r366467 and r366468"
This reverts commit 9c377105da.

[clangd][BackgroundIndexLoader] Directly store DependentTU while loading shard

Summary:
We were deferring the population of DependentTU field in LoadedShard
until BackgroundIndexLoader was consumed. This actually triggers a use after
free since the shards FileToTU was pointing at could've been moved while
consuming the Loader.

Reviewers: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D64980

llvm-svn: 366559
2019-07-19 10:18:52 +00:00
Azharuddin Mohammed 9c377105da Revert r366458, r366467 and r366468
r366458 is causing test failures. r366467 and r366468 had to be reverted as
they were casuing conflict while reverting r366458.

r366468 [clangd] Remove dead code from BackgroundIndex
r366467 [clangd] BackgroundIndex stores shards to the closest project
r366458 [clangd] Refactor background-index shard loading

llvm-svn: 366551
2019-07-19 09:26:33 +00:00
Kadir Cetinkaya 40073f922a [clangd] Refactor background-index shard loading
Reviewers: sammccall

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D64712

llvm-svn: 366458
2019-07-18 16:25:36 +00:00
Sam McCall 06377ae2e5 [clangd] Don't rebuild background index until we indexed one TU per thread.
Summary:
This increases the odds that the boosted file (cpp file matching header)
will be ready. (It always enqueues first, so it'll be present unless
another thread indexes *two* files before the first thread indexes one.)

Reviewers: kadircet

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64682

llvm-svn: 366199
2019-07-16 10:17:06 +00:00
Sam McCall 2f760c44e6 [clangd] Rewrite of logic to rebuild the background index serving structures.
Summary:
Previously it was rebuilding every 5s by default, which was much too frequent
in the long run - the goal was to provide an early build. There were also some
bugs. There were also some bugs, and a dedicated thread was used in production
but not tested.

 - rebuilds are triggered by #TUs built, rather than time. This should scale
   more sensibly to fast vs slow machines.
 - there are two separate indexed-TU thresholds to trigger index build: 5 TUs
   for the first build, 100 for subsequent rebuilds.
 - rebuild is always done on the regular indexing threads, and is affected by
   blockUntilIdle. This means unit/lit tests run the production configuration.
 - fixed a bug where we'd rebuild after attempting to load shards, even if there
   were no shards.
 - the BackgroundIndexTests don't really test the subtleties of the rebuild
   policy (for determinism, we call blockUntilIdle, so rebuild-on-idle is enough
   to pass the tests). Instead, we expose the rebuilder as a separate class and
   have fine-grained tests for it.

Reviewers: kadircet

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64291

llvm-svn: 365531
2019-07-09 18:30:49 +00:00