Commit Graph

15 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith 5207f19d10 ADT: Allow IntrusiveRefCntPtr construction from std::unique_ptr, NFC
Allow a `std::unique_ptr` to be moved into the an `IntrusiveRefCntPtr`,
and remove a couple of now-unnecessary `release()` calls.

Differential Revision: https://reviews.llvm.org/D92888
2020-12-08 17:33:19 -08:00
Reid Kleckner d7c5037e6b Prune TargetInfo.h include from ParsedAttr.h, NFC
Saves ~400 includes of related headers:

$ diff -u <(sort thedeps-before.txt) <(sort thedeps-after.txt) \
    | grep '^[-+] ' | sort | uniq -c | sort -nr
    468 -    llvm-project/clang/include/clang/Basic/TargetInfo.h
    468 -    llvm-project/clang/include/clang/Basic/TargetCXXABI.h
    368 -    llvm-project/llvm/include/llvm/Support/CodeGen.h
    368 -    llvm-project/clang/include/clang/Basic/XRayInstr.h
    368 -    llvm-project/clang/include/clang/Basic/CodeGenOptions.h
    368 -    llvm-project/clang/include/clang/Basic/CodeGenOptions.def
    367 -    llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h
    367 -    llvm-project/clang/include/clang/Basic/DebugInfoOptions.h
2020-03-11 20:47:11 -07:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Bjorn Pettersson 1f43ea41c3 Prune Pass.h include from DataLayout.h. NFCI
Summary:
Reduce include dependencies by no longer including Pass.h from
DataLayout.h. That include seemed irrelevant to DataLayout, as
well as being irrelevant to several users of DataLayout.

Reviewers: rnk

Reviewed By: rnk

Subscribers: mehdi_amini, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69261

llvm-svn: 375436
2019-10-21 17:51:54 +00:00
Jonas Devlieghere 2b3d49b610 [Clang] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

Differential revision: https://reviews.llvm.org/D66259

llvm-svn: 368942
2019-08-14 23:04:18 +00:00
Diego Astiazaran ba55970c15 [Tooling] Expose ExecutorConcurrency option.
D65628 requires a flag to specify the number of threads for a clang-doc step. It would be good to use ExecutorConcurrency after exposing it instead of creating a new one that has the same purpose.

Differential Revision: https://reviews.llvm.org/D65833

llvm-svn: 368196
2019-08-07 18:35:28 +00:00
Ilya Biryukov 91e5cdfc93 [Tooling] Avoid working-dir races in AllTUsToolExecutor
Reviewers: ioeric

Reviewed By: ioeric

Subscribers: jdoerfert, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D59683

llvm-svn: 356743
2019-03-22 11:01:13 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Haojian Wu 63ecf07311 [Tooling] Correct the total number of files being processed when `filter` is provided.
Reviewers: ioeric

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D54104

llvm-svn: 346135
2018-11-05 15:08:00 +00:00
Haojian Wu cd5e59f552 [Tooling] Add "-filter" option to AllTUsExecution
Summary: We can run the tools on a subset files of compilation database.

Reviewers: ioeric

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D54092

llvm-svn: 346131
2018-11-05 13:42:05 +00:00
Eric Liu 6857692687 [Tooling] Wait for all threads to finish before resetting CWD.
llvm-svn: 342028
2018-09-12 08:29:47 +00:00
Ilya Biryukov 008eda0807 [Tooling] Restore working dir in ClangTool.
Summary:
And add an option to disable this behavior. The option is only used in
AllTUsExecutor to avoid races when running concurrently on multiple
threads.

This fixes PR38869 introduced by r340937.

Reviewers: ioeric, steveire

Reviewed By: ioeric

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D51864

llvm-svn: 341910
2018-09-11 07:29:09 +00:00
Haojian Wu 9f36c7e704 [Tooling] Optimize memory usage in InMemoryToolResults.
Avoid storing duplicated "std::string"s.

clangd's global-symbol-builder takes 20+GB memory running across LLVM
repository. With this patch, the used memory is ~10GB (running on 48
threads, most of meory are AST-related).

Subscribers: klimek, cfe-commits

Differential Revision: https://reviews.llvm.org/D45479

llvm-svn: 329784
2018-04-11 08:13:07 +00:00
Eric Liu 6828ee3c1d [Tooling] Don't deduplicate tool results in the All-TUs executor.
Summary:
As result deduplication or reduction is not supported in the framework,
we should leave the deplication to tools (if needed) until the framework supports it.

Reviewers: bkramer

Subscribers: klimek, cfe-commits

Differential Revision: https://reviews.llvm.org/D42111

llvm-svn: 322691
2018-01-17 17:37:11 +00:00
Eric Liu e25f3676b0 Add a tool executor that runs actions on all TUs in the compilation database.
Summary: Tool results are deduplicated by the result key.

Reviewers: hokein

Subscribers: klimek, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D41729

llvm-svn: 321864
2018-01-05 10:32:16 +00:00