llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexandre Ganea	8404aeb56a	[Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket. == Background == Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads. By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to. This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market. == The problem == The heavyweight_hardware_concurrency() API was introduced so that only one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group". == The changes in this patch == To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO). When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead. The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware core will be used. When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once. Differential Revision: https://reviews.llvm.org/D71775	2020-02-14 10:24:22 -05:00
Sam McCall	18e6a65bae	[Support] Fix race in threading test, found by TSan	2020-01-25 15:22:12 +01:00
Sam McCall	a9c3c176ad	Reland "[Support] Add a way to run a function on a detached thread"" This reverts commit `7bc7fe6b78`. The immediate callers have been fixed to pass nullopt where appropriate.	2019-10-23 15:51:44 +02:00
Sam McCall	7bc7fe6b78	Revert "[Support] Add a way to run a function on a detached thread" This reverts commit `40668abca4`. This causes clang tests to fail, as stacksize=0 is being explicitly passed and is no longer a no-op.	2019-10-23 15:10:35 +02:00
Sam McCall	40668abca4	[Support] Add a way to run a function on a detached thread This roughly mimics `std::thread(...).detach()` except it allows to customize the stack size. Required for https://reviews.llvm.org/D50993. I've decided against reusing the existing `llvm_execute_on_thread` because it's not obvious what to do with the ownership of the passed function/arguments: 1. If we pass possibly owning functions data to `llvm_execute_on_thread`, we'll lose the ability to pass small non-owning non-allocating functions for the joining case (as it's used now). Is it important enough? 2. If we use the non-owning interface in the new use case, we'll force clients to transfer ownership to the spawned thread manually, but similar code would still have to exist inside `llvm_execute_on_thread(_async)` anyway (as we can't just pass the same non-owning pointer to pthreads and Windows implementations, and would be forced to wrap it in some structure, and deal with its ownership. Patch by Dmitry Kozhevnikov! Differential Revision: https://reviews.llvm.org/D51103	2019-10-23 12:48:38 +02:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Teresa Johnson	c0ef9e4316	Rename interface for querying physical hardware concurrency Based on post-commit review for D25585/r284180, rename hardware_physical_concurrency to heavyweight_hardware_concurrency, to better reflect what type of tasks it should be used for and to enable other systems to map this to something other than the number of physical cores. llvm-svn: 284390	2016-10-17 14:56:53 +00:00
Teresa Johnson	2bd812c5dc	Add interface for querying physical hardware concurrency Summary: This will be used by ThinLTO to set the amount of backend parallelism, which performs better when restricted to the number of physical cores (on X86 at least, where getHostNumPhysicalCores is currently defined). If not available this falls back to thread::hardware_concurrency. Note I didn't add to the thread class since that is a typedef to std::thread where available. Reviewers: mehdi_amini Subscribers: beanz, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25585 llvm-svn: 284180	2016-10-14 00:13:59 +00:00

8 Commits