The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
Summary:
The CheckAtomic module performs two tests to determine if passing
'-latomic' to the linker is required: one for 64-bit atomics, and
another for non-64-bit atomics. clangd only uses the result from
HAVE_CXX_ATOMICS64_WITHOUT_LIB. This is incomplete because there are
uses of non-64-bit atomics in the code, such as the ReplyOnce::Replied
of type std::atomic<bool> defined in clangd/ClangdLSPServer.cpp.
Fix by also checking for the result of HAVE_CXX_ATOMICS_WITHOUT_LIB.
See also: https://reviews.llvm.org/D68964
Reviewers: ilya-biryukov, nridge, kadircet, beanz, compnerd, luismarques
Reviewed By: luismarques
Tags: #clang
Differential Revision: https://reviews.llvm.org/D69869
Summary:
Currently template parameters has symbolkind `Unknown`. This patch
introduces a new kind `TemplateParm` for templatetemplate, templatetype and
nontypetemplate parameters.
Also adds tests in clangd hover feature.
Reviewers: sammccall
Subscribers: kristof.beyls, ilya-biryukov, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73696
That's where nowadays those tests reside, those outliers were created
before the migration but committed after,
so they just awkwardly reside in the old place.
Summary:
Recursion is a powerful tool, but like any tool
without care it can be dangerous. For example,
if the recursion is unbounded, you will
eventually run out of stack and crash.
You can of course track the recursion depth
but if it is hardcoded, there can always be some
other environment when that depth is too large,
so said magic number would need to be env-dependent.
But then your program's behavior is suddenly more env-dependent.
Also, recursion, while it does not outright stop optimization,
recursive calls are less great than normal calls,
for example they hinder inlining.
Recursion is banned in some coding guidelines:
* SEI CERT DCL56-CPP. Avoid cycles during initialization of static objects
* JPL 2.4 Do not use direct or indirect recursion.
* I'd say it is frowned upon in LLVM, although not banned
And is plain unsupported in some cases:
* OpenCL 1.2, 6.9 Restrictions: i. Recursion is not supported.
So there's clearly a lot of reasons why one might want to
avoid recursion, and replace it with worklist handling.
It would be great to have a enforcement for it though.
This implements such a check.
Here we detect both direct and indirect recursive calls,
although since clang-tidy (unlike clang static analyzer)
is CTU-unaware, if the recursion transcends a single standalone TU,
we will naturally not find it :/
The algorithm is pretty straight-forward:
1. Build call-graph for the entire TU.
For that, the existing `clang::CallGraph` is re-used,
although it had to be modified to also track the location of the call.
2. Then, the hard problem: how do we detect recursion?
Since we have a graph, let's just do the sane thing,
and look for Strongly Connected Function Declarations - widely known as `SCC`.
For that LLVM provides `llvm::scc_iterator`,
which is internally an Tarjan's DFS algorithm, and is used throught LLVM,
so this should be as performant as possible.
3. Now that we've got SCC's, we discard those that don't contain loops.
Note that there may be more than one loop in SCC!
4. For each loopy SCC, we call out each function, and print a single example
call graph that shows recursion -- it didn't seem worthwhile enumerating
every possible loop in SCC, although i suppose it could be implemented.
* To come up with that call graph cycle example, we start at first SCC node,
see which callee of the node is within SCC (and is thus known to be in cycle),
and recurse into it until we hit the callee that is already in call stack.
Reviewers: JonasToth, aaron.ballman, ffrankies, Eugene.Zelenko, erichkeane, NoQ
Reviewed By: aaron.ballman
Subscribers: Charusso, Naghasan, bader, riccibruno, mgorny, Anastasia, xazax.hun, cfe-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D72362
Summary:
Storing not just the callee, but the actual call may be interesting for some use-cases.
In particular, D72362 would like that to better pretty-print the cycles in call graph.
Reviewers: NoQ, erichkeane
Reviewed By: NoQ
Subscribers: martong, Charusso, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74081
When running on Windows under the following locale:
D:\llvm-project>python
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
('French_Canada', '1252')
This patch fixes the following issue:
# command stderr:
Traceback (most recent call last):
File "D:/llvm-project/clang-tools-extra/test/../test\clang-tidy\check_clang_tidy.py", line 249, in <module>
main()
File "D:/llvm-project/clang-tools-extra/test/../test\clang-tidy\check_clang_tidy.py", line 245, in main
run_test_once(args, extra_args)
File "D:/llvm-project/clang-tools-extra/test/../test\clang-tidy\check_clang_tidy.py", line 162, in run_test_once
diff_output.decode() +
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 2050: invalid continuation byte
This is caused by diff reporting no EOL on the last line, and unfortunately this is written in French with accentuation on my locale.
Differential Revision: https://reviews.llvm.org/D74498
This reverts commit 80a34ae311 with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
This reverts commit 80a34ae311 with fixes.
On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
Summary:
Make it possible for the client to adjust the ranking by using the score Clangd
calculates for the completion items.
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74547
Summary:
Though this is not needed when using clangd's own index, other indexes
(e.g. kythe) need it, as classes and their constructors are different
symbols, otherwise we will miss renaming constructors.
Reviewers: kbobyrev
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74411
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.
There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
Summary: Addresses [[ https://bugs.llvm.org/show_bug.cgi?id=44816 | bugprone-infinite-loop false positive with CATCH2 ]] by disabling the check on loops where the condition is known to always eval as false, in other words not a loop.
Reviewers: aaron.ballman, alexfh, hokein, gribozavr2, JonasToth
Reviewed By: gribozavr2
Subscribers: xazax.hun, cfe-commits
Tags: #clang, #clang-tools-extra
Differential Revision: https://reviews.llvm.org/D74374
Summary:
Informative only, useful for positioning UI, interacting with other sources of
completion etc. As requested by an embedder of clangd.
Reviewers: usaxena95
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74305
Summary:
When renaming a class with template constructors, we are missing the
occurrences of the template constructors, because getUSRsForDeclaration doesn't
give USRs of the templated constructors (they are not in the normal `ctors()`
method).
Reviewers: kbobyrev
Subscribers: jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74216
The test got re-enabled after d54d71b67e landed.
However it seems that the order is still not deterministic as it
currently passes with -DLLVM_ENABLE_EXPENSIVE_CHECKS=OFF but randomly
fails with expensive checks ON.
Summary:
- This option forces a preamble rebuild to handle the odd case
of a missing header file being added
Reviewers: sammccall
Subscribers: ilya-biryukov, javed.absar, MaskRay, jkorous, arphaman, jfb, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73916
Summary: Such implementations may override the class's own implementation, and even be a danger in case someone later comes and adds one to the class itself. Most times this has been encountered have been a mistake.
Reviewers: stephanemoore, benhamilton, dmaclach
Reviewed By: stephanemoore, benhamilton, dmaclach
Subscribers: dmaclach, mgorny, cfe-commits
Tags: #clang-tools-extra, #clang
Differential Revision: https://reviews.llvm.org/D72876
Summary:
Clangd does not find references of designated iniitializers yet and, as a
result, is unable to rename such references. This patch addresses this issue.
Resolves: https://github.com/clangd/clangd/issues/247
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: merge_guards_bot, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72867
This change has two components. The moves the generated file
for a namespace to the directory named after the namespace in
a file named 'index.<format>'. This greatly improves the browsing
experience since the index page is shown by default for a directory.
The second improves the markdown output by adding the links to the
referenced pages for children objects and the link back to the source
code.
Patch By: Clayton
Differential Revision: https://reviews.llvm.org/D72954
whether a call is to a builtin.
We already had a general mechanism to do this but for some reason
weren't using it. In passing, check for the other unary operators that
can intervene in a reasonably-direct function call (we already handled
'&' but missed '*' and '+').
This reverts commit aaae6b1b61,
reinstating af80b8ccc5, with a fix to
clang-tidy.
Summary:
DeclarationName for cxx constructor is special, it is not an identifier.
thus the "Spelled" flag are not set for all ctor references, this patch
fixes it.
Reviewers: kbobyrev
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74125
This patch is based on D72746 and prevents non-spelled references from
being renamed which would cause incorrect behavior otherwise.
Reviewed by: hokein
Differential Revision: https://reviews.llvm.org/D74112
This patch allows the index does to provide a way to distinguish
implicit references (e.g. coming from macro expansions) from the spelled
ones. The corresponding flag was added to RefKind and symbols that are
referenced without spelling their name explicitly are now marked
implicit. This allows fixing incorrect behavior when renaming a symbol
that was referenced in macro expansions would try to rename macro
invocations.
Differential Revision: D72746
Reviewed by: hokein
Summary:
This is a fairly ugly hack - we back off several features for any variable
whose type isn't deduced, to avoid computing/caching linkage.
Better suggestions welcome.
Fixes https://github.com/clangd/clangd/issues/274
Reviewers: kadircet, kbobyrev
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73960
Summary: By default it's 512K, which is way to small for clang parser to run on. There is no way to do it via platform-independent API, so it's implemented via pthreads directly in clangd/Threading.cpp.
Fixes https://github.com/clangd/clangd/issues/273
Patch by Dmitry Kozhevnikov!
Reviewers: ilya-biryukov, sammccall, arphaman
Reviewed By: ilya-biryukov, sammccall, arphaman
Subscribers: dexonsmith, umanwizard, jfb, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D50993
Summary:
Currently we delay AST rebuilds by 500ms after each edit, to wait for
further edits. This is a win if a rebuild takes 5s, and a loss if it
takes 50ms.
This patch sets debouncepolicy = clamp(min, ratio * rebuild_time, max).
However it sets min = max = 500ms so there's no policy change or actual
customizability - will do that in a separate patch.
See https://github.com/clangd/clangd/issues/275
Reviewers: hokein
Subscribers: ilya-biryukov, javed.absar, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73873