Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
This swaps out the OpenMPDefaultClauseKind enum with a
llvm::omp::DefaultKind enum which is stored in OMPConstants.h.
This should not change any functionality.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D74513
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but construct SIMD specifiers, e.g., inbranch, and the device ISA
selector are define in `llvm/lib/Frontend/OpenMP/OMPKinds.def`. From
these definitions we generate the enum classes `TraitSet`,
`TraitSelector`, and `TraitProperty` as well as conversion and helper
functions in `llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}`.
The above enum classes are used in the parser, sema, and the AST
attribute. The latter is not a collection of multiple primitive variant
arguments that contain encodings via numbers and strings but instead a
tree that mirrors the `match` clause (see `struct OpenMPTraitInfo`).
The changes to the parser make it more forgiving when wrong syntax is
read and they also resulted in more specialized diagnostics. The tests
are updated and the core issues are detected as before. Here and
elsewhere this patch tries to be generic, thus we do not distinguish
what selector set, selector, or property is parsed except if they do
behave exceptionally, as for example `user={condition(EXPR)}` does.
The sema logic changed in two ways: First, the OMPDeclareVariantAttr
representation changed, as mentioned above, and the sema was adjusted to
work with the new `OpenMPTraitInfo`. Second, the matching and scoring
logic moved into `OMPContext.{h,cpp}`. It is implemented on a flat
representation of the `match` clause that is not tied to clang.
`OpenMPTraitInfo` provides a method to generate this flat structure (see
`struct VariantMatchInfo`) by computing integer score values and boolean
user conditions from the `clang::Expr` we keep for them.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the `VariantMatchInfo`, are basically made up
of a set of active or respectively required traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
---
Test changes:
The messages checked in `OpenMP/declare_variant_messages.{c,cpp}` have
been auto generated to match the new warnings and notes of the parser.
The "subset" checks were reversed causing the wrong version to be
picked. The tests have been adjusted to correct this.
We do not print scores if the user did not provide one.
We print spaces to make lists in the `match` clause more legible.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: merge_guards_bot, rampitec, mgorny, hiraditya, aheejin, fedor.sergeev, simoncook, bollu, guansong, dexonsmith, jfb, s.egerton, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71830
Summary:
Zero-parameter K&R definitions specify that the function has no
parameters, but they are still not prototypes, so calling the function
with the wrong number of parameters is just a warning, not an error.
The C11 standard doesn't seem to directly define what a prototype is,
but it can be inferred from 6.9.1p7: "If the declarator includes a
parameter type list, the list also specifies the types of all the
parameters; such a declarator also serves as a function prototype
for later calls to the same function in the same translation unit."
This refers to 6.7.6.3p5: "If, in the declaration “T D1”, D1 has
the form
D(parameter-type-list)
or
D(identifier-list_opt)
[...]". Later in 6.11.7 it also refers only to the parameter-type-list
variant as prototype: "The use of function definitions with separate
parameter identifier and declaration lists (not prototype-format
parameter type and identifier declarators) is an obsolescent feature."
We already correctly treat an empty parameter list as non-prototype
declaration, so we can just take that information.
GCC also warns about this with -Wstrict-prototypes.
This shouldn't affect C++, because there all FunctionType's are
FunctionProtoTypes. I added a simple test for that.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D66919
AddGoldPlugin does more than adding `-plugin path/to/LLVMgold.so`.
It works with lld and GNU ld, and adds other LTO options.
So AddGoldPlugin is no longer a suitable name.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74591
This reverts commit 0a1123eb43.
Want to revert this because it's causing trouble for PowerPC
I also fixed test fp-model.c which was looking for an incorrect error message
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
Having tests that depend on clang inside llvm/ are not a good idea since
it can break incremental `ninja check-llvm`.
Fixes https://llvm.org/PR44798
Reviewed By: lebedev.ri, MaskRay, rsmith
Differential Revision: https://reviews.llvm.org/D74051
Summary: Adds the RedHat Linux triple to the list of 64-bit RISC-V triples.
Without this the gcc libraries wouldn't be found by clang on a redhat/fedora
system, as the search list included `/usr/lib/gcc/riscv64-redhat-linux-gnu`
but the correct path didn't include the `-gnu` suffix.
Reviewers: lenary, asb, dlj
Reviewed By: lenary
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74399
Summary:
Currently template parameters has symbolkind `Unknown`. This patch
introduces a new kind `TemplateParm` for templatetemplate, templatetype and
nontypetemplate parameters.
Also adds tests in clangd hover feature.
Reviewers: sammccall
Subscribers: kristof.beyls, ilya-biryukov, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73696
The code generation is exactly the same as it was.
But not that the special handling of untied tasks is still handled by
emitUntiedSwitch in clang.
Differential Revision: https://reviews.llvm.org/D69828
This option add a line break then a lambda is inside a function call.
Reviewers : djasper, klimek, krasimir, MyDeveloperDay
Reviewed By: MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D44609
Summary:
Storing not just the callee, but the actual call may be interesting for some use-cases.
In particular, D72362 would like that to better pretty-print the cycles in call graph.
Reviewers: NoQ, erichkeane
Reviewed By: NoQ
Subscribers: martong, Charusso, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74081
DynTypedNode and ASTNodeKind are implemented as part of the clang AST
library, which uses the main clang namespace. There doesn't seem to be a
need for this extra level of namespacing.
I left behind aliases in the ast_type_traits namespace for out of tree
clients of these APIs. To provide aliases for the enumerators, I used
this pattern:
namespace ast_type_traits {
constexpr TraversalKind TK_AsIs = ::clang::TK_AsIs;
}
I think the typedefs will be useful for migration, but we might be able
to drop these enumerator aliases.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D74499
This reverts commit 80a34ae311 with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
This reverts commit 80a34ae311 with fixes.
On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
According to OpenMP 5.0, cancel and cancellation point constructs are
supported in taskloop directive. Added support for cancellation in
taskloop, master taskloop and parallel master taskloop.
Summary:
Both EOF and the max value of unsigned char is platform dependent. In this
patch we try our best to deduce the value of EOF from the Preprocessor,
if we can't we fall back to -1.
Reviewers: Szelethus, NoQ
Subscribers: whisperity, xazax.hun, kristof.beyls, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalh
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74473
When the clang baremetal driver selects the rt.builtins static library
it prefix with "-l" and appends ".a". The result is a nonsense option
which lld refuses to accept.
Differential Revision: https://reviews.llvm.org/D73904
Change-Id: Ic753b6104e259fbbdc059b68fccd9b933092d828
Finalization can introduce new blocks we need to outline as well so it
makes sense to identify the blocks that need to be outlined after
finalization happened. There was also a minor unit test adjustment to
account for the fact that we have a single outlined exit block now.
Reapply 8a56d64d76 with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa.
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74372