MLIRContext allows its users to access directly to the DialectRegistry it
contains. While sometimes useful for registering additional dialects on an
already existing context, this breaks the encapsulation by essentially giving
raw accesses to a part of the context's internal state. Remove this mutable
access and instead provide a method to append a given DialectRegistry to the
one already contained in the context. Also provide a shortcut mechanism to
construct a context from an already existing registry, which seems to be a
common use case in the wild. Keep read-only access to the registry contained in
the context in case it needs to be copied or used for constructing another
context.
With this change, DialectRegistry is no longer concerned with loading the
dialects and deciding whether to invoke delayed interface registration. Loading
is concentrated in the MLIRContext, and the functionality of the registry
better reflects its name.
Depends On D96137
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96331
These properties were useful for a few things before traits had a better integration story, but don't really carry their weight well these days. Most of these properties are already checked via traits in most of the code. It is better to align the system around traits, and improve the performance/cost of traits in general.
Differential Revision: https://reviews.llvm.org/D96088
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
This prevents needless reinitialization for clients that want to reuse a pass manager multiple times. A new `getRegisryHash` function is exposed by the context to give a rough indicator of when the context registry has changed.
Differential Revision: https://reviews.llvm.org/D95493
Add factory to create streams for logging the reproducer. Allows for more general logging (beyond file) and logging the configuration/module separately (logged in order, configuration before module).
Also enable querying filename of ToolOutputFile.
Differential Revision: https://reviews.llvm.org/D94868
This revision adds a new `initialize(MLIRContext *)` hook to passes that allows for them to initialize any heavy state before the first execution of the pass. A concrete use case of this is with patterns that rely on PDL, given that PDL is compiled at run time it is imperative that compilation results are cached as much as possible. The first use of this hook is in the Canonicalizer, which has the added benefit of reducing the number of expensive accesses to the context when collecting patterns.
Differential Revision: https://reviews.llvm.org/D93147
Now that passes have support for running nested pipelines, the inliner can now allow for users to provide proper nested pipelines to use for optimization during inlining. This revision also changes the behavior of optimization during inlining to optimize before attempting to inline, which should lead to a more accurate cost model and prevents the need for users to schedule additional duplicate cleanup passes before/after the inliner that would already be run during inlining.
Differential Revision: https://reviews.llvm.org/D91211
This avoids dumping the module post emitting a reproducer, which results in
many MB logs where a reproducer has already been neatly generated.
Differential Revision: https://reviews.llvm.org/D93165
This was a somewhat important restriction in the past when ModuleOp was distinctly the top-level container operation, as well as before the pass manager had support for running nested pass managers natively. With these two issues fading away, there isn't really a good reason to enforce that a ModuleOp is the thing running within a pass manager. As such, this revision removes the restriction and allows for users to pass in the name of the operation that the pass manager will be scheduled on.
The only remaining dependency on BuiltinOps from Pass after this revision is due to FunctionPass, which will be resolved in a followup revision.
Differential Revision: https://reviews.llvm.org/D92450
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
A recent refactoring removed the need to interleave verifier passes and instead opted to verify during the normal execution of passes instead. As such, the old verify pass is no longer necessary and can be removed.
Differential Revision: https://reviews.llvm.org/D91212
Previous the textual form of the pass pipeline would implicitly nest,
instead we opt for the explicit form here: this has less surprise.
This also avoids asserting in the bindings when passing a pass pipeline
with incorrect nesting.
Differential Revision: https://reviews.llvm.org/D91233
The previous behavior was fragile when building an OpPassManager using a
string, as it was forcing the client to ensure the string to outlive the
entire PassManager.
This isn't a performance sensitive area either that would justify
optimizing further.
This is an error prone behavior, I frequently have ~20 min debugging sessions when I hit
an unexpected implicit nesting. This default makes the C++ API safer for users.
Depends On D90669
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90671
This simplifies a few parts of the pass manager, but in particular we don't add as many
verifierpass as there are passes in the pipeline, and we can now enable/disable the
verifier after the fact on an already built PassManager.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90669
Instead of performing a transformation, such pass yields a new pass pipeline
to run on the currently visited operation.
This feature can be used for example to implement a sub-pipeline that
would run only on an operation with specific attributes. Another example
would be to compute a cost model and dynamic schedule a pipeline based
on the result of this analysis.
Discussion: https://llvm.discourse.group/t/rfc-dynamic-pass-pipeline/1637
Recommit after fixing an ASAN issue: the callback lambda needs to be
allocated to a temporary to have its lifetime extended to the end of the
current block instead of just the current call expression.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D86392
This reverts commit 385c3f43fc.
Test mlir/test/Pass:dynamic-pipeline-fail-on-parent.mlir.test fails
when run with ASAN:
ERROR: AddressSanitizer: stack-use-after-scope on address ...
Reviewed By: bkramer, pifon2a
Differential Revision: https://reviews.llvm.org/D88079
Instead of performing a transformation, such pass yields a new pass pipeline
to run on the currently visited operation.
This feature can be used for example to implement a sub-pipeline that
would run only on an operation with specific attributes. Another example
would be to compute a cost model and dynamic schedule a pipeline based
on the result of this analysis.
Discussion: https://llvm.discourse.group/t/rfc-dynamic-pass-pipeline/1637
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D86392
This crash only happens when a function pass is followed by a module
pass. In this case the splitting of the pass pipeline didn't handle
properly the verifier passes and ended up with an odd number of pass in
the pipeline, breaking an assumption of the local crash reproducer
executor and hitting an assertion.
Differential Revision: https://reviews.llvm.org/D88000
This is allowing to build an OpPassManager from a StringRef instead of an
Identifier, which enables building pipelines without an MLIRContext.
An identifier is still cached on-demand on the OpPassManager for efficiency
during the IR traversal.
This allows to defers the check for traits to the execution instead of forcing it on the pipeline creation.
In particular, this is making our pipeline creation tolerant to dialects not being loaded in the context yet.
Reviewed By: rriddle, GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D86915
This makes OpPassManager more of a "container" of passes and not responsible to drive the execution.
As such we also make it constructible publicly, which will allow to build arbitrary pipeline decoupled from the execution. We'll make use of this facility to expose "dynamic pipeline" in the future.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D86391
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
mlir::registerDialect<mlir::standalone::StandaloneDialect>();
mlir::registerDialect<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Essentially takes the lld/Common/Threads.h wrappers and moves them to
the llvm/Support/Paralle.h algorithm header.
The changes are:
- Remove policy parameter, since all clients use `par`.
- Rename the methods to `parallelSort` etc to match LLVM style, since
they are no longer C++17 pstl compatible.
- Move algorithms from llvm::parallel:: to llvm::, since they have
"parallel" in the name and are no longer overloads of the regular
algorithms.
- Add range overloads
- Use the sequential algorithm directly when 1 thread is requested
(skips task grouping)
- Fix the index type of parallelForEachN to size_t. Nobody in LLVM was
using any other parameter, and it made overload resolution hard for
for_each_n(par, 0, foo.size(), ...) because 0 is int, not size_t.
Remove Threads.h and update LLD for that.
This is a prerequisite for parallel public symbol processing in the PDB
library, which is in LLVM.
Reviewed By: MaskRay, aganea
Differential Revision: https://reviews.llvm.org/D79390
This is useful for several reasons:
* In some situations the user can guarantee that thread-safety isn't necessary and don't want to pay the cost of synchronization, e.g., when parsing a very large module.
* For things like logging threading is not desirable as the output is not guaranteed to be in stable order.
This flag also subsumes the pass manager flag for multi-threading.
Differential Revision: https://reviews.llvm.org/D79266
These libraries are distinct from other things in Analysis in that they
operate only on core IR concepts. This also simplifies dependencies
so that Dialect -> Analysis -> Parser -> IR. Previously, the parser depended
on portions of the the Analysis directory as well, which sometimes
caused issues with the way the cmake makefile generator discovers
dependencies on generated files during compilation.
Differential Revision: https://reviews.llvm.org/D79240
The current implementation uses CrashRecoveryContext, but this only supports recovering in a certain number of cases. This revision adds a signal handler to support even more situations.
This revision was able to properly generate a reproducer for a segfault in the Inliner, that the current recovery couldn't.
Differential Revision: https://reviews.llvm.org/D78315
This revision adds a mode to the crash reproducer generator to attempt to generate a more local reproducer. This will attempt to generate a reproducer right before the offending pass that fails. This is useful for the majority of failures that are specific to a single pass, and situations where some passes in the pipeline are not registered with a specific tool.
Differential Revision: https://reviews.llvm.org/D78314
This moves the threading check to runOnOperation. This produces a much cleaner interface for the adaptor pass, and will allow for the ability to enable/disable threading in a much cleaner way in the future.
Differential Revision: https://reviews.llvm.org/D78313
These have proved incredibly useful for interleaving values between a range w.r.t to streams. After this revision, the mlir/Support/STLExtras.h is empty. A followup revision will remove it from the tree.
Differential Revision: https://reviews.llvm.org/D78067
Summary: ClassID is a bit janky right now as it involves passing a magic pointer around. This revision hides the internal implementation mechanism within a new class TypeID. This class is a value-typed wrapper around the original ClassID implementation.
Differential Revision: https://reviews.llvm.org/D77768
Summary: This hook allows for passes to specify the command line argument without the need for registration. More concretely this will allow for generating pass crash reproducers without needing to have the passes registered. This should remove the need for production tools to register passes, leaving that solely to development tools like mlir-opt.
Differential Revision: https://reviews.llvm.org/D77907
This is avoid the user to shoot themselves in the foot and encounter
strange crashes that are confusing until one run with TSAN.
Differential Revision: https://reviews.llvm.org/D75399
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.