This includes a basic implementation for the OpenMP target
operation. Currently, the if, thread_limit, private, shared, device, and nowait clauses are included in this implementation.
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D102816
In cases where arithmetic (addi/muli) ops are performed on an scf.for loops induction variable with a single use, we can fold those ops directly into the scf.for loop.
For example, in the following code:
```
scf.for %i = %c0 to %arg1 step %c1 {
%0 = addi %arg2, %i : index
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
we can lift `%0` up into the scf.for loop range, as it is the only user of %i:
```
%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
Reviewed By: mehdi_amini, ftynse, Anthony
Differential Revision: https://reviews.llvm.org/D104289
This commit moves the type translator from LLVM to MLIR to a public header for use by external projects or other code
Differential Revision: https://reviews.llvm.org/D104726
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
Differential Revision: https://reviews.llvm.org/D104356
During slice computation of affine loop fusion, detect one id as the mod
of another id w.r.t a constant in a more generic way. Restrictions on
co-efficients of the ids is removed. Also, information from the
previously calculated ids is used for simplification of affine
expressions, e.g.,
If `id1` = `id2`,
`id_n - divisor * id_q - id_r + id1 - id2 = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
If `c` is a non-zero integer,
`c*id_n - c*divisor * id_q - c*id_r = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D104614
Correct a documentation typo, and delete a duplicate test in
`pdl-to-pdl-interp-rewriter.mlir`.
Reviewed By: pr4tgpt, bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D104688
This revision refactors the usage of multithreaded utilities in MLIR to use a common
thread pool within the MLIR context, in addition to a new utility that makes writing
multi-threaded code in MLIR less error prone. Using a unified thread pool brings about
several advantages:
* Better thread usage and more control
We currently use the static llvm threading utilities, which do not allow multiple
levels of asynchronous scheduling (even if there are open threads). This is due to
how the current TaskGroup structure works, which only allows one truly multithreaded
instance at a time. By having our own ThreadPool we gain more control and flexibility
over our job/thread scheduling, and in a followup can enable threading more parts of
the compiler.
* The static nature of TaskGroup causes issues in certain configurations
Due to the static nature of TaskGroup, there have been quite a few problems related to
destruction that have caused several downstream projects to disable threading. See
D104207 for discussion on some related fallout. By having a ThreadPool scoped to
the context, we don't have to worry about destruction and can ensure that any
additional MLIR thread usage ends when the context is destroyed.
Differential Revision: https://reviews.llvm.org/D104516
Slowly we are moving toward full support of sparse tensor *outputs*. First
step was support for all-dense annotated "sparse" tensors. This step adds
support for truly sparse tensors, but only for operations in which the values
of a tensor change, but not the nonzero structure (this was refered to as
"simply dynamic" in the [Bik96] thesis).
Some background text was posted on discourse:
https://llvm.discourse.group/t/sparse-tensors-in-mlir/3389/25
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D104577
This used to be important for reducing lock contention when accessing identifiers, but
the cost of the cache can be quite large if parsing in a multi-threaded context. After
D104167, the win of keeping a cache is not worth the cost.
Differential Revision: https://reviews.llvm.org/D104737
Operations currently rely on the string name of attributes during attribute lookup/removal/replacement, in build methods, and more. This unfortunately means that some of the most used APIs in MLIR require string comparisons, additional hashing(+mutex locking) to construct Identifiers, and more. This revision remedies this by caching identifiers for all of the attributes of the operation in its corresponding AbstractOperation. Just updating the autogenerated usages brings up to a 15% reduction in compile time, greatly reducing the cost of interacting with the attributes of an operation. This number can grow even higher as we use these methods in handwritten C++ code.
Methods for accessing these cached identifiers are exposed via `<attr-name>AttrName` methods on the derived operation class. Moving forward, users should generally use these methods over raw strings when an attribute name is necessary.
Differential Revision: https://reviews.llvm.org/D104167
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
Adapt the FillOp definition to use a scalar operand instead of a capture. This patch is a follow up to https://reviews.llvm.org/D104109. As the input operands are in front of the output operands the patch changes the internal operand order of the FillOp. The pretty printed version of the operation remains unchanged though. The patch also adapts the linalg to standard lowering to ensure the c signature of the FillOp remains unchanged as well.
Differential Revision: https://reviews.llvm.org/D104121
TosaMakeBroadcastable needs to include tosa.div, which was added later in the
specification.
Reviewed By: sjarus, NatashaKnk
Differential Revision: https://reviews.llvm.org/D104157
The approximation relays on range reduced version y \in [0, pi/2]. An input x will have
the property that sin(x) = sin(y), -sin(y), cos(y), -cos(y) depends on which quadrable x
is in, where sin(y) and cos(y) are approximated with 5th degree polynomial (of x^2).
As a result a single pattern can be used to compute approximation for both sine and cosine.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D104582
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
Redirect the copy ctor to the actual class instead of
overwriting it with `TypeID` based ctor.
This allows the final Pass classes to have extra fields and logic for their copy.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D104302
* Remove dependency: Standard --> MemRef
* Add dependencies: GPUToNVVMTransforms --> MemRef, Linalg --> MemRef, MemRef --> Tensor
* Note: The `subtensor_insert_propagate_dest_cast` test case in MemRef/canonicalize.mlir will be moved to Tensor/canonicalize.mlir in a subsequent commit, which moves over the remaining Tensor ops from the Standard dialect to the Tensor dialect.
Differential Revision: https://reviews.llvm.org/D104506
This revision adds a BufferizationAliasInfo which maintains and updates information about which tensors will alias once bufferized, which bufferized tensors are equivalent to others and how to handle clobbers.
Bufferization greedily tries to bufferize inplace by:
1. first trying to bufferize SubTensorInsertOp inplace, in reverse order (these are deemed the most expensives).
2. then trying to bufferize all non SubTensorOp / SubTensorInsertOp, in reverse order.
3. lastly trying to bufferize all SubTensorOp in reverse order.
Reverse order is a heuristic that seems to work nicely because structured tensor codegen very often proceeds by:
1. take a subset of a tensor
2. compute on that subset
3. insert the result subset into the full tensor and yield a new tensor.
BufferizationAliasInfo + equivalence sets + clobber analysis allows bufferizing nested
subtensor/compute/subtensor_insert sequences inplace to a certain extent.
To fully realize inplace bufferization, additional container-containee analysis will be necessary and is left for a subsequent commit.
Differential revision: https://reviews.llvm.org/D104110
This revision adds support for passing a functor to SourceMgrDiagnosticHandler for filtering out FileLineColLocs when emitting a diagnostic. More specifically, this can be useful in situations where there may be large CallSiteLocs with locations that aren't necessarily important/useful for users.
For now the filtering support is limited to FileLineColLocs, but conceptually we could allow filtering for all locations types if a need arises in the future.
Differential Revision: https://reviews.llvm.org/D103649
Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus allows the benefit of
transforms on higher level control flow ops available in the presence of
the inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary. Functions won’t
have to remain outlined until later than desired. Abstractions like
affine execute_regions, lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
But two potential early use cases are of: (1) an early inliner (which
can inline functions by introducing execute_region ops), (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.
Differential Revision: https://reviews.llvm.org/D75837
This functionality is similar to delayed registration of dialect interfaces. It
allows external interface models to be registered before the dialect containing
the attribute/operation/type interface is loaded, or even before the context is
created.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104397
This is similar to attribute and type interfaces and mostly the same mechanism
(FallbackModel / ExternalModel, ODS generation). There are minor differences in
how the concept-based polymorphism is implemented for operations that are
accounted for by ODS backends, and this essentially adds a test and exposes the
API.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104294
ODS currently emits the interface trait class as a nested class inside the
interface class. As an unintended consequence, the default implementations of
interface methods have implicit access to static fields of the interface class,
e.g. those declared in `extraClassDeclaration`, including private methods (!),
or in the parent class. This may break the use of default implementations for
external models, which are not defined in the interface class, and generally
complexifies the abstraction.
Emit intraface traits outside of the interface class itself to avoid accidental
implicit visibility. Public static fields can still be accessed via explicit
qualification with a class name, e.g., `MyOpInterface::staticMethod()` instead
of `staticMethod`.
Update the documentation to clarify the role of `extraClassDeclaration` in
interfaces.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104384
Based on dicussion in
[this](https://llvm.discourse.group/t/remove-canonicalizer-for-memref-dim-via-shapedtypeopinterface/3641)
thread the pattern to resolve the `memref.dim` of a value that is a
result of an operation that implements the
`InferShapedTypeOpInterface` is moved to a separate pass instead of
running it as a canonicalization pass. This allows shape resolution to
happen when explicitly required, instead of automatically through a
canonicalization.
Differential Revision: https://reviews.llvm.org/D104321
Many tests fails by D101969 (https://reviews.llvm.org/D101969)
on big-endian machines. This patch changes bit order of
TrailingOperandStorage in big-endian machines. This patch
works on System Z (Triple = "s390x-ibm-linux", CPU = "z14").
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104225
When the vscode extension is published, it may be unclear how to contribute improvements to the extension. This revision makes it clear that contributions should follow the traditional LLVM guidelines.
This patch changes the (not recommended) static registration API from:
static PassRegistration<MyPass> reg("my-pass", "My Pass Description.");
to:
static PassRegistration<MyPass> reg;
And the explicit registration from:
void registerPass("my-pass", "My Pass Description.",
[] { return createMyPass(); });
To:
void registerPass([] { return createMyPass(); });
It is expected that Pass implementations overrides the getArgument() method
instead. This will ensure that pipeline description can be printed and parsed
back.
Differential Revision: https://reviews.llvm.org/D104421
Adds an integration test for the SPMM (sparse matrix multiplication) kernel, which multiplies a sparse matrix by a dense matrix, resulting in a dense matrix. This is just a simple modification on the existing matrix-vector multiplication kernel.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104334
Make store to load fwd condition for -memref-dataflow-opt less
conservative. Post dominance info is not really needed. Add additional
check for common cases.
Differential Revision: https://reviews.llvm.org/D104174
To control the number of outer parallel loops, we need to process the
outer loops first and hence pre-order walk fixes the issue.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D104361
This allows for dialects to do different post-processing depending on operations with the inliner (my use case requires different attribute propagation rules depending on call op). This hook runs before the regular processInlinedBlocks method.
Differential Revision: https://reviews.llvm.org/D104399
In a region with multiple blocks the verifier will try to look for
dominance and may get successor list for blocks, even though a block
may be empty or does not end with a terminator.
Differential Revision: https://reviews.llvm.org/D104411
We have several ways of introducing a scalar invariant value into
linalg generic ops (should we limit this somewhat?). This revision
makes sure we handle all of them correctly in the sparse compiler.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D104335
Default implementations of interfaces may rely on extra class
declarations, which aren't currently generated in the external model,
that in turn may rely on functions defined in the main Attribute/Type
class, which wouldn't be available on the external model.
This is a very careful start with alllowing sparse tensors at the
left-hand-side of tensor index expressions (viz. sparse output).
Note that there is a subtle difference between non-annotated tensors
(dense, remain n-dim, handled by classic bufferization) and all-dense
annotated "sparse" tensors (linearized to 1-dim without overhead
storage, bufferized by sparse compiler, backed by runtime support library).
This revision gently introduces some new IR to facilitate annotated outputs,
to be generalized to truly sparse tensors in the future.
Reviewed By: gussmith23, bixia
Differential Revision: https://reviews.llvm.org/D104074
The index cast operation accepts vector types. Implement its lowering in this patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D104280
It may be desirable to provide an interface implementation for an attribute or
a type without modifying the definition of said attribute or type. Notably,
this allows to implement interfaces for attributes and types outside of the
dialect that defines them and, in particular, provide interfaces for built-in
types. Provide the mechanism to do so.
Currently, separable registration requires the attribute or type to have been
registered with the context, i.e. for the dialect containing the attribute or
type to be loaded. This can be relaxed in the future using a mechanism similar
to delayed dialect interface registration.
See https://llvm.discourse.group/t/rfc-separable-attribute-type-interfaces/3637
Depends On D104233
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104234
The patch replaces the existing capture functionality by scalar operands that have been introduced by https://reviews.llvm.org/D104109. Scalar operands behave as tensor operands except for the fact that they are not indexed. As a result ScalarDefs can be accessed directly as no indexing expression is needed.
The patch only updates the OpDSL. The C++ side is updated by a follow up patch.
Differential Revision: https://reviews.llvm.org/D104220
This doesn't add any canonicalizations, but executes the same
simplification on bufferSemantic linalg.generic ops by using
linalg::ReshapeOp instead of linalg::TensorReshapeOp.
Differential Revision: https://reviews.llvm.org/D103513
This is useful for "build tuple" type ops. In my case, in npcomp, I have
an op:
```
// Result type is `!torch.tuple<!torch.tensor, !torch.tensor>`.
torch.prim.TupleConstruct %0, %1 : !torch.tensor, !torch.tensor
```
and the context is required for the `Torch::TupleType::get` call (for
the case of an empty tuple).
The handling of these FmtContext's in the code is pretty ad-hoc -- I didn't
attempt to rationalize it and just made a targeted fix. As someone
unfamiliar with the code I had a hard time seeing how to more broadly fix
the situation.
Differential Revision: https://reviews.llvm.org/D104274
The parser of generic op did not recognize the output from mlir-opt when there
are multiple outputs. One would wrap the result types with braces, and one would
not. The patch makes the behavior the same.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D104256
This changes the pass manager to not rerun the verifier when a pass says it
didn't change anything or after an OpToOpPassAdaptor, since neither of those
cases need verification (and if the pass lied, then there will be much larger
semantic problems than will be caught by the verifier).
This maintains behavior in EXPENSIVE_CHECKS mode.
Differential Revision: https://reviews.llvm.org/D104243
Interface patterns are unique in that they get added to every operation that also implements that interface, given that they aren't tied to individual operations. When the same interface pattern gets added to multiple operations (such as the current behavior with Linalg), an reference to each of these patterns is added to every op (meaning that an operation will now have N references to effectively the same pattern). This revision fixes this problematic behavior in Linalg, and can bring upwards of a 25% reduction in compile time in Linalg based workloads.
Differential Revision: https://reviews.llvm.org/D104160
This changes the outer verification loop to not recurse into
IsolatedFromAbove operations - instead return them up to a place
where a parallel for loop can process them all in parallel. This
also changes Dominance checking to happen on IsolatedFromAbove
chunks of the region tree, which makes it easy to fold operation
and dominance verification into a single simple parallel regime.
This speeds up firtool in CIRCT from ~40s to 31s on a large
testcase in -verify-each mode (the default). The .fir parser and
module passes in particular benefit from this - FModule passes
(roughly analogous to function passes) were already running the
verifier in parallel as part of the pass manager. This allows
the whole-module passes to verify their enclosed functions /
FModules in parallel.
-verify-each mode is still faster (26.3s on the same testcase),
but we do expect the verifier to take *some* time.
Differential Revision: https://reviews.llvm.org/D104207
This change adds `AutomaticAllocationScope` to the
memref.alloca_scope op. Additionally, it also clarifies
that alloca_scope is is conceptually a passthrough operation.
Reviewed By: ftynse, bondhugula
Differential Revision: https://reviews.llvm.org/D104227
There's no need for `toSmallVector()` as `SmallVector.h` already provides a `to_vector` free function that takes a range.
Reviewed By: Quuxplusone
Differential Revision: https://reviews.llvm.org/D104024
Actually, no vector types are supported so far. We should add the traits once
the vector types are supported (e.g. ElementwiseMappable.traits).
Instead add Elementwise trait to each op.
Differential Revision: https://reviews.llvm.org/D104103
Up to now all structured op operands are assumed to be shaped. The patch relaxes this assumption and allows scalar input operands. In contrast to shaped operands scalar operands are not indexed and directly forwarded to the body of the operation. As all other operands, scalar operands are associated to an indexing map that in case of a scalar or a 0D-operand has an empty range.
We will use scalar operands as a replacement for the capture mechanism. In contrast to captures, the approach ensures we can generate the function signature from the operand list and it prevents outdated capture values in case a transformation updates only the capture operand but not the hidden body of a named operation.
Removing captures and updating existing operations such as linalg.fill is left for a later patch.
The patch depends on https://reviews.llvm.org/D103891 and https://reviews.llvm.org/D103890.
Differential Revision: https://reviews.llvm.org/D104109
The padding of such ops is not generated in a vectorized way. Instead, emit a tensor::GenerateOp.
We may vectorize GenerateOps in the future.
Differential Revision: https://reviews.llvm.org/D103879