Range type that allows for wrapping different value & shape ranges with
correspondence to Shape's ValueShape type - initially aliased to
ValueRange (which corresponds to the trivial mapping from a ShapedType's
Value's shape to shape). Just plain alias, before expanding.
Differential Revision: https://reviews.llvm.org/D99133
AffineForOp's folding hook is expected to fold away trivially empty
affine.for. This allows simplification to happen as part of the
canonicalizer and from wherever the folding hook is used. While more
complex analysis based zero trip count detection is available from other
passes in analysis and transforms, simple and inexpensive folding had
been missing.
Also, update/improve affine.for op documentation clarifying semantics of
the result values for zero trip count loops.
Differential Revision: https://reviews.llvm.org/D106123
Introduce a new rewrite driver (MultiOpPatternRewriteDriver) to rewrite
a supplied list of ops and other ops. Provide a knob to restrict
rewrites strictly to those ops or also to affected ops (but still not to
completely related ops).
This rewrite driver is commonly needed to run any simplification and
cleanup at the end of a transforms pass or transforms utility in a way
that only simplifies relevant IR. This makes it easy to write test cases
while not performing unrelated whole IR simplification that may
invalidate other state at the caller.
The introduced utility provides more freedom to developers of transforms
and transform utilities to perform focussed and local simplification. In
several cases, it provides greater efficiency as well as more
simplification when compared to repeatedly calling
`applyOpPatternsAndFold`; in other cases, it avoids the need to
undesirably call `applyPatternsAndFoldGreedily` to do unrelated
simplification in a FuncOp.
Update a few transformations that were earlier using
applyOpPatternsAndFold (SimplifyAffineStructures,
affineDataCopyGenerate, a linalg transform).
TODO:
- OpPatternRewriteDriver can be removed as it's a special case of
MultiOpPatternRewriteDriver, i.e., both can be merged.
Differential Revision: https://reviews.llvm.org/D106232
We are able to bind NativeCodeCall result as binding operation. To make
table-gen have better understanding in the form of helper function,
we need to specify the number of return values in the NativeCodeCall
template. A VoidNativeCodeCall is added for void case.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D102160
libMLIRPublicAPI.so came into existence early when the Python and C-API were being co-developed because the Python extensions need a single DSO which exports the C-API to link against. It really should never have been exported as a mondo library in the first place, which has caused no end of problems in different linking modes, etc (i.e. the CAPI tests depended on it).
This patch does a mechanical move that:
* Makes the C-API tests link directly to their respective libraries.
* Creates a libMLIRPythonCAPI as part of the Python bindings which assemble to exact DSO that they need.
This has the effect that the C-API is no longer monolithic and can be subset and used piecemeal in a modular fashion, which is necessary for downstreams to only pay for what they use. There are additional, more fundamental changes planned for how the Python API is assembled which should make it more out of tree friendly, but this minimal first step is necessary to break the fragile dependency between the C-API and Python API.
Downstream actions required:
* If using the C-API and linking against MLIRPublicAPI, you must instead link against its constituent components. As a reference, the Python API dependencies are in lib/Bindings/Python/CMakeLists.txt and approximate the full set of dependencies available.
* If you have a Python API project that was previously linking against MLIRPublicAPI (i.e. to add its own C-API DSO), you will want to `s/MLIRPublicAPI/MLIRPythonCAPI/` and all should be as it was. There are larger changes coming in this area but this part is incremental.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D106369
This allows caller to use non-const functions, e.g., `getOperandNumber`, etc. It
is expected that OpOperand is not modified in a callback function.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D106322
The unstrided transposed conv can be represented as a regular convolution.
Lower to this variant to handle the basic case. This includes transitioning from
the TC defined convolution operation and a yaml defined one.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D106389
Added the named op variants for quantized matmul and quantized batch matmul
with the necessary lowerings/tests from tosa's matmul/fully connected ops.
Current version does not use the contraction op interface as its verifiers
are not compatible with scalar operations.
Differential Revision: https://reviews.llvm.org/D105063
Allows for grouping OpTraits with list of OpTrait to make it easier to group OpTraits together without needing to use list concats (e.g., enable using `[Traits, ..., UsefulGroupOfTraits, Others, ...]` instead of `[Traits, ...] # UsefulGroupOfTraits # [Others, ...]`). Flatten in construction of Operation. This recurses here as the expectation is that these aren't expected to be deeply nested (most likely only 1 level of nesting).
Differential Revision: https://reviews.llvm.org/D106223
For example, we will generate incorrect code for the pattern,
def : Pat<((FooOp (FooOp, $a, $b), $b)), (...)>;
We didn't allow $b to be bond twice with same operand of same op.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D105677
The `reifyReturnTypeShapesPerResultDim` method supports shape
inference for rsults that are ranked types. These are used lower in
the codegeneration stack than its counter part `reifyReturnTypeShapes`
which also supports unranked types, and is more suited for use higher
up the compilation stack. To have separation of concerns, this method
is split into its own interface.
See discussion : https://llvm.discourse.group/t/better-layering-for-infershapedtypeopinterface/3823
Differential Revision: https://reviews.llvm.org/D106133
This is the first step to support software pipeline for scf.for loops.
This is only the transformation to create pipelined kernel and
prologue/epilogue.
The scheduling needs to be given by user as many different algorithm
and heuristic could be applied.
This currently doesn't handle loop arguments, this will be added in a
follow up patch.
Differential Revision: https://reviews.llvm.org/D105868
This makes it more explicit what the scope of this pass is. The name
of this pass predates fusion on tensors using tile + fuse, and hence
the confusion.
Differential Revision: https://reviews.llvm.org/D106132
Added shape inference handles cases for convolution operations. This includes
conv2d, conv3d, depthwise_conv2d, and transpose_conv2d. With transpose conv
we use the specified output shape when possible however will shape propagate
if the output shape attribute has dynamic values.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D105645
This deletes all the pooling ops in LinalgNamedStructuredOpsSpec.tc. All the
uses are replaced with the yaml pooling ops.
Reviewed By: gysit, rsuderman
Differential Revision: https://reviews.llvm.org/D106181
This simplifies the vector to LLVM lowering. Previously, both vector.load/store and vector.transfer_read/write lowered directly to LLVM. With this commit, there is a single path to LLVM vector load/store instructions and vector.transfer_read/write ops must first be lowered to vector.load/store ops.
* Remove vector.transfer_read/write to LLVM lowering.
* Allow non-unit memref strides on all but the most minor dimension for vector.load/store ops.
* Add maxTransferRank option to populateVectorTransferLoweringPatterns.
* vector.transfer_reads with changing element type can no longer be lowered to LLVM. (This functionality is needed only for SPIRV.)
Differential Revision: https://reviews.llvm.org/D106118
The dialect-specific cast between builtin (ex-standard) types and LLVM
dialect types was introduced long time before built-in support for
unrealized_conversion_cast. It has a similar purpose, but is restricted
to compatible builtin and LLVM dialect types, which may hamper
progressive lowering and composition with types from other dialects.
Replace llvm.mlir.cast with unrealized_conversion_cast, and drop the
operation that became unnecessary.
Also make unrealized_conversion_cast legal by default in
LLVMConversionTarget as the majority of convesions using it are partial
conversions that actually want the casts to persist in the IR. The
standard-to-llvm conversion, which is still expected to run last, cleans
up the remaining casts standard-to-llvm conversion, which is still
expected to run last, cleans up the remaining casts
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105880
This may be necessary in partial multi-stage conversion when a container type
from dialect A containing types from dialect B goes through the conversion
where only dialect A is converted to the LLVM dialect. We will need to keep a
pointer-to-non-LLVM type in the IR until a further conversion can convert
dialect B types to LLVM types.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D106076
Changes include the following:
1. Single iteration reduction loops being sibling fused at innermost insertion level
are skipped from being considered as sequential loops.
Otherwise, the slice bounds of these loops is reset.
2. Promote loops that are skipped in previous step into outer loops.
3. Two utility function - buildSliceTripCountMap, getSliceIterationCount - are moved from
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp to mlir/lib/Analysis/Utils.cpp
Reviewed By: bondhugula, vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D104249
Arbitrary shifts have some complications, but shift by invariants
(viz. tensor index exp only at left hand side) can be easily
handled with the conjunctive rule.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D106002
Instead of relying on adhoc bounds calculations, use a projection-based
implementation. This simplifies the implementation and finds more static
constant sizes than previously/
Differential Revision: https://reviews.llvm.org/D106054
Factor out the functionality into a new function, so that it can be used for creating PadTensorOp tiles.
Differential Revision: https://reviews.llvm.org/D105458
The documentation on these was out of sync with the implementation. Also
the declaration of inputs was repeated when it is already part of the
ArithmeticCastOp definition.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D105934
Adds zero-preserving unary operators from std. Also adds xor.
Performs minor refactoring to remove "zero" node, and pushed
the irregular logic for negi (not support in std) into one place.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105928
Previously, linalg bufferization always had to be conservative at function boundaries and assume the most dynamic strided memref layout.
This revision introduce the mechanism to specify a linalg.buffer_layout function argument attribute that carries an affine map used to set a less pessimistic layout.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D105859
Integral AND and OR follow the simple conjunction and disjuction rules
for lattice building. This revision also completes some of the Merge
refactoring by moving the remainder parts that are merger specific from
sparsification into utils files.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105851
Pool operations perform the same shape propagation. Included the shape
propagation and tests for these avg_pool2d and max_pool2d.
Differential Revision: https://reviews.llvm.org/D105665
Right now, we only accept x/c with nonzero c, since this
conceptually can be treated as a x*(1/c) conjunction for both
FP and INT as far as lattice computations go. The codegen
keeps the division though to preserve precise semantics.
See discussion:
https://llvm.discourse.group/t/sparse-tensors-in-mlir/3389/28
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105731
Annotate LinalgNamedStructuredOps.yaml with a comment stating the file is auto-generated and should not be edited manually.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105809
After the Math has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the Math
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D105702
This enables checking the printing flags when formatting names
in SSANameState.
Depends On D105299
Reviewed By: mehdi_amini, bondhugula
Differential Revision: https://reviews.llvm.org/D105300
Use a modeling similar to SCF ParallelOp to support arbitrary parallel
reductions. The two main differences are: (1) reductions are named and declared
beforehand similarly to functions using a special op that provides the neutral
element, the reduction code and optionally the atomic reduction code; (2)
reductions go through memory instead because this is closer to the OpenMP
semantics.
See https://llvm.discourse.group/t/rfc-openmp-reduction-support/3367.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D105358
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
For the getters, it is bad practice to keep the reference
around for too long, as explained in the new comment
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105599
This patch adds full qualification to the types and function calls used inside of (Static)InterfaceMethod in OpInterfaces. Without this patch using many of these interfaces in a downstream project yields compiler errors as the types and default implementations are mostly copied verbatim. Without then putting using namespace mlir; in the header file of the implementations of those interfaces, compilation is impossible.
Using fully qualified lookup fixes this issue.
Differential Revision: https://reviews.llvm.org/D105619
This reverts commit 6c0fd4db79.
This simple implementation is unfortunately not extensible and needs to be reverted.
The extensible way should be to extend https://reviews.llvm.org/D104321.
Introduce the exp and log function in OpDSL. Add the soft plus operator to test the emitted IR in Python and C++.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105420
Remove the GenericOpBase class formerly used to factor out common logic shared be GenericOp and IndexedGenericOp. After removing IndexedGenericOp, the base class is not used anymore.
Differential Revision: https://reviews.llvm.org/D105307
This class and classes that extend it are general utilities for any dialect
that is being converted into the LLVM dialect. They are in no way specific to
Standard-to-LLVM conversion and should not make their users depend on it.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105542
Simplify vector unrolling pattern to be more aligned with rest of the
patterns and be closer to vector distribution.
The new implementation uses ExtractStridedSlice/InsertStridedSlice
instead of the Tuple ops. After this change the ops based on Tuple don't
have any more used so they can be removed.
This allows removing signifcant amount of dead code and will allow
extending the unrolling code going forward.
Differential Revision: https://reviews.llvm.org/D105381
Refactor the original code to rewrite a PadTensorOp into a
sequence of InitTensorOp, FillOp and InsertSliceOp without
vectorization by default. `GenericPadTensorOpVectorizationPattern`
provides a customized OptimizeCopyFn to vectorize the
copying step.
Reviewed By: silvas, nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D105293
"Standard-to-LLVM" conversion is one of the oldest passes in existence. It has
become quite large due to the size of the Standard dialect itself, which is
being split into multiple smaller dialects. Furthermore, several conversion
features are useful for any dialect that is being converted to the LLVM
dialect, which, without this refactoring, creates a dependency from those
conversions to the "standard-to-llvm" one.
Put several of the reusable utilities from this conversion to a separate
library, namely:
- type converter from builtin to LLVM dialect types;
- utility for building and accessing values of LLVM structure type;
- utility for building and accessing values that represent memref in the LLVM
dialect;
- lowering options applicable everywhere.
Additionally, remove the type wrapping/unwrapping notion from the type
converter that is no longer relevant since LLVM types has been reimplemented as
first-class MLIR types.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D105534
Fix FlatAffineConstraints::getConstantBoundOnDimSize to ensure that
returned bounds on dim size are always non-negative regardless of the
constraints on that dimension. Add an assertion at the user.
Differential Revision: https://reviews.llvm.org/D105171
This changes the custom syntax of the emitc.include operation for standard includes.
Reviewed By: marbre
Differential Revision: https://reviews.llvm.org/D105281
Different constraints may share the same predicate, in this case, we
will generate duplicate ODS verification function.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D104369
Remove `getDynOperands` and `createOrFoldDimOp` from MemRef.h to decouple MemRef a bit from Tensor. These two functions are used in other dialects/transforms.
Differential Revision: https://reviews.llvm.org/D105260
Basically every kind of parseOptional* method in DialectAsmParser has a corresponding parse* method which will emit an error if the requested token has not been found. An odd one out of this rule is parseOptionalString which does not have a corresponding parseString method.
This patch adds that method and implements it in basically the same fashion as parseKeyword, by first going through parseOptionalString and emitting an error on failure.
Differential Revision: https://reviews.llvm.org/D105406
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.
Differential Revision: https://reviews.llvm.org/D105395
This revision extends the sparse compiler support from fp/int addition and multiplication to fp/int negation and subtraction, thereby increasing the scope of sparse kernels that can be compiled.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105306
Fusion by linearization should not happen when
- The reshape is expanding and it is a consumer
- The reshape is collapsing and is a producer.
The bug introduced in this logic by some recent refactoring resulted
in a crash.
To enforce this (negetive) use case, add a test that reproduces the
error and verifies the fix.
Differential Revision: https://reviews.llvm.org/D104970
Add the min operation to OpDSL and introduce a min pooling operation to test the implementation. The patch is a sibling of the max operation patch https://reviews.llvm.org/D105203 and the min operation is again lowered to a compare and select pair.
Differential Revision: https://reviews.llvm.org/D105345
To make TensorExp clearer, this change refactors the e0/e1 fields into a union: e0/e1 for a binary op tensor expression, and tensor_num for a tensor-kinded tensor expression.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D105303
Add the max operation to the OpDSL and introduce a max pooling operation to test the implementation. As MLIR has no builtin max operation, the max function is lowered to a compare and select pair.
Differential Revision: https://reviews.llvm.org/D105203
Tosa's PassDetail.h may be used in non-TOSA transforms. Include
TosaDialect to avoid transient dependency.
Differential Revision: https://reviews.llvm.org/D105324
Added InferReturnTypeComponents for NAry operations, reshape, and reverse.
With the additional tosa-infer-shapes pass, we can infer/propagate shapes
across a set of TOSA operations. Current version does not modify the
FuncOp type by inserting an unrealized conversion cast prior to any new
non-matchin returns.
Differential Revision: https://reviews.llvm.org/D105312
The context can be created with threading disabled, to avoid creating a thread pool
that may be destroyed when injecting another one later.
Differential Revision: https://reviews.llvm.org/D105302
Synchronizing multiple custom targets requires not only target but also
file dependencies. Building Linalg involves running yaml-gen followed by
tablegen. Currently, these custom targets are only synchronized using a
target dependency resulting in issues in specific incremental build
setups (https://llvm.discourse.group/t/missing-build-cmake-tblgen-dependency/3727/10).
This patch introduces a novel LLVM_TARGET_DEPENDS variable to the
TableGen.cmake file to provide a way to specify file dependencies.
Additionally, it adapts the Linalg CMakeLists.txt to introduce the
necessary file dependency between yaml-gen and tablegen.
Differential Revision: https://reviews.llvm.org/D105272
This results in significant deduplication of code. This patch is not expected to change any functionality, it's just some simplification in preparation for future work. Also slightly simplified some code that was being touched anyway and added some unit tests for some functions that were touched.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D105152
Rationale:
Follow-up on migrating lattice and tensor expression related methods into the new utility.
This also prepares the next step of generalizing the op kinds that are handled.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105219
This revision drops the comprehensive bufferization Function pass, which has issues when trying to bufferize constants.
Instead, only support the comprehensive-module-bufferize by default.
Differential Revision: https://reviews.llvm.org/D105228
Add helpers to facilitate adding arguments and results to operations
that implement the `FunctionLike` trait. These operations already have a
convenient argument and result *erasure* mechanism, but a corresopnding
utility for insertion is missing. This introduces such a utility.
* Split memref.dim into two operations: memref.dim and tensor.dim. Both ops have the same builder interface and op argument names, so that they can be used with templates in patterns that apply to both tensors and memrefs (e.g., some patterns in Linalg).
* Add constant materializer to TensorDialect (needed for folding in affine.apply etc.).
* Remove some MemRefDialect dependencies, make some explicit.
Differential Revision: https://reviews.llvm.org/D105165
Uses elementwise interface to generalize canonicalization pattern and add a new
pattern for vector.contract case.
Differential Revision: https://reviews.llvm.org/D104343
Similarly to batch_mat vec outer most dim is a batching dim
and this op does |b| matrix-vector-products :
C[b, i] = sum_k(A[b, i, k] * B[b, k])
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D104739
The executeregionop is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region
Differential Revision: https://reviews.llvm.org/D104960
Extend the OpDSL syntax with an optional `domain` function to specify an explicit dimension order. The extension is needed to provide more control over the dimension order instead of deducing it implicitly depending on the formulation of the tensor comprehension. Additionally, the patch also ensures the symbols are ordered according to the operand definitions of the operation.
Differential Revision: https://reviews.llvm.org/D105117
* Previously, we were only generating .h.inc files. We foresee the need to also generate implementations and this is a step towards that.
* Discussed in https://llvm.discourse.group/t/generating-cpp-inc-files-for-dialects/3732/2
* Deviates from the discussion above by generating a default constructor in the .cpp.inc file (and adding a tablegen bit that disables this in case if this is user provided).
* Generating the destructor started as a way to flush out the missing includes (produces a link error), but it is a strict improvement on its own that is worth doing (i.e. by emitting key methods in the .cpp file, we root vtables in one translation unit, which is a non-controversial improvement).
Differential Revision: https://reviews.llvm.org/D105070
Depends On D104999
Automatic reference counting based on the liveness analysis can add a lot of reference counting overhead at runtime. If the IR is known to be constrained to few particular "shapes", it's much more efficient to provide a custom reference counting policy that will specify where it is required to update the async value reference count.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D105037
This revision adds the minimal plumbing to create a simple ComprehensiveModuleBufferizePass that can behave conservatively in the presence of CallOps.
A topological sort of caller/callee is performed and, if the call-graph is cycle-free, analysis can proceed.
Differential revision: https://reviews.llvm.org/D104859
Depends On D104891
Outlining scf.parallel body as a function requires async-parallel-for pass to be a ModuleOp pass
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D104998
Depends On D104850
Add a test that verifies that canonicalization removes all async overheads if it is statically known that the scf.parallel operation will be computed using a single block.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104891
This patch brings support for setting runtime preemption specifiers of
LLVM's GlobalValues. In LLVM semantics, if the `dso_local` attribute
is not explicitly requested, then it is inferred based on linkage and
visibility. We model this same behavior with a UnitAttribute: if it is
present, then we explicitly request the GlobalValue to marked as
`dso_local`, otherwise we rely on the GlobalValue itself to make this
decision.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D104983
Adapt the StructuredOp verifier to ensure all operands are either in the input or the output group. The change is possible after adding support for scalar input operands (https://reviews.llvm.org/D104220).
Differential Revision: https://reviews.llvm.org/D104783
This enables creating a replacement rule where range of positional replacements
need not be spelled out, or are not known (e.g., enable having a rewrite that
forward all operands to a call generically).
Differential Revision: https://reviews.llvm.org/D104955
The executeregionop is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region
Differential Revision: https://reviews.llvm.org/D104960
Given a select that returns the logical negation of the condition, replace it with a not of the condition.
Differential Revision: https://reviews.llvm.org/D104966
Reduce code duplication: Move various helper functions, that are duplicated in TensorDialect, MemRefDialect, LinalgDialect, StandardDialect, into a new StaticValueUtils.cpp.
Differential Revision: https://reviews.llvm.org/D104687
Depends On D104780
Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks.
Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1d problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using scf.for loop nest
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104850
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
Moves iteration lattice/merger code into new SparseTensor/Utils directory. A follow-up CL will add lattice/merger unit tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104757
This commit moves the type translator from LLVM to MLIR to a public header for use by external projects or other code.
Unlike a previous attempt (https://reviews.llvm.org/D104726), this patch moves the type conversion into separate files which remedies the linker error which was only caught by CI.
Differential Revision: https://reviews.llvm.org/D104834
scf::ForOp bufferization analysis proceeds just like for any other op (including FuncOp) at its boundaries; i.e. if:
1. The tensor operand is inplaceable.
2. The matching result has no subsequent read (i.e. all reads dominate the scf::ForOp).
3. In and does not create a RAW interference.
then it can bufferize inplace.
Still there are a few differences:
1. bbArgs for an scf::ForOp are always considered inplaceable when seen from ops inside the body. This is because a) either the matching tensor operand is not inplaceable and an alloc will be inserted (which makes bbArg itself inplaceable); or b) the tensor operand and bbArg are both already inplaceable.
2. Bufferization within the scf::ForOp body has implications to the outside world : the scf.yield terminator may well ping-pong values of the same type. This muddies the water for alias analysis and is not supported atm. Such cases result in a pass failure.
Differential revision: https://reviews.llvm.org/D104490
Add an index_dim annotation to specify the shape to loop mapping of shape-only tensors. A shape-only tensor serves is not accessed withing the body of the operation but is required to span the iteration space of certain operations such as pooling.
Differential Revision: https://reviews.llvm.org/D104767
Extend the OpDSL with index attributes. After tensors and scalars, index attributes are the third operand type. An index attribute represents a compile-time constant that is limited to index expressions. A use cases are the strides and dilations defined by convolution and pooling operations.
The patch only updates the OpDSL. The C++ yaml codegen is updated by a followup patch.
Differential Revision: https://reviews.llvm.org/D104711
This includes a basic implementation for the OpenMP target
operation. Currently, the if, thread_limit, private, shared, device, and nowait clauses are included in this implementation.
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D102816
In cases where arithmetic (addi/muli) ops are performed on an scf.for loops induction variable with a single use, we can fold those ops directly into the scf.for loop.
For example, in the following code:
```
scf.for %i = %c0 to %arg1 step %c1 {
%0 = addi %arg2, %i : index
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
we can lift `%0` up into the scf.for loop range, as it is the only user of %i:
```
%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
Reviewed By: mehdi_amini, ftynse, Anthony
Differential Revision: https://reviews.llvm.org/D104289
This commit moves the type translator from LLVM to MLIR to a public header for use by external projects or other code
Differential Revision: https://reviews.llvm.org/D104726
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
Differential Revision: https://reviews.llvm.org/D104356
Correct a documentation typo, and delete a duplicate test in
`pdl-to-pdl-interp-rewriter.mlir`.
Reviewed By: pr4tgpt, bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D104688
This revision refactors the usage of multithreaded utilities in MLIR to use a common
thread pool within the MLIR context, in addition to a new utility that makes writing
multi-threaded code in MLIR less error prone. Using a unified thread pool brings about
several advantages:
* Better thread usage and more control
We currently use the static llvm threading utilities, which do not allow multiple
levels of asynchronous scheduling (even if there are open threads). This is due to
how the current TaskGroup structure works, which only allows one truly multithreaded
instance at a time. By having our own ThreadPool we gain more control and flexibility
over our job/thread scheduling, and in a followup can enable threading more parts of
the compiler.
* The static nature of TaskGroup causes issues in certain configurations
Due to the static nature of TaskGroup, there have been quite a few problems related to
destruction that have caused several downstream projects to disable threading. See
D104207 for discussion on some related fallout. By having a ThreadPool scoped to
the context, we don't have to worry about destruction and can ensure that any
additional MLIR thread usage ends when the context is destroyed.
Differential Revision: https://reviews.llvm.org/D104516
Operations currently rely on the string name of attributes during attribute lookup/removal/replacement, in build methods, and more. This unfortunately means that some of the most used APIs in MLIR require string comparisons, additional hashing(+mutex locking) to construct Identifiers, and more. This revision remedies this by caching identifiers for all of the attributes of the operation in its corresponding AbstractOperation. Just updating the autogenerated usages brings up to a 15% reduction in compile time, greatly reducing the cost of interacting with the attributes of an operation. This number can grow even higher as we use these methods in handwritten C++ code.
Methods for accessing these cached identifiers are exposed via `<attr-name>AttrName` methods on the derived operation class. Moving forward, users should generally use these methods over raw strings when an attribute name is necessary.
Differential Revision: https://reviews.llvm.org/D104167
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
Adapt the FillOp definition to use a scalar operand instead of a capture. This patch is a follow up to https://reviews.llvm.org/D104109. As the input operands are in front of the output operands the patch changes the internal operand order of the FillOp. The pretty printed version of the operation remains unchanged though. The patch also adapts the linalg to standard lowering to ensure the c signature of the FillOp remains unchanged as well.
Differential Revision: https://reviews.llvm.org/D104121
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
Redirect the copy ctor to the actual class instead of
overwriting it with `TypeID` based ctor.
This allows the final Pass classes to have extra fields and logic for their copy.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D104302
* Remove dependency: Standard --> MemRef
* Add dependencies: GPUToNVVMTransforms --> MemRef, Linalg --> MemRef, MemRef --> Tensor
* Note: The `subtensor_insert_propagate_dest_cast` test case in MemRef/canonicalize.mlir will be moved to Tensor/canonicalize.mlir in a subsequent commit, which moves over the remaining Tensor ops from the Standard dialect to the Tensor dialect.
Differential Revision: https://reviews.llvm.org/D104506
This revision adds support for passing a functor to SourceMgrDiagnosticHandler for filtering out FileLineColLocs when emitting a diagnostic. More specifically, this can be useful in situations where there may be large CallSiteLocs with locations that aren't necessarily important/useful for users.
For now the filtering support is limited to FileLineColLocs, but conceptually we could allow filtering for all locations types if a need arises in the future.
Differential Revision: https://reviews.llvm.org/D103649
Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus allows the benefit of
transforms on higher level control flow ops available in the presence of
the inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary. Functions won’t
have to remain outlined until later than desired. Abstractions like
affine execute_regions, lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
But two potential early use cases are of: (1) an early inliner (which
can inline functions by introducing execute_region ops), (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.
Differential Revision: https://reviews.llvm.org/D75837
This functionality is similar to delayed registration of dialect interfaces. It
allows external interface models to be registered before the dialect containing
the attribute/operation/type interface is loaded, or even before the context is
created.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104397
This is similar to attribute and type interfaces and mostly the same mechanism
(FallbackModel / ExternalModel, ODS generation). There are minor differences in
how the concept-based polymorphism is implemented for operations that are
accounted for by ODS backends, and this essentially adds a test and exposes the
API.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104294
ODS currently emits the interface trait class as a nested class inside the
interface class. As an unintended consequence, the default implementations of
interface methods have implicit access to static fields of the interface class,
e.g. those declared in `extraClassDeclaration`, including private methods (!),
or in the parent class. This may break the use of default implementations for
external models, which are not defined in the interface class, and generally
complexifies the abstraction.
Emit intraface traits outside of the interface class itself to avoid accidental
implicit visibility. Public static fields can still be accessed via explicit
qualification with a class name, e.g., `MyOpInterface::staticMethod()` instead
of `staticMethod`.
Update the documentation to clarify the role of `extraClassDeclaration` in
interfaces.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104384
Based on dicussion in
[this](https://llvm.discourse.group/t/remove-canonicalizer-for-memref-dim-via-shapedtypeopinterface/3641)
thread the pattern to resolve the `memref.dim` of a value that is a
result of an operation that implements the
`InferShapedTypeOpInterface` is moved to a separate pass instead of
running it as a canonicalization pass. This allows shape resolution to
happen when explicitly required, instead of automatically through a
canonicalization.
Differential Revision: https://reviews.llvm.org/D104321
Many tests fails by D101969 (https://reviews.llvm.org/D101969)
on big-endian machines. This patch changes bit order of
TrailingOperandStorage in big-endian machines. This patch
works on System Z (Triple = "s390x-ibm-linux", CPU = "z14").
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104225
This patch changes the (not recommended) static registration API from:
static PassRegistration<MyPass> reg("my-pass", "My Pass Description.");
to:
static PassRegistration<MyPass> reg;
And the explicit registration from:
void registerPass("my-pass", "My Pass Description.",
[] { return createMyPass(); });
To:
void registerPass([] { return createMyPass(); });
It is expected that Pass implementations overrides the getArgument() method
instead. This will ensure that pipeline description can be printed and parsed
back.
Differential Revision: https://reviews.llvm.org/D104421
This allows for dialects to do different post-processing depending on operations with the inliner (my use case requires different attribute propagation rules depending on call op). This hook runs before the regular processInlinedBlocks method.
Differential Revision: https://reviews.llvm.org/D104399
This is a very careful start with alllowing sparse tensors at the
left-hand-side of tensor index expressions (viz. sparse output).
Note that there is a subtle difference between non-annotated tensors
(dense, remain n-dim, handled by classic bufferization) and all-dense
annotated "sparse" tensors (linearized to 1-dim without overhead
storage, bufferized by sparse compiler, backed by runtime support library).
This revision gently introduces some new IR to facilitate annotated outputs,
to be generalized to truly sparse tensors in the future.
Reviewed By: gussmith23, bixia
Differential Revision: https://reviews.llvm.org/D104074
The index cast operation accepts vector types. Implement its lowering in this patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D104280
It may be desirable to provide an interface implementation for an attribute or
a type without modifying the definition of said attribute or type. Notably,
this allows to implement interfaces for attributes and types outside of the
dialect that defines them and, in particular, provide interfaces for built-in
types. Provide the mechanism to do so.
Currently, separable registration requires the attribute or type to have been
registered with the context, i.e. for the dialect containing the attribute or
type to be loaded. This can be relaxed in the future using a mechanism similar
to delayed dialect interface registration.
See https://llvm.discourse.group/t/rfc-separable-attribute-type-interfaces/3637
Depends On D104233
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104234
Interface patterns are unique in that they get added to every operation that also implements that interface, given that they aren't tied to individual operations. When the same interface pattern gets added to multiple operations (such as the current behavior with Linalg), an reference to each of these patterns is added to every op (meaning that an operation will now have N references to effectively the same pattern). This revision fixes this problematic behavior in Linalg, and can bring upwards of a 25% reduction in compile time in Linalg based workloads.
Differential Revision: https://reviews.llvm.org/D104160
This change adds `AutomaticAllocationScope` to the
memref.alloca_scope op. Additionally, it also clarifies
that alloca_scope is is conceptually a passthrough operation.
Reviewed By: ftynse, bondhugula
Differential Revision: https://reviews.llvm.org/D104227
Actually, no vector types are supported so far. We should add the traits once
the vector types are supported (e.g. ElementwiseMappable.traits).
Instead add Elementwise trait to each op.
Differential Revision: https://reviews.llvm.org/D104103
Up to now all structured op operands are assumed to be shaped. The patch relaxes this assumption and allows scalar input operands. In contrast to shaped operands scalar operands are not indexed and directly forwarded to the body of the operation. As all other operands, scalar operands are associated to an indexing map that in case of a scalar or a 0D-operand has an empty range.
We will use scalar operands as a replacement for the capture mechanism. In contrast to captures, the approach ensures we can generate the function signature from the operand list and it prevents outdated capture values in case a transformation updates only the capture operand but not the hidden body of a named operation.
Removing captures and updating existing operations such as linalg.fill is left for a later patch.
The patch depends on https://reviews.llvm.org/D103891 and https://reviews.llvm.org/D103890.
Differential Revision: https://reviews.llvm.org/D104109
* Add a helper function that returns the constant padding value (if applicable).
* Remove existing getConstantYieldValueFromBlock function, which does almost the same.
* Adapted from D103243.
Differential Revision: https://reviews.llvm.org/D104004
Add `tensor.insert` op to make `tensor.extract`/`tensor.insert` work in pairs
for `scalar` domain. Like `subtensor`/`subtensor_insert` work in pairs in
`tensor` domain, and `vector.transfer_read`/`vector.transfer_write` work in
pairs in `vector` domain.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D104139
Add support to Python bindings for the MLIR execution engine to load a
specified list of shared libraries - for eg. to use MLIR runtime
utility libraries.
Differential Revision: https://reviews.llvm.org/D104009
## Introduction
This proposal describes the new op to be added to the `std` (and later moved `memref`)
dialect called `alloca_scope`.
## Motivation
Alloca operations are easy to misuse, especially if one relies on it while doing
rewriting/conversion passes. For example let's consider a simple example of two
independent dialects, one defines an op that wants to allocate on-stack and
another defines a construct that corresponds to some form of looping:
```
dialect1.looping_op {
%x = dialect2.stack_allocating_op
}
```
Since the dialects might not know about each other they are going to define a
lowering to std/scf/etc independently:
```
scf.for … {
%x_temp = std.alloca …
… // do some domain-specific work using %x_temp buffer
… // and store the result into %result
%x = %result
}
```
Later on the scf and `std.alloca` is going to be lowered to llvm using a
combination of `llvm.alloca` and unstructured control flow.
At this point the use of `%x_temp` is bound to either be either optimized by
llvm (for example using mem2reg) or in the worst case: perform an independent
stack allocation on each iteration of the loop. While the llvm optimizations are
likely to succeed they are not guaranteed to do so, and they provide
opportunities for surprising issues with unexpected use of stack size.
## Proposal
We propose a new operation that defines a finer-grain allocation scope for the
alloca-allocated memory called `alloca_scope`:
```
alloca_scope {
%x_temp = alloca …
...
}
```
Here the lifetime of `%x_temp` is going to be bound to the narrow annotated
region within `alloca_scope`. Moreover, one can also return values out of the
alloca_scope with an accompanying `alloca_scope.return` op (that behaves
similarly to `scf.yield`):
```
%result = alloca_scope {
%x_temp = alloca …
…
alloca_scope.return %myvalue
}
```
Under the hood the `alloca_scope` is going to lowered to a combination of
`llvm.intr.stacksave` and `llvm.intr.strackrestore` that are going to be invoked
automatically as control-flow enters and leaves the body of the `alloca_scope`.
The key value of the new op is to allow deterministic guaranteed stack use
through an explicit annotation in the code which is finer-grain than the
function-level scope of `AutomaticAllocationScope` interface. `alloca_scope`
can be inserted at arbitrary locations and doesn’t require non-trivial
transformations such as outlining.
## Which dialect
Before memref dialect is split, `alloca_scope` can temporarily reside in `std`
dialect, and later on be moved to `memref` together with the rest of
memory-related operations.
## Implementation
An implementation of the op is available [here](https://reviews.llvm.org/D97768).
Original commits:
* Add initial scaffolding for alloca_scope op
* Add alloca_scope.return op
* Add no region arguments and variadic results
* Add op descriptions
* Add failing test case
* Add another failing test
* Initial implementation of lowering for std.alloca_scope
* Fix backticks
* Fix getSuccessorRegions implementation
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D97768
This is the first step to convert vector ops to MMA operations in order to
target GPUs tensor core ops. This currently only support simple cases,
transpose and element-wise operation will be added later.
Differential Revision: https://reviews.llvm.org/D102962
Create a ComplexUnaryOp base class and use it for AbsOp, ReOp and ImOp.
Sort all ops in lexicographic order.
Differential Revision: https://reviews.llvm.org/D104095
These interfaces allow for a composite attribute or type to opaquely provide access to any held attributes or types. There are several intended use cases for this interface. The first of which is to allow the printer to create aliases for non-builtin dialect attributes and types. In the future, this interface will also be extended to allow for SymbolRefAttr to be placed on other entities aside from just DictionaryAttr and ArrayAttr.
To limit potential test breakages, this revision only adds the new interfaces to the builtin attributes/types that are currently hardcoded during AsmPrinter alias generation. In a followup the remaining builtin attributes/types, and non-builtin attributes/types can be extended to support it.
Differential Revision: https://reviews.llvm.org/D102945
This allows for using other type interfaces in the builtin dialect, which currently results in a compile time failure (as it generates duplicate interface declarations).
This adds Sdot2d op, which is similar to the usual Neon
intrinsic except that it takes 2d vector operands, reflecting the
structure of the arithmetic that it's performing: 4 separate
4-dimensional dot products, whence the vector<4x4xi8> shape.
This also adds a new pass, arm-neon-2d-to-intr, lowering
this new 2d op to the 1d intrinsic.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D102504
This allows for building an outline of the symbols and symbol tables within the IR. This allows for easy navigations to functions/modules and other symbol/symbol table operations within the IR.
Differential Revision: https://reviews.llvm.org/D103729
This allow creating a matrix with all elements set to a given value. This is
needed to be able to implement a simple dot op.
Differential Revision: https://reviews.llvm.org/D103870
This is a roll forward of D102679.
This patch simplifies the implementation of Sequence and makes it compatible with llvm::reverse.
It exposes the reverse iterators through rbegin/rend which prevents a dangling reference in std::reverse_iterator::operator++().
Note: Compared to D102679, this patch introduces a `asSmallVector()` member function and fixes compilation issue with GCC 5.
Differential Revision: https://reviews.llvm.org/D103948
This brings us closer to replacing the LLVM data layout string with a
first-class layout modeling in MLIR.
Depends On D103945
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D103946
This reverts commit 08664d005c, which according to
https://reviews.llvm.org/D103373 was pushed accidentally, and I believe it
causes timeouts in some internal mlir tests.
A common mistake for newcomers to MLIR is to try to store extra member
on the Op class. However these are intended to be thing wrapper around
an Operation*, all the storage is meant to be encoded in attribute on
the underlying Operation. This can be confusing to debug, so better
catch it at build time.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103869