Discussion on the forum:
https://llvm.discourse.group/t/bug-in-partial-dialect-conversion/4115
`applyPartialConversion` did not handle operations that were marked as
illegal by a dynamic legality callback.
Instead of reporting an error when such an operation was not converted to a
legal one, the method just added it to `unconvertedSet`, in the same way as
unknown operations.
This patch fixes that and handles dynamically illegal operations as well.
The patch includes 2 fixes for existing passes:
* `tensor-bufferize` - explicitly mark `std.return` as legal.
* `convert-parallel-loops-to-gpu` - an ugly fix that marks visited operations
to avoid recursive legality checks.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108505
Add a constant propagator for gpu.launch op in cases where the
grid/thread IDs can be trivially determined to take a single constant
value of zero.
Differential Revision: https://reviews.llvm.org/D109994
Note that this revision adds a very tiny bit of constant folding in the
sparse compiler lattice construction. Although I am generally trying to
avoid such canonicalizations (and rely on other passes to fix this instead),
the benefits of avoiding a very expensive disjunction lattice construction
justify having this special code (at least for now).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109939
For all positive a and b such that b divides a,
(e mod a) mod b = e mod b. For example, ((d0 mod 35) mod 5) can
be simplified to (d0 mod 5), but ((d0 mod 35) mod 4) cannot be simplified
further (d0 = 36 is a counterexample).
This change enables more complex simplifications. For example,
((d0 * 72 + d1) mod 144) mod 9 can now simplify to (d0 * 72 + d1) mod 9
and thus to d1 mod 9. Expressions with chained modulus operators are
reasonably common in tensor applications, and this change _should_
improve code generation for such expressions.
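The identity above can be checked by brute force. The sketch below is only an
illustration and is not part of the patch: `floorMod` and `nestedModFolds` are
hypothetical helper names, and it assumes that affine `mod` always yields a
result in [0, m) for a positive modulus m.

```cpp
#include <cassert>
#include <cstdint>

// Floor-style modulus: the result is always in [0, m) for m > 0. This is the
// convention this sketch assumes for affine `mod`.
int64_t floorMod(int64_t e, int64_t m) {
  assert(m > 0);
  int64_t r = e % m;
  return r < 0 ? r + m : r;
}

// Returns true if (e mod a) mod b == e mod b for every e in [lo, hi].
bool nestedModFolds(int64_t a, int64_t b, int64_t lo, int64_t hi) {
  for (int64_t e = lo; e <= hi; ++e)
    if (floorMod(floorMod(e, a), b) != floorMod(e, b))
      return false;
  return true;
}
```

With these helpers, `nestedModFolds(35, 5, -200, 200)` holds because 5 divides
35, while `nestedModFolds(35, 4, -200, 200)` does not, since e = 36 breaks it.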
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109930
Even with all parallel loops, reading the output value is still allowed, so we
don't have to handle reduction loops differently.
Differential Revision: https://reviews.llvm.org/D109851
Add the addTileLoopIvsToIndexOpResults method to shift the IndexOp results after tiling.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109761
The pattern was returning success even when it did no work, which caused pattern application to run up to the max iteration count and fail.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D109791
The TosaOp definition had an artificial constraint that the input/output types
needed to be ranked to invoke the quantization builder. This is incorrect, as an
unranked tensor could still be quantized.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D109863
We are having issues running the integration test of the sparse compiler
on AArch64 (crashing in the lib). This revision adds more assertions.
Reviewed By: jsetoain
Differential Revision: https://reviews.llvm.org/D109861
Express the input shape definitions of convolution and pooling operations in terms of the output shapes, filter shapes, strides, and dilations.
Reviewed By: shabalin, rsuderman, stellaraccident
Differential Revision: https://reviews.llvm.org/D109815
This enables the sparsification of more kernels, such as convolutions
where there is a x(i+j) subscript. It also enables more tensor invariants
such as x(1) or other affine subscripts such as x(i+1). Currently, we
reject sparsity altogether for such tensors. Despite this restriction,
however, we can already handle a lot more kernels with compound subscripts
for dense access (viz. convolution with dense input and sparse filter).
Some unit tests and an integration test demonstrate new capability.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109783
Adds a new rewrite directive returnType that can be added at the end of an op's
argument list to explicitly specify return types.
```
(OpX $v0, $v1, (returnType "$_builder.getI32Type()"))
```
Pass in a bound value to copy its return type, or pass a native code call to
dynamically create new types.
```
(OpX $v0, $v1, (returnType $v0, (NativeCodeCall<"..."> $v1)))
```
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D109472
There are two main versions of depthwise conv depending on whether the
multiplier is 1 or not. In cases where m == 1 we should use the version without
the multiplier channel, as it can be optimized further.
Add lowering for the quantized/float versions to have a multiplier of one.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D108959
This revision fixes a corner case that could appear due to incorrect insertion point behavior in comprehensive bufferization.
Differential Revision: https://reviews.llvm.org/D109830
Summary: Making the late transformations opt-in results in less surprising behavior when composing multiple calls to the codegen strategy.
Differential revision: https://reviews.llvm.org/D109820
AliasInfo can now use union-find for a much more efficient implementation.
This brings no functional changes but large performance gains on more complex examples.
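For readers unfamiliar with union-find, a minimal standalone sketch follows.
It is not the AliasInfo implementation: the `UnionFind` class, its method
names and the use of plain integer ids in place of values are all assumptions
made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set forest over ids 0..n-1 with path halving and union by size.
class UnionFind {
public:
  explicit UnionFind(std::size_t n) : parent(n), size(n, 1) {
    // Every id starts out as the root of its own class.
    std::iota(parent.begin(), parent.end(), std::size_t{0});
  }

  // Returns the representative of the class containing x.
  std::size_t find(std::size_t x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]]; // Path halving keeps trees shallow.
      x = parent[x];
    }
    return x;
  }

  // Merges the classes of a and b, attaching the smaller tree to the larger.
  void unite(std::size_t a, std::size_t b) {
    a = find(a);
    b = find(b);
    if (a == b)
      return;
    if (size[a] < size[b])
      std::swap(a, b);
    parent[b] = a;
    size[a] += size[b];
  }

  // Two ids alias (in this sketch) iff they share a representative.
  bool aliases(std::size_t a, std::size_t b) { return find(a) == find(b); }

private:
  std::vector<std::size_t> parent;
  std::vector<std::size_t> size;
};
```

Merging and querying classes this way takes near-constant amortized time per
operation, which is where the gain over explicitly maintained alias sets would
come from.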
Differential Revision: https://reviews.llvm.org/D109819
Create a new document that explains both stages of the process in a single
place, merging and deduplicating the content from the two previous documents.
Also extend the documentation to account for the recent changes in pass
structure due to standard dialect splitting and translation being more flexible.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D109605
This seems in line with the intent and how we build tools around it.
Update the description for the flag accordingly.
Also use an injected thread pool in MLIROptMain, so that threads are created
up front and reused across split buffers.
Differential Revision: https://reviews.llvm.org/D109802
Both copy/alloc ops are using memref dialect after this change.
Reviewed By: silvas, mehdi_amini
Differential Revision: https://reviews.llvm.org/D109480
Add the makeComposedExtractSliceOp method that creates an ExtractSliceOp and folds chains of ExtractSliceOps by computing the sum of their offsets and by multiplying their strides.
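A 1-D sketch of folding two chained slices is given below. It is an
illustration under assumptions, not the code of this method: `Slice1D`,
`sourceIndex` and `composeSlices` are hypothetical names, and the sketch scales
the consumer offset by the producer stride, which reduces to the plain sum of
offsets when the producer has unit stride.

```cpp
#include <cassert>
#include <cstdint>

// A 1-D slice: element i of the slice reads source[offset + i * stride].
struct Slice1D {
  int64_t offset;
  int64_t size;
  int64_t stride;
};

// Index in the underlying source read by element i of `slice`.
int64_t sourceIndex(const Slice1D &slice, int64_t i) {
  return slice.offset + i * slice.stride;
}

// Folds `consumer` (a slice taken from the result of `producer`) into a
// single slice of the original source.
Slice1D composeSlices(const Slice1D &producer, const Slice1D &consumer) {
  return {producer.offset + consumer.offset * producer.stride,
          consumer.size,
          producer.stride * consumer.stride};
}
```

For example, folding a consumer with offset 3 into a unit-stride producer with
offset 2 gives a single slice with offset 5.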
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109601
This tiling option scalarizes all dynamic dimensions, i.e., it tiles all dynamic dimensions by 1.
This option is useful for linalg ops with partly dynamic tensor dimensions. E.g., such ops can appear in the partial iteration after loop peeling. After scalarizing dynamic dims, those ops can be vectorized.
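The tile-size choice can be sketched as follows. This is a simplified
illustration: `kDynamicDim` and `scalarizeDynamicDimsTileSizes` are
hypothetical names, the sketch works on one shape rather than on the loops of
an op, and it assumes that a tile size of 0 means "do not tile this dimension".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical marker for a dimension whose size is only known at runtime.
constexpr int64_t kDynamicDim = -1;

// Tiles every dynamic dimension by 1 and leaves static dimensions untiled
// (tile size 0 is assumed to mean "no tiling" in this sketch).
std::vector<int64_t>
scalarizeDynamicDimsTileSizes(const std::vector<int64_t> &shape) {
  std::vector<int64_t> tileSizes;
  tileSizes.reserve(shape.size());
  for (int64_t dim : shape)
    tileSizes.push_back(dim == kDynamicDim ? 1 : 0);
  return tileSizes;
}
```

For a shape of 4 x ? x 8 this yields tile sizes (0, 1, 0), so only static
dimensions remain inside the tiled op.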
Differential Revision: https://reviews.llvm.org/D109268
Only scf.for loops are supported at the moment. linalg.tiled_loop support will be added in a subsequent commit.
Only static tensor sizes are supported. Loops for dynamic tensor sizes can be peeled, but the generated code is not optimal due to a missing canonicalization pattern.
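The effect of peeling on the loop bounds can be sketched with plain integers.
`PeeledLoop` and `peelLoop` are hypothetical names, and the split point
ub - (ub - lb) mod step is an assumption of this sketch rather than a
description of the exact IR that is generated.

```cpp
#include <cassert>
#include <cstdint>

// Bounds of the two loops produced by peeling `for (i = lb; i < ub; i += step)`.
struct PeeledLoop {
  int64_t mainLb, mainUb;       // Main loop: only full steps.
  int64_t partialLb, partialUb; // Partial iteration: the remainder, if any.
};

// Splits the iteration space so that the main loop never runs a partial step.
PeeledLoop peelLoop(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && lb <= ub);
  int64_t splitBound = ub - (ub - lb) % step;
  return {lb, splitBound, splitBound, ub};
}
```

For lb = 0, ub = 10, step = 4 the main loop covers [0, 8) and the partial
iteration covers [8, 10); when the step divides the trip range evenly, the
partial range is empty.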
Differential Revision: https://reviews.llvm.org/D109043
* Revert https://reviews.llvm.org/D107307 so that both LHS and RHS have
the same layout with K0 as the innermost dimension.
* Continuing from https://reviews.llvm.org/D107003, also move 'K'
to the outer side, so that now the inter-tile dimensions are all outer,
and the intra-tile dimensions are all inner.
Reviewed By: asaadaldien
Differential Revision: https://reviews.llvm.org/D109692
This revision allows hoisting static alloc/dealloc pairs as high as possible during ComprehensiveBufferization.
This also aligns such allocated buffers to 128B by default.
This change exhibited some issues wrt insertion points and a missing copy that are also fixed in this revision; tests are updated accordingly.
Differential Revision: https://reviews.llvm.org/D109684
mlir-cpu-runner/math_polynomial_approx.mlir
This test case is currently failing on SystemZ, but it does not appear to
necessarily be a target specific problem. See discussion at
https://bugs.llvm.org/show_bug.cgi?id=51204.
Previously, we would insert a DimOp and rely on later canonicalizations.
Unfortunately, reifyShape kind of rewrites are not canonicalizations anymore.
This introduces undesirable pass dependencies.
Instead, immediately reify the result shape and avoid the DimOp altogether.
This is akin to a local folding, which avoids introducing more reliance on `-resolve-shaped-type-result-dims` (similar to compositions of `affine.apply` by construction to avoid chains of size > 1).
It does not completely get rid of the reliance on the pass as the process is merely local: calling the pass may still be necessary for global effects. Indeed, one of the tests still requires the pass.
Differential Revision: https://reviews.llvm.org/D109571
Types and attributes now have a `hasTrait` function that allows users to check
if a type defines a trait.
Also, AbstractType and AbstractAttribute now have a `hasTraitFn` field to carry
the implementation of the `hasTrait` function of the concrete type or attribute.
This patch also adds the remaining functions to access type and attribute traits
in TableGen.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D105202
The conversion pattern is particularly useful for conversion of
block arguments in the master op.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D109610
Tosa.while shape inference requires repeatedly running shape inference across
the body of the loop until the types become static as we do not know the number
of iterations required by the loop body. Once the least specific arguments are
known they are propagated to both regions.
To determine the final end type, the least restrictive types are determined
from all yields.
Differential Revision: https://reviews.llvm.org/D108801
The original version of the bufferization pattern for linalg.generic would
manually clone operations within the region to the bufferized clone of the
operation. This triggers legality requirements on those operations in the
conversion infra. This now uses the rewriter to inline the region
instead, avoiding those legality requirements.
Differential Revision: https://reviews.llvm.org/D109581
Generate an scf.for instead of an scf.if for the partial iteration. This is for consistency reasons: The peeling of linalg.tiled_loop also uses another loop for the partial iteration.
Note: Canonicalization patterns may rewrite partial iterations to scf.if afterwards.
Differential Revision: https://reviews.llvm.org/D109568
Extend the signature of the tile loop nest region builder to take all operand values to use and not just the scf::For iterArgs. This change allows us to pass in all block arguments of TiledLoop and use them directly instead of replacing them after the loop generation.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D109569
This revision fixes the traversal order of extract_slice during the inplace analysis.
It was previously thought that such ops could be analyzed at the very end.
This is unfortunately not true, as the AliasInfo for dependents of these ops needs to be updated.
This change allows the aliases introduced by the bufferization of extract_slice to be properly propagated.
Differential Revision: https://reviews.llvm.org/D109519
Switches to adding target specific, private includes instead of adding
global includes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D109494
This PR adds missing AtomicRMWKind::min/max cases which we would like to use for min/max reduction loop vectorizations.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104881
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
Further enhance the set of operations that can be handled by the sparse compiler
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109413
Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce `unrealized_conversion_cast` operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up `unrealized_conversion_cast`s even though the IR had no standard
operations in it. The pass also had to be run last among all to-LLVM
passes, in contradiction with the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on `unrealized_conversion_cast` as a means to indicate failure
to eliminate casts).
In fact, the need to eliminate A->B->A `unrealized_conversion_cast`s is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when `unrealized_conversion_cast`s could not have been
eliminated since it is the only reason why this pass can fail.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109507
Fix extra space print for llvm global op when the 'unnamed_addr'
attribute was empty. This led to two spaces being printed in the custom
form between non-whitespace chars. A round trip would add an extra space
to a typical spaced form. NFC.
Differential Revision: https://reviews.llvm.org/D109502
OpenMP reductions need a neutral element, so we match some known reduction
kinds (integer add/mul/or/and/xor, float add/mul, integer and float min/max) to
define the neutral element and the atomic version when possible to express
using atomicrmw (everything except float mul). The SCF-to-OpenMP pass becomes a
module pass because it now needs to introduce new symbols for reduction
declarations in the module.
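As an illustration of what "neutral element" means for the integer kinds
listed above, consider the sketch below. `ReductionKind` and `neutralElement`
are hypothetical names, the sketch is restricted to signed 32-bit integers,
and it is not the matching logic of the pass.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Reduction kinds covered by this sketch (integer variants only).
enum class ReductionKind { Add, Mul, Or, And, Xor, Min, Max };

// Returns the value n such that reducing x with n gives x back.
int32_t neutralElement(ReductionKind kind) {
  switch (kind) {
  case ReductionKind::Add:
  case ReductionKind::Or:
  case ReductionKind::Xor:
    return 0;
  case ReductionKind::Mul:
    return 1;
  case ReductionKind::And:
    return -1; // All bits set.
  case ReductionKind::Min:
    return std::numeric_limits<int32_t>::max();
  case ReductionKind::Max:
    return std::numeric_limits<int32_t>::min();
  }
  return 0; // Unreachable for valid kinds.
}
```

Each private reduction variable can then be initialized with this value
without affecting the combined result.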
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D107549
Fold dim ops of scf.for results to dim ops of the respective iter args if the loop is shape preserving.
Differential Revision: https://reviews.llvm.org/D109430
Fold dim ops of linalg.tiled_loop results to dim ops of the respective iter args if the loop is shape preserving.
Differential Revision: https://reviews.llvm.org/D109431
Run a small analysis to see if the runtime type of the iter_arg is changing. Fold only if the runtime type stays the same. (Same as `DimOfIterArgFolder` in SCF.)
Differential Revision: https://reviews.llvm.org/D109299
When tiling a LinalgOp, extract_slice/insert_slice pairs are inserted. To avoid going out-of-bounds when the tile size does not divide the shape size evenly (at the boundary), AffineMin ops are inserted. Some ops have assumptions regarding the dimensions of inputs/outputs. E.g., in a `A * B` matmul, `dim(A, 1) == dim(B, 0)`. However, loop bounds use either `dim(A, 1)` or `dim(B, 0)`.
With this change, AffineMin ops are expressed in terms of loop bounds instead of tensor sizes. (Both have the same runtime value.) This simplifies canonicalizations.
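The role of the min can be shown with plain integers. `boundedTileSize` is a
hypothetical helper, and the formula min(tileSize, upperBound - iv) is the
usual boundary clamp assumed for this sketch, not a transcription of the
generated affine map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Size of the tile that starts at induction variable `iv`: a full tile in the
// interior, a smaller one at the boundary so that accesses stay in bounds.
int64_t boundedTileSize(int64_t tileSize, int64_t upperBound, int64_t iv) {
  assert(tileSize > 0 && iv < upperBound);
  return std::min(tileSize, upperBound - iv);
}
```

With a tile size of 4 and an upper bound of 10, the tile starting at 8 has
size 2. Passing the loop bound as `upperBound` instead of a tensor dimension
gives the same runtime value.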
Differential Revision: https://reviews.llvm.org/D109267
This patch (e4635e6328) fixed a bug where a newly generated/reused
constant wouldn't dominate a folded operation. It did so by calling
isBeforeInBlock to move the constant around on demand. This introduced
a significant compile time regression, because "isBeforeInBlock" is
O(n) in the size of a block the first time it is called, and the cache
is invalidated any time canonicalize changes something big in the block.
This fixes LLVM PR51738 and this CIRCT issue:
https://github.com/llvm/circt/issues/1700
This does affect the order of constants left at the top of a block;
I staged the testsuite changes in rG42431b8207a5.
Differential Revision: https://reviews.llvm.org/D109454
This patch refactors the existing implementation of computing an explicit
representation of an identifier as a floordiv in terms of other identifiers and
exposes this computation as a public function.
The computation of this representation is required to support local identifiers
in PresburgerSet subtract, complement and isEqual.
Reviewed By: bondhugula, arjunp
Differential Revision: https://reviews.llvm.org/D106662
Add loop coalesce utility for affine.for. This expects loops to have
been normalized a priori. This works for both constant and non-constant
upper bounds, with single- or multiple-result upper bound affine maps.
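The idea behind coalescing a normalized 2-D nest can be sketched in plain C++.
The function names below are hypothetical, and recovering the original
induction variables with a division and a modulus is an assumption of this
sketch, not a description of the emitted affine maps.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using IterationList = std::vector<std::pair<int64_t, int64_t>>;

// Reference: the (i, j) pairs visited by a normalized 2-D loop nest
// (lower bound 0, step 1 on both loops).
IterationList nestedIterationOrder(int64_t ubOuter, int64_t ubInner) {
  IterationList visited;
  for (int64_t i = 0; i < ubOuter; ++i)
    for (int64_t j = 0; j < ubInner; ++j)
      visited.emplace_back(i, j);
  return visited;
}

// Coalesced form: one loop over the product of the trip counts, with the
// original induction variables recovered from the linear index.
IterationList coalescedIterationOrder(int64_t ubOuter, int64_t ubInner) {
  IterationList visited;
  for (int64_t k = 0; k < ubOuter * ubInner; ++k)
    visited.emplace_back(k / ubInner, k % ubInner);
  return visited;
}
```

Both functions visit the same pairs in the same order, which is why the
normalization precondition (zero lower bound, unit step) matters.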
With contributions from Arnab Dutta and Uday Bondhugula.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D108126
Right now all but the last bullet rely on an implied "must not" that
isn't there, and the last bullet is a "must".
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D109389
The lowering has been incorrectly using the operands of the original op instead
of rewritten operands provided to matchAndRewrite call. This may lead to
spurious materializations and generally invalid IR.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D109355
The conversion has been incorrectly using the operands of the original
operation instead of the converted operands provided to the matchAndRewrite
call. This may lead to spurious materializations and generally invalid IR if
the producer of the original operands is deleted in the process of conversion.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D109356
It looks like it was a typo. Instead of `*maybeConstantIndex`,
`initTensorOp.getStaticSize(*maybeConstantIndex)` should be used to access the
dim size of the tensor. There is a test for that in `canonicalize.mlir`, but it
was working correctly because `ReplaceStaticShapeDims` was canonicalizing DimOp
before `FoldInitTensorWithDimOp`. So, to make the patterns more "orthogonal",
this case is disabled.
Differential Revision: https://reviews.llvm.org/D109247
Previously, only an await inside the async function (a coroutine after lowering to async runtime) would check the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D109229
Create a gpu memset op and corresponding CUDA and ROCm wrappers.
Reviewed By: herhut, lorenrose1013
Differential Revision: https://reviews.llvm.org/D107548
FuncOp always lowers to an LLVM external linkage presently. This makes it impossible to define functions in MLIR which are local to the current module. Until MLIR FuncOps have a more formal linkage specification, this commit allows FuncOps to have an optionally specified llvm.linkage attribute, whose value will be used as the linkage of the LLVM func op when lowered.
Differential Revision: https://reviews.llvm.org/D108524
Support LLVM linkage
This makes the IR more readable, in particular when this will be used on
the builtin func outside of the LLVM dialect.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D109209
This simplifies setting up sparse tensors through C-style data structures.
Useful for runtimes that want to interact with MLIR-generated code
without knowing about all bufferization details (viz. memrefs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109251
The sparse index order must always be satisfied, but this
may give a choice in topsorts for several cases. We broke
ties in favor of any dense index order, since this gives
good locality. However, breaking ties in favor of pushing
unrelated indices into sparse iteration spaces gives better
asymptotic complexity. This revision improves the heuristic.
Note that in the long run, we are really interested in using
ML for ML to find the best loop ordering as a replacement for
such heuristics.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109100
DialectAsmParser::parseKeyword is rejecting `'i' digit+` while it is
a valid identifier according to mlir/docs/LangRef.md.
Integer types actually used to be TOK_KEYWORD a while back before the
change: 6af866c58d.
This patch modifies `isCurrentTokenAKeyword` to return true for tokens that
match integer types too.
The motivation for this change is the parsing of `!fir.type<{` `component-name: component-type,`+ `}>`
type in FIR that represents Fortran derived types. The component-names are
parsed as keywords, and can very well be i32 or any ixxx (which are
valid Fortran derived type component names).
The Quant dialect type parser had to be modified since it relied on `iw` not
being parsed as keywords.
Differential Revision: https://reviews.llvm.org/D108913
The limitation on iter_args introduced with D108806 is too restrictive. Changes of the runtime type should be allowed.
Extends the dim op canonicalization with a simple analysis to determine when it is safe to canonicalize.
Differential Revision: https://reviews.llvm.org/D109125
* Now that packaging has stabilized, removes old mechanisms for loading extensions, preferring direct importing.
* Removes _cext_loader.py, _dlloader.py as unnecessary.
* Fixes the path where the CAPI dll is written on Windows. This enables that path of least resistance loading behavior to work with no further drama (see: https://bugs.python.org/issue36085).
* With this patch, `ninja check-mlir` on Windows with Python bindings works for me, modulo some failures that are actually due to a couple of pre-existing Windows bugs. I think this is the first time the Windows Python bindings have worked upstream.
* Downstream changes needed:
* If downstreams are using the now removed `load_extension`, `reexport_cext`, etc, then those should be replaced with normal import statements as done in this patch.
Reviewed By: jdd, aartbik
Differential Revision: https://reviews.llvm.org/D108489
The translation to LLVM IR used to construct sequential constants by recurring
down to individual elements, creating constant values for them, and wrapping
them into aggregate constants in post-order. This is highly inefficient for
large constants with known data such as DenseElementsAttr. Use LLVM's
ConstantData for the innermost dimension instead. LLVM does not seem to support
data constants for nested sequential constants so the outer dimensions are
still handled recursively. Nevertheless, this speeds up the translation of
large constants with equal dimensions by up to 30x.
Users are advised to rewrite large constants to use flat types before
translating to LLVM IR if more efficiency in translation is necessary. This is
not done automatically as the translation is not aware of the expectations of
the overall compilation flow about type changes and indexing, in particular for
global constants with external linkage.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D109152
Add an operation omp.critical.declare to declare names/symbols of
critical sections. Named omp.critical operations should use symbols
declared by omp.critical.declare. Having a declare operation ensures
that the names of critical sections are global and unique. In the
lowering flow to LLVM IR, the OpenMP IRBuilder creates unique names
for critical sections.
Reviewed By: ftynse, jeanPerier
Differential Revision: https://reviews.llvm.org/D108713
This upstreams the Cpp emitter, initially presented with [1], from [2]
to MLIR core. Together with the previously upstreamed EmitC dialect [3],
the target allows translating MLIR to C/C++.
[1] https://reviews.llvm.org/D76571
[2] https://github.com/iml130/mlir-emitc
[3] https://reviews.llvm.org/D103969
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Co-authored-by: Oliver Scherf <oliver.scherf@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D104632
Use the recently introduced OpenMPIRBuilder facility to translate OpenMP
workshare loops with reductions to LLVM IR calling OpenMP runtime. Most of the
heavy lifting is done at the OpenMPIRBuilder. When other OpenMP dialect
constructs grow support for reductions, the translation can be updated to
operate on, e.g., an operation interface for all reduction containers instead
of workshare loops specifically. Designing such a generic translation for the
single operation that currently supports reductions is premature since we don't
know how the reduction modeling itself will be generalized.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107343
(1) renamed SparseTensor to SparseTensorCOO, the other one remains SparseTensorStorage to focus on contrast
(2) documents difference between public API exclusively for compiler-generated code and methods that could be used by other runtimes (TBD) that want to interact with MLIR
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109039
Add method to get NameLoc. Treat null child location as unknown to avoid
needing to create UnknownLoc in C API where child loc is not needed.
Differential Revision: https://reviews.llvm.org/D108678
This patch adds Image Operands to the SPIR-V dialect and lets ImageDrefGather use them.
Image Operands are used in many image instructions. "Image Operands encodes what operands follow, as per Image Operands". Usually, they are optional for image instructions.
The format of image operands looks like:
%0 = spv.ImageXXXX %1, ... %3 : f32 ["Bias|Lod"](%4, %5 : f32, f32) -> ...
This patch doesn’t implement all operands (see Section 3.14 in the SPIR-V Spec) but provides a skeleton for them. There is a TODO in the verifyImageOperands function.
Co-authored: Alan Liu <alanliu.yf@gmail.com>
Reviewed by: antiagainst
Differential Revision: https://reviews.llvm.org/D108501
In D104421, we changed the API for pass registration.
Before, you would write:
    void registerPass("my-pass", "My Pass Description.",
                      [] { return createMyPass(); });
while now you’d only write:
    void registerPass([] { return createMyPass(); });
If you’re using TableGen to define your pass registration, you shouldn’t have anything to do. If you’re using the C++ API directly, here are some changes.
Your project may also be broken even if you use TableGen and you call the
generated registration API in case your pass implementation didn’t inherit from
the MyPassBase class generated by TableGen.
If you don't use TableGen, the "my-pass" and "My Pass Description." fields must
be provided by overriding methods on the pass itself:
    llvm::StringRef getArgument() const final { return "my-pass"; }
    llvm::StringRef getDescription() const final {
      return "My Pass Description.";
    }
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104429
Trying to reduce confusion by having the name of the public method match that of the private method for handling the recursion. Also adding some comments to SparseTensorStorage::fromCOO to help clarify what the recursive calls are doing in the dense case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D108954
The output tensor was added for tiling purposes. With use of
`TilingInterface` for tiling pad operations, there is no need for an
explicit operand for the shape of result of `linalg.pad_tensor`
op. The interface allows the tiling pattern to query the value that
can be used for the "init" needed for tiling dynamically.
Differential Revision: https://reviews.llvm.org/D108613
Currently the builtin dialect is the default namespace used for parsing
and printing. As such module and func don't need to be prefixed.
In the case of some dialects that define new regions for their own
purpose (like SpirV modules for example), it can be beneficial to
change the default dialect in order to improve readability.
Differential Revision: https://reviews.llvm.org/D107236
This aligns the printer with the parser contract: the operation isn't part of the user-controllable part of the syntax.
Differential Revision: https://reviews.llvm.org/D108804
This makes the hook return a printer if available, instead of using LogicalResult to
indicate if a printer was available (and invoked). This allows the caller to detect that
the dialect has a printer for a given operation without actually invoking the printer.
It'll be leveraged in a future revision to move printing the op name itself under control
of the ASMPrinter.
Differential Revision: https://reviews.llvm.org/D108803
Don't assert fail on strided memrefs when dropping unit dims.
Instead just leave them unchanged.
Differential Revision: https://reviews.llvm.org/D108205
* This allows multiple MLIR-API embedding downstreams to co-exist in the same process.
* I believe this is the last thing needed to enable isolated embedding.
Differential Revision: https://reviews.llvm.org/D108605
An interface to allow for tiling of operations is introduced. The
tiling of the linalg.pad_tensor operation is modified to use this
interface.
Differential Revision: https://reviews.llvm.org/D108611
The StringAttr version doesn't need a context, so we can just use the
existing `SymbolRefAttr::get` form. The StringRef version isn't preferred
so we want to encourage people to use StringAttr.
There is an additional form of getSymbolRefAttr that takes a (SymbolTrait
implementing) operation. This should also be moved, but I'll do that as
a separate patch.
Differential Revision: https://reviews.llvm.org/D108922
* It is pretty clear that no one has tried this yet since it was both incomplete and broken.
* Fixes a symbol hiding issue keeping even the generic builder from constructing an operation with successors.
* Adds ODS support for successors.
* Adds CAPI `mlirBlockGetParentRegion`, `mlirRegionEqual` + tests (and missing test for `mlirBlockGetParentOperation`).
* Adds Python property: `Block.region`.
* Adds Python methods: `Block.create_before` and `Block.create_after`.
* Adds Python property: `InsertionPoint.block`.
* Adds new blocks.py test to verify a plausible CFG construction case.
Differential Revision: https://reviews.llvm.org/D108898
41d4aa7de6 introduced incorrect code in
extraTraitClassDeclaration: `this` refers to the trait class and not the
operation class so `->getContext()` is not valid. Use `$_op` instead.
SymbolRefAttr is fundamentally a base string plus a sequence
of nested references. Instead of storing the string data as
a copied StringRef, store it as an already-uniqued StringAttr.
This makes a lot of things simpler and more efficient because:
1) references to the symbol are already stored as StringAttr's:
there is no need to copy the string data into MLIRContext
multiple times.
2) This allows pointer comparisons instead of string
comparisons (or redundant uniquing) within SymbolTable.cpp.
3) This allows SymbolTable to hold a DenseMap instead of a
StringMap (which again copies the string data and slows
lookup).
This is a moderately invasive patch, so I kept a lot of
compatibility APIs around. It would be nice to explore changing
getName() to return a StringAttr for example (right now you have
to use getNameAttr()), and eliminate things like the StringRef
version of getSymbol.
Differential Revision: https://reviews.llvm.org/D108899
* Add `DimOfIterArgFolder`.
* Move existing cross-dialect canonicalization patterns to `LoopCanonicalization.cpp`.
* Rename `SCFAffineOpCanonicalization` pass to `SCFForLoopCanonicalization`.
* Expand documentation of scf.for: The type of loop-carried variables may not change with iterations. (Not even the dynamic type.)
Differential Revision: https://reviews.llvm.org/D108806
* Add batched version of all `addId` variants, so that multiple IDs can be added at a time.
* Rename `addId` and variants to `insertId` and `appendId`. Most external users call `appendId`. Splitting `addId` into two functions also makes it possible to provide batched versions of both. (Otherwise, the overloads are ambiguous when calling `addId`.)
Differential Revision: https://reviews.llvm.org/D108532
Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support. This
method is the only one in CUDA runtime wrappers library that creates
a dependence on libLLVMSupport due to its use of SmallVector and
ArrayRef. The code can be written as easily and compactly without those ADTs.
The dependence on LLVMSupport adds a significant amount of additional
complexity for external things that want to link this library in (either
statically or as a shared object) since libLLVMSupport includes numerous
other objects that are sensitive to C++ compiler version and ABI.
Differential Revision: https://reviews.llvm.org/D108684
Needed to switch to extract to support tosa.reverse using dynamic shapes.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108744
This prepares general sparse to sparse conversions. The code that
needs to be generated using this new feature is now simply:
(1) coo = sparse_tensor_1->asCOO(); // source format1
(2) sparse_tensor_2 = newSparseTensor(coo); // destination format2
By using COO as an intermediate, we can do *all* conversions without
having to implement the full O(N^2) conversion matrix. Note that we
can always improve particular conversions individually if a faster
solution is required.
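As a rough illustration of why a COO intermediate avoids the full conversion matrix, the Python sketch below converts CSR to CSC by going through a list of coordinates. The function names and the plain-list representation are hypothetical and are not the API of the sparse runtime support library.

```python
def csr_to_coo(indptr, indices, values):
    """Expand CSR arrays into a list of (row, col, value) triples."""
    coo = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            coo.append((row, indices[k], values[k]))
    return coo


def coo_to_csc(coo, num_cols):
    """Build CSC arrays (column pointers, row indices, values) from COO."""
    ordered = sorted(coo, key=lambda t: (t[1], t[0]))  # column-major order
    indptr = [0] * (num_cols + 1)
    for _, col, _ in ordered:
        indptr[col + 1] += 1  # count entries per column
    for col in range(num_cols):
        indptr[col + 1] += indptr[col]  # prefix sum gives column pointers
    rows = [r for r, _, _ in ordered]
    vals = [v for _, _, v in ordered]
    return indptr, rows, vals


# 2x3 matrix [[1, 0, 2], [0, 3, 0]] stored as CSR.
coo = csr_to_coo([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
csc = coo_to_csc(coo, num_cols=3)
```

With N formats, this style needs one "to COO" and one "from COO" routine per format rather than a dedicated routine for every pair of formats.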
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108681
This allows for using a different type when accessing a parameter than the
one used for storage. This allows for returning parameters by reference,
enables using more optimized/convenient reference results, and more.
Differential Revision: https://reviews.llvm.org/D108593
This allows for parsing strings that have escape sequences, which require constructing
a string (as they can't be represented by looking at the Token contents directly).
Differential Revision: https://reviews.llvm.org/D108589
This allows for iterating and interacting with the uses of a specific subset of
results as opposed to just the full range.
Differential Revision: https://reviews.llvm.org/D108586
Includes the quantized version of average pool lowering to linalg dialect.
This includes a lit test for the transform. It is not 100% correct, as the
multiplier / shift should be done in i64; however, this is a negligible rounding
difference.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108676
Lowering to table was incorrect as it did not apply a 128 offset before
extracting the value from the table. Fixed this and corrected the tensor
length of the input table.
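A minimal sketch of the offset in Python, assuming a signed 8-bit input and a 256-entry table (the helper name and the identity table are made up for illustration and are not the actual lowering):

```python
def table_lookup_i8(x, table):
    """Look up a signed 8-bit input in a 256-entry table."""
    if not -128 <= x <= 127:
        raise ValueError("input is not a signed 8-bit value")
    if len(table) != 256:
        raise ValueError("table must have 256 entries")
    # Shift the signed range [-128, 127] into the index range [0, 255].
    return table[x + 128]


# Identity table: entry k holds the value k - 128.
identity_table = list(range(-128, 128))
```

Without the `+ 128`, negative inputs would not map to valid positions at the start of the table.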
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108436
* Add support for affine.max ops to SCF loop peeling pattern.
* Add support for affine.max ops to `AffineMinSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalizationPattern` to `AffineOpSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalization` pass to `SCFAffineOpCanonicalization`.
Differential Revision: https://reviews.llvm.org/D108009
The emplace commands are variadic and should take all the constructor arguments directly, since they implicitly call the constructor themselves in order to avoid the cost of constructing and then moving/copying temporaries.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D108670
When padding quantized operations, the padding needs to equal the zero point
of the input value. Corrected the pass to change the padding value if quantized.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108440
Add notes for discarding private-visible functions in the Toy tutorial chapter 4.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D108026
Recent changes outside sparse compiler exposed the requirement of running a
new pass (lower-affine) but this only became apparent with private testing.
By adding some vectorized runs to integration test, we will detect the need
for such changes earlier and also widen codegen coverage of course.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D108667
This canonicalization simplifies affine.min operations inside "for loop"-like operations (e.g., scf.for and scf.parallel) based on two invariants:
* iv >= lb
* iv < lb + step * ((ub - lb - 1) floorDiv step) + 1
This commit adds a new pass `canonicalize-scf-affine-min` (instead of being a canonicalization pattern) to avoid dependencies between the Affine dialect and the SCF dialect.
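The two invariants can be checked numerically with a small Python sketch. It models the loop as `range(lb, ub, step)` with a positive step; the helper names are hypothetical.

```python
def iv_upper_bound(lb, ub, step):
    """Exclusive upper bound on iv from the second invariant."""
    return lb + step * ((ub - lb - 1) // step) + 1  # // is floor division


def iv_bounds_hold(lb, ub, step):
    """Check lb <= iv < bound for every iteration of the loop."""
    bound = iv_upper_bound(lb, ub, step)
    return all(lb <= iv < bound for iv in range(lb, ub, step))


# Example: lb = 0, ub = 10, step = 3 iterates over 0, 3, 6, 9.
example_bound = iv_upper_bound(0, 10, 3)
```

For a non-empty loop the bound is tight: the last induction value is exactly one less than the bound.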
Differential Revision: https://reviews.llvm.org/D107731
Introduces new Ops to represent 1. alias.scope metadata in LLVM, and 2. domains for these scopes. These correspond to the metadata described in https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata. Lists of scopes are modeled the same way as access groups - as an ArrayAttr on the Op (added in https://reviews.llvm.org/D97944).
Lowering 'noalias' attributes on function parameters is already supported. However, lowering `noalias` metadata on individual Ops is not, which is added in this change. LLVM uses the same keyword for these, but this change introduces a separate attribute name 'noalias_scopes' to represent this distinct concept.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107870
I found myself typing this code several times at different places
by now, so time to make this a general utility instead. Given
a permutation, it returns the permuted position of the input,
for example (i,j,k) -> (k,i,j) yields position 1 for input 0.
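A small Python model of that lookup, with the permutation written as the list of input dimensions in result order (this encoding and the function name are assumptions for illustration, not the utility's actual signature):

```python
def permuted_position(perm, input_dim):
    """Return the result position at which input_dim appears."""
    return perm.index(input_dim)


# (i, j, k) -> (k, i, j): result 0 is k (dim 2), result 1 is i (dim 0),
# result 2 is j (dim 1).
perm = [2, 0, 1]
```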
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D108347
If additional static type information can be deduced from an insert_slice's size operands, insert an explicit cast of the op's source operand.
This enables other canonicalization patterns that are matching for tensor_cast ops such as `ForOpTensorCastFolder` in SCF.
Differential Revision: https://reviews.llvm.org/D108617
Rationale:
Passing in a pointer to the memref data in order to implement the
dense to sparse conversion was a bit too low-level. This revision
improves upon that approach with a cleaner solution of generating
a loop nest in MLIR code itself that prepares the COO object before
passing it to our "swiss army knife" setup. This is much more
intuitive *and* now also allows for dynamic shapes.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108491
This revision adds native ODS support for VariadicOfVariadic operand
groups. An example of this is the SwitchOp, which has a variadic number
of nested operand ranges for each of the case statements, where the
number of case statements is variadic. Builtin ODS support allows for
generating proper accessors for the nested operand ranges, builder
support, and declarative format support. VariadicOfVariadic operands
are supported by providing a segment attribute to use to store the
operand groups, mapping similarly to the AttrSizedOperand trait
(but with a user defined attribute name).
`build` methods for VariadicOfVariadic operand expect inputs of the
form `ArrayRef<ValueRange>`. Accessors for the variadic ranges
return a new `OperandRangeRange` type, which represents a
contiguous range of `OperandRange`. In the declarative assembly
format, VariadicOfVariadic operands and types are by default
formatted as a comma delimited list of value lists:
`(<value>, <value>), (), (<value>)`.
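The storage scheme can be pictured with a short Python sketch: nested groups are flattened into one operand list, and the group sizes are kept on the side, playing the role of the segment attribute. This is only a model of the idea; none of these names come from ODS.

```python
def flatten_groups(groups):
    """Flatten nested operand groups and record each group's size."""
    flat = [value for group in groups for value in group]
    sizes = [len(group) for group in groups]
    return flat, sizes


def unflatten_groups(flat, sizes):
    """Rebuild the nested groups from the flat list and segment sizes."""
    groups, start = [], 0
    for size in sizes:
        groups.append(flat[start:start + size])
        start += size
    return groups


# Mirrors the printed form `(%a, %b), (), (%c)`.
flat, sizes = flatten_groups([["%a", "%b"], [], ["%c"]])
```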
Differential Revision: https://reviews.llvm.org/D107774
This allows for inlining into an empty block or to the beginning of a block. NFC as the existing implementations now forward to this overload.
Differential Revision: https://reviews.llvm.org/D108572
This revision fixes a bug where an operation would get replaced with
a pre-existing constant that didn't dominate it. This can occur when
a pattern inserts operations to be folded at the beginning of the
constants insertion block. This revision fixes the bug by moving the
existing constant before the replaced operation in such cases. This is
fine because if a constant didn't already exist, a new one would have
been inserted before this operation anyways.
Differential Revision: https://reviews.llvm.org/D108498
Do not apply loop peeling to loops that are contained in the partial iteration of an already peeled loop. This is to avoid code explosion when dealing with large loop nests. Can be controlled with a new pass option `skip-partial`.
Differential Revision: https://reviews.llvm.org/D108542
* This is the native data layout for PyTorch and npcomp was using the prior version before cleanup.
Differential Revision: https://reviews.llvm.org/D108527
* Resolves a TODO by making this configurable by downstreams.
* This seems to be the last thing allowing full use of the Python bindings as a library within another project (i.e. by embedding them).
Differential Revision: https://reviews.llvm.org/D108523
Presently, the lowering of nested scf.parallel loops to OpenMP creates one omp.parallel region, with two (nested) OpenMP worksharing loops on the inside. When lowered to LLVM and executed, this results in incorrect results. The reason for this is as follows:
An OpenMP parallel region results in the code being run with whatever number of threads available to OpenMP. Within a parallel region a worksharing loop divides up the total number of requested iterations by the available number of threads, and distributes accordingly. For a single ws loop in a parallel region, this works as intended.
Now consider nested ws loops as follows:
omp.parallel {
A: omp.ws %i = 0...10 {
B: omp.ws %j = 0...10 {
code(%i, %j)
}
}
}
Suppose we ran this on two threads. The first workshare loop would decide to execute iterations 0, 1, 2, 3, 4 on thread 0, and iterations 5, 6, 7, 8, 9 on thread 1. The second workshare loop would decide the same for its iteration. This means thread 0 would execute i \in [0, 5) and j \in [0, 5). Thread 1 would execute i \in [5, 10) and j \in [5, 10). This means that iterations i in [5, 10), j in [0, 5) and i in [0, 5), j in [5, 10) never get executed, which is clearly wrong.
This permits two options for a remedy:
1) Change the semantics of the omp.wsloop to be distinct from that of the OpenMP runtime call or equivalently #pragma omp for. This could then allow some lowering transformation to remedy the aforementioned issue. I don't think this is desirable from an abstraction standpoint.
2) When lowering an scf.parallel always surround the wsloop with a new parallel region (thereby causing the innermost wsloop to use the number of threads available only to it).
This PR implements the latter change.
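The missing iterations can be reproduced with a small Python simulation. It assumes the contiguous static distribution described above and models a "thread" as a loop over thread IDs; all names are hypothetical and nothing here calls the OpenMP runtime.

```python
def chunk(n, num_threads, tid):
    """Contiguous block of the n iterations assigned to thread tid."""
    per_thread = (n + num_threads - 1) // num_threads
    return range(min(tid * per_thread, n), min((tid + 1) * per_thread, n))


def nested_wsloops_one_region(n, num_threads):
    """Both worksharing loops share one team: each thread only sees its
    own chunk of i and its own chunk of j."""
    executed = set()
    for tid in range(num_threads):
        for i in chunk(n, num_threads, tid):
            for j in chunk(n, num_threads, tid):
                executed.add((i, j))
    return executed


def inner_wsloop_own_region(n, num_threads):
    """The inner loop gets its own parallel region, so a fresh team
    covers the whole j range for every i."""
    executed = set()
    for tid in range(num_threads):
        for i in chunk(n, num_threads, tid):
            for inner_tid in range(num_threads):
                for j in chunk(n, num_threads, inner_tid):
                    executed.add((i, j))
    return executed


broken = nested_wsloops_one_region(10, 2)
fixed = inner_wsloop_own_region(10, 2)
```

In this model the shared-team version covers only half of the 10 x 10 iteration space, while the version with a separate inner region covers all of it.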
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108426
Multiple operations were still defined as TC ops that had equivalent versions
as YAML operations. Reducing to a single compilation path guarantees that
frontends can lower to their equivalent operations without missing the
optimized fastpath.
Some operations are maintained purely for testing purposes (mainly conv{1,2,3}D,
as they are included as sole tests in the vectorization transforms).
Differential Revision: https://reviews.llvm.org/D108169
Folding in the MLIR uses the order of the type directly
but folding in the underlying implementation must take
the dim ordering into account. These tests clarify that
behavior and verify it is done right.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108474
The boilerplate was setting up some arrays for testing. To fully illustrate
python - MLIR potential, however, this data should also come from numpy land.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108336
Tosa rescale can contain uint8 types. Added support for these types
using an unrealized conversion cast. Optimistically it would be better to
use bitcast however it does not support unsigned integers.
Differential Revision: https://reviews.llvm.org/D108427
Previously, ExecuteRegionOps with multiple return values would fail a round-trip test due to missing parentheses around the types.
Differential Revision: https://reviews.llvm.org/D108402
Apply the "for loop peeling" pattern from SCF dialect transforms. This pattern splits scf.for loops into full and partial iterations. In the full iteration, all masked loads/stores are canonicalized to unmasked loads/stores.
Differential Revision: https://reviews.llvm.org/D107733
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
#map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #map(%iv)[%step, %ub]
```
are rewritten into (in the case of the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
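The rewrite can be sanity-checked with a Python sketch. It assumes the loop is split at `ub - (ub - lb) mod step` (an assumption made for this illustration, with `lb <= ub` and a positive step); within the full-iteration part, `min(step, ub - iv)` always equals `step`.

```python
def peel(lb, ub, step):
    """Split range(lb, ub, step) into full iterations and a remainder."""
    split = ub - (ub - lb) % step  # assumed split point
    return range(lb, split, step), range(split, ub, step)


def min_is_step(lb, ub, step):
    """True if min(step, ub - iv) == step for every full iteration."""
    main, _ = peel(lb, ub, step)
    return all(min(step, ub - iv) == step for iv in main)


# Example: lb = 0, ub = 10, step = 4.
main, partial = peel(0, 10, 4)
```

In the example, the main loop visits 0 and 4, where the min evaluates to 4, and the partial iteration at 8 is the only place where it evaluates to something smaller (2).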
Differential Revision: https://reviews.llvm.org/D107222
This shares more code with existing utilities. Also, to be consistent,
we moved dimension permutation on the DimOp to the tensor lowering phase.
This way, both pre-existing DimOps on sparse tensors (not likely but
possible) as well as compiler generated DimOps are handled consistently.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108309
It's possible for the clamp to have invalid min/max values on its range. To fix
this we validate the range of the min/max and clamp to a valid range.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108256
LLVM considers global variables marked as externals to be defined within the module if it is initialized (including to an undef). Other external globals are considered as being defined externally and imported into the current translation unit. Lowering of MLIR Global Ops does not properly propagate undefined initializers, resulting in a global which is expected to be defined within the current TU, not being defined.
Differential Revision: https://reviews.llvm.org/D108252
MSVC needs to know where to put the archive (.lib) as well as the runtime
(.dll). If left to the default location, multiple rules to generate the same
file will be produced, creating a Ninja error.
Differential Revision: https://reviews.llvm.org/D108181
* Rename ids to values in FlatAffineValueConstraints.
* Overall cleanup of comments in FlatAffineConstraints and FlatAffineValueConstraints.
Differential Revision: https://reviews.llvm.org/D107947
* Extract "value" functionality of `FlatAffineConstraints` into a new derived `FlatAffineValueConstraints` class. Current users of `FlatAffineConstraints` can use `FlatAffineValueConstraints` without additional code changes, thus NFC.
* `FlatAffineConstraints` no longer associates dimensions with SSA Values. All functionality that requires this, is moved to `FlatAffineValueConstraints`.
* `FlatAffineConstraints` no longer makes assumptions about where Values associated with dimensions are coming from.
Differential Revision: https://reviews.llvm.org/D107725
This method bitcasts a DenseElementsAttr elementwise to one of the same
shape with a different element type.
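What an elementwise bitcast means can be shown in plain Python with the standard `struct` module, here reinterpreting 32-bit floats as 32-bit signed integers. This only illustrates the semantics on a flat list; it is not the DenseElementsAttr API.

```python
import struct


def bitcast_f32_to_i32(values):
    """Reinterpret the bits of each 32-bit float as a signed 32-bit int."""
    return [struct.unpack("<i", struct.pack("<f", v))[0] for v in values]


def bitcast_i32_to_f32(values):
    """Inverse direction: reinterpret each signed 32-bit int as a float."""
    return [struct.unpack("<f", struct.pack("<i", v))[0] for v in values]


bits = bitcast_f32_to_i32([0.0, 1.0, -0.0])
```

The number of elements is unchanged and only the interpretation of each element's bits differs, which is why the source and destination element types must have the same bit width.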
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D107612
Reduction axes should come after all parallel axes to work with vectorization.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108005
These operations are not lowered to from any source dialect and are only
used for redundant tests. Removing these named ops, along with their
associated tests, will make migration to YAML operations much more
convenient.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D107993
Expand ParallelLoopTilingPass with an inbound_check mode.
In default mode, the upper bound of the inner loop is from the min op; in
inbound_check mode, the upper bound of the inner loop is the step of the outer
loop and an additional inbound check will be emitted inside of the inner loop.
This was a 'FIXME' in the original code. A typical use is for GPU backends,
where the outer loop and inner loop can be mapped to blocks and threads separately.
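The two modes can be compared with a one-dimensional Python sketch, assuming a lower bound of 0 and a unit step in the original loop (function names are hypothetical):

```python
def tiled_default(ub, tile):
    """Default mode: the inner upper bound comes from a min."""
    visited = []
    for i in range(0, ub, tile):
        for ii in range(0, min(tile, ub - i)):
            visited.append(i + ii)
    return visited


def tiled_inbound_check(ub, tile):
    """inbound_check mode: the inner upper bound is the outer step, and
    an explicit check guards the body."""
    visited = []
    for i in range(0, ub, tile):
        for ii in range(0, tile):  # constant trip count
            if i + ii < ub:        # inbound check
                visited.append(i + ii)
    return visited
```

Both variants visit the same iterations; the second keeps the inner trip count constant, which fits a fixed number of threads per block.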
Differential Revision: https://reviews.llvm.org/D105455
The primary pattern for this pass clones many operations from producers
to consumers. Doing this top down prevents duplicated work when a
producer has multiple consumers, if it also is consuming another
linalg.generic.
As an example, a chain of ~2600 generics that are fused into ~70
generics was resulting in 16255 pattern invocations. This took 14
seconds on one machine but takes only 0.3 seconds with top-down
traversal.
Differential Revision: https://reviews.llvm.org/D107818
While the changes are extensive, they basically fall into a few
categories:
1) Moving the TestDialect itself.
2) Updating C++ code in tablegen to explicitly use ::mlir, since it
will be put in headers that shouldn't expect a 'using'.
3) Updating some generic MLIR Interface definitions to do the same thing.
4) Updating the Tablegen generator in a few places to be explicit about
namespaces
5) Doing the same thing for llvm references, since we no longer pick
up the definitions from mlir/Support/LLVM.h
Differential Revision: https://reviews.llvm.org/D88251
The approach for handling reductions in the outermost
dimension follows that for innermost dimensions, outlined
below:
First, transpose to move reduction dims, if needed.
Convert the reduction from n-d to 2-d canonical form.
Then, for outer reductions, we emit the appropriate op
(add/mul/min/max/or/and/xor) and combine the results.
Differential Revision: https://reviews.llvm.org/D107675
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo are considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.
A CanonicalLoopInfo is now invalidated once it no longer describes a canonical loop, and asserts fire when trying to use it afterwards.
In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid the impression that they insert something from scratch at that location, when in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0.
Also see discussion in D105706.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107540
Move StaticVerifierFunctionEmitter to CodeGenHelper.h so that it can be
used for both ODS and DRR.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D106636
Using the python API to easily set up sparse kernels, this test
exhaustively builds, compiles, and runs SpMM for all annotations
on a sparse tensor, making sure every version generates the correct
result. This test also illustrates using the python API to set up
a sparse kernel and sparse compilation.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D107943
This reverts the revert 28c04794df.
The failing MLIR test that caused the revert should be fixed in this
version.
Also includes a PPC test fix previously in 1f87c7c478.
This can be useful when one needs to know which unrolled iteration an Op belongs to, for example, conveying noalias information among memory-affecting ops in parallel-access loops.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107789
The existing linalg.conv2d is not well optimized for performance. Changed to a
version that is more amenable to optimization. Include the corresponding
transposes to use this optimized version.
This also splits the conv and depthwise conv into separate implementations
to avoid overly complex lowerings.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D107504
This is a bit cleaner and removes issues with 2d vectors. It also has a
big impact on constant folding, hence the test changes.
Differential Revision: https://reviews.llvm.org/D107896
The conversion is a straightforward one-to-one mapping with optional unrolling
for nD vectors, similarly to other cast operations.
Depends On D107889
Reviewed By: cota, akuegel
Differential Revision: https://reviews.llvm.org/D107891
The constraint was checking that the type is not an LLVM structure or array
type, but was not checking that it is an LLVM-compatible type, making it accept
incorrect types. As a result, some LLVM dialect ops could process values that
are not compatible with the LLVM dialect leading to further issues with
conversions and translations that assume all values are LLVM-compatible. Make
LLVM_AnyNonAggregate only accept LLVM-compatible types.
Reviewed By: cota, akuegel
Differential Revision: https://reviews.llvm.org/D107889
Reimplement this function in terms of `composeMatchingMap`.
Also fix a bug in `composeMatchingMap` where local dims of `this` could be missing in `localCst`.
Differential Revision: https://reviews.llvm.org/D107813
This function overload is similar to the existing `FlatAffineConstraints::addLowerOrUpperBound`. It constrains a dimension based on an affine map. However, in contrast to the other overloading, it does not attempt to align dimensions/symbols of the affine map with the dimensions/symbols of the constraint set. Instead, dimensions/symbols are expected to already be aligned.
Differential Revision: https://reviews.llvm.org/D107727
This function aligns an affine map (and operands) with given dims and syms SSA values.
This is useful in conjunction with `FlatAffineConstraints::addLowerOrUpperBound`, which requires the `boundMap` to be aligned with the constraint set's dims and syms.
Differential Revision: https://reviews.llvm.org/D107728
Some folding cases are trivial to fold away, specifically no-op cases where
an operation's input and output are the same. Canonicalizing these away
removes unneeded operations.
The current version includes tensor cast operations to resolve shape
discrepancies that occur when an operation's result type differs from the
input type. These are resolved during a tosa shape propagation pass.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D107321
Dilation only requires increasing the padding on the left/right side of the
input, and including dilation in the convolution. This implementation still
lacks support for strided convolutions.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D107680
When an attribute was used where a value is expected, this would previously
fail with a complaint about an unbound symbol. Instead, make the error clear
and mention the common failure reason.