* This already half existed in terms of reading the raw buffer backing a DenseElementsAttr.
* Documented the precise expectations of the buffer layout.
* Extended the Python API to support construction from bitcasted buffers, allowing construction of all primitive element types (even those that lack a compatible representation in Python).
* Specifically, the Python API can now load all integer types at all bit widths and all floating point types (f16, f32, f64, bf16).
Differential Revision: https://reviews.llvm.org/D111284
It was bundling quite a lot of patterns that convert high-D
vector ops into low-D elementary ops. It might not be good
for all of the patterns to happen for a particular downstream
user. For example, `ShapeCastOpRewritePattern` rewrites
`vector.shape_cast` into data movement extract/insert ops.
Instead, split the entry point into multiple ones so users
can pull in patterns on demand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111225
Update OpDSL to support unsigned integers by adding unsigned min/max/cast signatures. Add tests in OpDSL and on the C++ side to verify the proper signed and unsigned operations are emitted.
The patch addresses an issue brought up in https://reviews.llvm.org/D111170.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D111230
Create the Arithmetic dialect that contains basic integer and floating
point arithmetic operations. Ops that did not meet this criterion were
moved to the Math dialect.
First of two atomic patches to remove integer and floating point
operations from the standard dialect. Ops will be removed from the
standard dialect in a subsequent patch.
Reviewed By: ftynse, silvas
Differential Revision: https://reviews.llvm.org/D110200
For the type lattice, we (now) use the "less specialized or equal" partial
order, leading to the bottom representing the empty set, and the top
representing any type.
This naming is more in line with the generally used conventions, where the top
of the lattice is the full set, and the bottom of the lattice is the empty set.
A typical example is the powerset of a finite set: generally, meet would be the
intersection, and join would be the union.
```
top: {a,b,c}
/ | \
{a,b} {a,c} {b,c}
| X X |
{a} { b } {c}
\ | /
bottom: { }
```
This is in line with the examined lattice representations in LLVM:
* lattice for `BitTracker::BitValue` in `Hexagon/BitTracker.h`
* lattice for constant propagation in `HexagonConstPropagation.cpp`
* lattice in `VarLocBasedImpl.cpp`
* lattice for address space inference code in `InferAddressSpaces.cpp`
Reviewed By: silvas, jpienaar
Differential Revision: https://reviews.llvm.org/D110766
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
This patch extends Linalg core vectorization with support for min/max reductions
in linalg.generic ops. It enables the reduction detection for min/max combiner ops.
It also renames MIN/MAX combining kinds to MINS/MAXS to make the sign explicit for
floating point and signed integer types. MINU/MAXU should be introduce din the future
for unsigned integer types.
Reviewed By: pifon2a, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D110854
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opague pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
This allows us to generate interfaces in a namespace,
following other TableGen'erated code.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108311
Move the generalization pattern to the other Linalg transforms to make it available to the codegen strategy.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110728
This option is needed for passes that are known to reach a fix point, but may
need many iterations depending on the size of the input IR.
Differential Revision: https://reviews.llvm.org/D111058
This change allows better interop with external clients of comprehensive bufferization functions
but is otherwise NFC for the MLIR pass itself.
Differential Revision: https://reviews.llvm.org/D111121
The discussion in https://reviews.llvm.org/D110425 demonstrated that "packing"
may be a confusing term to define the behavior of this op in presence of the
attribute. Instead, indicate the intended effect of preventing the folder from
being applied.
Reviewed By: nicolasvasilache, silvas
Differential Revision: https://reviews.llvm.org/D111046
This patch is mainly to propogate location attribute from spv.GlobalVariable to llvm.mlir.global.
It also contains three small changes.
1. Remove the restriction on UniformConstant In SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializering SPIR-V in Deserializer.cpp
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.
Co-authered: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>
Reviewed by:antiagainst
Differential revision: https://reviews.llvm.org/D110207
NFC. Drop unnecessary use of OpBuilder in buildTripCountMapAndOperands.
Rename this to getTripCountMapAndOperands and remove stale comments.
Differential Revision: https://reviews.llvm.org/D110993
Exposes mlir::TypeID to the C API as MlirTypeID along with various accessors
and helper functions.
Differential Revision: https://reviews.llvm.org/D110897
The pooling ops are among the last remaining hard coded Linalg operations that have no region attached. They got obsolete due to the OpDSL pooling operations. Removing them allows us to delete specialized code and tests that are not needed for the OpDSL counterparts that rely on the standard code paths.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110909
Add support for dynamic shared memory for GPU launch ops: add an
optional operand to gpu.launch and gpu.launch_func ops to specify the
amount of "dynamic" shared memory to use. Update lowerings to connect
this operand to the GPU runtime.
Differential Revision: https://reviews.llvm.org/D110800
This revision exposes some minimal funcitonality to allow comprehensive
bufferization to interop with external projects.
Differential Revision: https://reviews.llvm.org/D110875
* This could have been removed some time ago as it only had one op left in it, which is redundant with the new approach.
* `matmul_i8_i8_i32` (the remaining op) can be trivially replaced by `matmul`, which natively supports mixed precision.
Differential Revision: https://reviews.llvm.org/D110792
The former is redundant because the later carries it as part of
its builder. Add a getContext() helper method to DialectAsmParser
to make this more convenient, and stop passing the context around
explicitly. This simplifies ODS generated parser hooks for attrs
and types.
This resolves PR51985
Recommit 4b32f8bac4 after fixing a dependency.
Differential Revision: https://reviews.llvm.org/D110796
The former is redundant because the later carries it as part of
its builder. Add a getContext() helper method to DialectAsmParser
to make this more convenient, and stop passing the context around
explicitly. This simplifies ODS generated parser hooks for attrs
and types.
This resolves PR51985
Differential Revision: https://reviews.llvm.org/D110796
Added interface implementations for AllocOp and CloneOp defined in the MemRef diallect.
Adapted the BufferDeallocation pass to be compatible with the interface introduced in this CL.
Differential Revision: https://reviews.llvm.org/D109350
This revision retires a good portion of the complexity of the codegen strategy and puts the logic behind pass logic.
Differential revision: https://reviews.llvm.org/D110678
Quantized int type should include I32 types as its the output of a quantizd
convolution or matmul operation.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D110651
Unroll-and-jam currently doesn't work when the loop being unroll-and-jammed
or any of its inner loops has iter_args. This patch modifies the
unroll-and-jam utility to support loops with iter_args.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110085
Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal padding is not desired.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110572
New mode option that allows for either running the default fusion kind that happens today or doing either of producer-consumer or sibling fusion. This will also be helpful to minimize the compile-time of the fusion tests.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D110102
Let the calling pass or pattern replace the uses of the original root operation. Internally, the tileAndFuse still replaces uses and updates operands but only of newly created operations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110169
This revision adds a
```
FlatAffineValueConstraints(ValueRange ivs, ValueRange lbs, ValueRange ubs)
```
method and use it in hoist padding.
Differential Revision: https://reviews.llvm.org/D110427
This revision extracts padding hoisting in a new file and cleans it up in prevision of future improvements and extensions.
Differential Revision: https://reviews.llvm.org/D110414
This patch adds functionality to FlatAffineConstraints to remove local
variables using equalities. This helps in keeping output representation of
FlatAffineConstraints smaller.
This patch is part of a series of patches aimed at generalizing affine
dependence analysis.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110056
We currently, incorrectly, assume that a range always has at least
one element when building a contiguous range. This commit adds
a proper empty check to avoid crashing.
Differential Revision: https://reviews.llvm.org/D110457
This patch introduces a generic reduction detection utility that works
across different dialecs. It is mostly a generalization of the reduction
detection algorithm in Affine. The reduction detection logic in Affine,
Linalg and SCFToOpenMP have been replaced with this new generic utility.
The utility takes some basic components of the potential reduction and
returns: 1) the reduced value, and 2) a list with the combiner operations.
The logic to match reductions involving multiple combiner operations disabled
until we can properly test it.
Reviewed By: ftynse, bondhugula, nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D110303
This has a few benefits:
* It allows for defining parsers/printer code blocks that
can be shared between operations and attribute/types.
* It removes the weird duplication of generic parser/printer hooks,
which means that newly added hooks only require touching one class.
Differential Revision: https://reviews.llvm.org/D110375
These are among the last operations still defined explicitly in C++. I've
tried to keep this commit as NFC as possible, but these ops
definitely need a non-NFC cleanup at some point.
Differential Revision: https://reviews.llvm.org/D110440
* If the input is a constant splat value, we just
need to reshape it.
* If the input is a general constant with one user,
we can also constant fold it, without bloating
the IR.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110439
This commits updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.
Differential Revision: https://reviews.llvm.org/D110360
This has been a TODO for a long time, and it brings about many advantages (namely nice accessors, and less fragile code). The existing overloads that accept ArrayRef are now treated as deprecated and will be removed in a followup (after a small grace period). Most of the upstream MLIR usages have been fixed by this commit, the rest will be handled in a followup.
Differential Revision: https://reviews.llvm.org/D110293
Initially, the padding transformation and the related operation were only used
to guarantee static shapes of subtensors in tiled operations. The
transformation would not insert the padding operation if the shapes were
already static, and the overall code generation would actively remove such
"noop" pads. However, this transformation can be also used to pack data into
smaller tensors and marshall them into faster memory, regardless of the size
mismatches. In context of expert-driven transformation, we should assume that,
if padding is requested, a potentially padded tensor must be always created.
Update the transformation accordingly. To do this, introduce an optional
`packing` attribute to the `pad_tensor` op that serves as an indication that
the padding is an intentional choice (as opposed to side effect of type
normalization) and should be left alone by cleanups.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110425
Add support for intersecting, subtracting, complementing and checking equality of sets having divisions.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110138
This canonicalization pattern complements the tensor.cast(pad_tensor) one in
propagating constant type information when possible. It contributes to the
feasibility of pad hoisting.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110343
* Do not discard static result type information that cannot be inferred from lower/upper padding.
* Add optional argument to `PadTensorOp::inferResultType` for specifying known result dimensions.
Differential Revision: https://reviews.llvm.org/D110380
This is only noticeable when using an attribute across dialects I think.
Previously the namespace would be ommited, but it wouldn't matter as
long as the generated code stays within a single namespace.
Differential Revision: https://reviews.llvm.org/D110367
This test makes sure kernels map to efficient sparse code, i.e. all
compressed for-loops, no co-iterating while loops. In addition, this
revision removes the special constant folding inside the sparse
compiler in favor of Mahesh' new generic linalg folding. Thanks!
NOTE: relies on Mahesh fix, which needs to be rebased first
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110001
DialectAsmParser has a `parseAttribute` member that takes a
contextual type, but DialectAsmPrinter doesn't have the corresponding
member to take advantage of it. As such, custom attribute
implementations can't really use it. This adds the obvious missing
method which fills this hole.
Differential Revision: https://reviews.llvm.org/D110211
Add a PointerProxy similar to the existing iterator_facade_base::ReferenceProxy and return it from the arrow operator. This prevents iterator facades with a reference type that is not a true reference to take the address of a temporary.
Forward the reference type of the mapped_iterator to the iterator adaptor which in turn forwards it to the iterator facade. This fixes mlir::op_iterator::operator->() to take the address of a temporary.
Make some polishing changes to op_iterator and op_filter_iterator.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D109490
Add a helper method to check if an index vector contains a permutation of its indices. Additionally, refactor applyPermutationToVector to take int64_t.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110135
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
This patch adds mergeLocalIds andmergeSymbolIds as public functions
for FlatAffineConstraints and FlatAffineValueConstraints respectively.
mergeLocalIds is also required to support divisions in intersection,
subtraction, equality checks, and complement for PresburgerSet.
This patch is part of a series of patches aimed at generalizing affine
dependence analysis.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110045
Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
the MLIR parser itself. Provides a standard interface for doing this that
is less error prone and less boilerplate.
While here, extend Delimiter to support <> and {} delimited sequences as
well (I have a use for <> in CIRCT specifically).
Differential Revision: https://reviews.llvm.org/D110122
This revision refactors ElementsAttr into an Attribute Interface.
This enables a common interface with which to interact with
element attributes, without needing to modify the builtin
dialect. It also removes a majority (if not all?) of the need for
the current OpaqueElementsAttr, which was originally intended as
a way to opaquely represent data that was not representable by
the other builtin constructs.
The new ElementsAttr interface not only allows for users to
natively represent their data in the way that best suits them,
it also allows for efficient opaque access and iteration of the
underlying data. Attributes using the ElementsAttr interface
can directly expose support for interacting with the held
elements using any C++ data type they claim to support. For
example, DenseIntOrFpElementsAttr supports iteration using
various native C++ integer/float data types, as well as
APInt/APFloat, and more. ElementsAttr instances that refer to
DenseIntOrFpElementsAttr can use all of these data types for
iteration:
```c++
DenseIntOrFpElementsAttr intElementsAttr = ...;
ElementsAttr attr = intElementsAttr;
for (uint64_t value : attr.getValues<uint64_t>())
...;
for (APInt value : attr.getValues<APInt>())
...;
for (IntegerAttr value : attr.getValues<IntegerAttr>())
...;
```
ElementsAttr also supports failable range/iterator access,
allowing for selective code paths depending on data type
support:
```c++
ElementsAttr attr = ...;
if (auto range = attr.tryGetValues<uint64_t>()) {
for (uint64_t value : *range)
...;
}
```
Differential Revision: https://reviews.llvm.org/D109190
Currently DenseElementsAttr only exposes the ability to get the full range of values for a given type T, but there are many situations where we just want the beginning/end iterator. This revision adds proper value_begin/value_end methods for all of the supported T types, and also cleans up a bit of the interface.
Differential Revision: https://reviews.llvm.org/D104173
SparseElementsAttr currently does not perform any verfication on construction, with the only verification existing within the parser. This revision moves the parser verification to SparseElementsAttr, and also adds additional verification for when a sparse index is not valid.
Differential Revision: https://reviews.llvm.org/D109189
Some patterns may share the common DAG structures. Generate a static
function to do the match logic to reduce the binary size.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D105797
For `memref.subview` operations, when there are more than one
unit-dimensions, the strides need to be used to figure out which of
the unit-dims are actually dropped.
Differential Revision: https://reviews.llvm.org/D109418
Add an interface that allows grouping together all covolution and
pooling ops within Linalg named ops. The interface currently
- the indexing map used for input/image access is valid
- the filter and output are accessed using projected permutations
- that all loops are charecterizable as one iterating over
- batch dimension,
- output image dimensions,
- filter convolved dimensions,
- output channel dimensions,
- input channel dimensions,
- depth multiplier (for depthwise convolutions)
Differential Revision: https://reviews.llvm.org/D109793
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
Add a new version of fusion on tensors that supports the following scenarios:
- support input and output operand fusion
- fuse a producer result passed in via tile loop iteration arguments (update the tile loop iteration arguments)
- supports only linalg operations on tensors
- supports only scf::for
- cannot add an output to the tile loop nest
The LinalgTileAndFuseOnTensors pass tiles the root operation and fuses its producers.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D109766
The discussion on forum:
https://llvm.discourse.group/t/bug-in-partial-dialect-conversion/4115
The `applyPartialConversion` didn't handle the operations, that were
marked as illegal inside dynamic legality callback.
Instead of reporting error, if such operation was not converted to legal set,
the method just added it to `unconvertedSet` in the same way as unknown operations.
This patch fixes that and handle dynamically illegal operations as well.
The patch includes 2 fixes for existing passes:
* `tensor-bufferize` - explicitly mark `std.return` as legal.
* `convert-parallel-loops-to-gpu` - ugly fix with marking visited operations
to avoid recursive legality checks.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108505
Add a constant propagator for gpu.launch op in cases where the
grid/thread IDs can be trivially determined to take a single constant
value of zero.
Differential Revision: https://reviews.llvm.org/D109994
Note that this revision adds a very tiny bit of constant folding in the
sparse compiler lattice construction. Although I am generally trying to
avoid such canonicalizations (and rely on other passes to fix this instead),
the benefits of avoiding a very expensive disjunction lattice construction
justify having this special code (at least for now).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109939
Add the addTileLoopIvsToIndexOpResults method to shift the IndexOp results after tiling.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109761
Express the input shape definitions of convolution and pooling operations in terms of the output shapes, filter shapes, strides, and dilations.
Reviewed By: shabalin, rsuderman, stellaraccident
Differential Revision: https://reviews.llvm.org/D109815
Adds a new rewrite directive returnType that can be added at the end of an op's
argument list to explicitly specify return types.
```
(OpX $v0, $v1, (returnType "$_builder.getI32Type()"))
```
Pass in a bound value to copy its return type, or pass a native code call to
dynamically create new types.
```
(OpX $v0, $v1, (returnType $v0, (NativeCodeCall<"..."> $v1)))
```
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D109472
Summary: Making the late transformations opt-in results in less surprising behavior when composing multiple calls to the codegen strategy.
Reviewers:
Subscribers:
Differential revision: https://reviews.llvm.org/D109820
This seems in-line with the intent and how we build tools around it.
Update the description for the flag accordingly.
Also use an injected thread pool in MLIROptMain, now we will create
threads up-front and reuse them across split buffers.
Differential Revision: https://reviews.llvm.org/D109802
Both copy/alloc ops are using memref dialect after this change.
Reviewed By: silvas, mehdi_amini
Differential Revision: https://reviews.llvm.org/D109480
Add the makeComposedExtractSliceOp method that creates an ExtractSliceOp and folds chains of ExtractSliceOps by computing the sum of their offsets and by multiplying their strides.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109601
This tiling option scalarizes all dynamic dimensions, i.e., it tiles all dynamic dimensions by 1.
This option is useful for linalg ops with partly dynamic tensor dimensions. E.g., such ops can appear in the partial iteration after loop peeling. After scalarizing dynamic dims, those ops can be vectorized.
Differential Revision: https://reviews.llvm.org/D109268
Only scf.for loops are supported at the moment. linalg.tiled_loop support will be added in a subsequent commit.
Only static tensor sizes are supported. Loops for dynamic tensor sizes can be peeled, but the generated code is not optimal due to a missing canonicalization pattern.
Differential Revision: https://reviews.llvm.org/D109043
* Revert https://reviews.llvm.org/D107307 so that both LHS and RHS have
the same layout with K0 as the innermost dimension.
* Continuing from https://reviews.llvm.org/D107003, move also 'K'
to the outer side, so that now the inter-tile dimensions as all outer,
and the intra-tile dimensions are all inner.
Reviewed By: asaadaldien
Differential Revision: https://reviews.llvm.org/D109692
Types and attributes now have a `hasTrait` function that allow users to check
if a type defines a trait.
Also, AbstractType and AbstractAttribute has now a `hasTraitFn` field to carry
the implementation of the `hasTrait` function of the concrete type or attribute.
This patch also adds the remaining functions to access type and attribute traits
in TableGen.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D105202
Tosa.while shape inference requires repeatedly running shape inference across
the body of the loop until the types become static as we do not know the number
of iterations required by the loop body. Once the least specific arguments are
known they are propagated to both regions.
To determine the final end type, the least restrictive types are determined
from all yields.
Differential Revision: https://reviews.llvm.org/D108801
The original version of the bufferization pattern for linalg.generic would
manually clone operations within the region to the bufferized clone of the
operation. This triggers legality requirements on those operations in the
conversion infra. Instead, this now uses the rewriter to inline the region
instead, avoiding those legality requirements.
Differential Revision: https://reviews.llvm.org/D109581
Generate an scf.for instead of an scf.if for the partial iteration. This is for consistency reasons: The peeling of linalg.tiled_loop also uses another loop for the partial iteration.
Note: Canonicalizations patterns may rewrite partial iterations to scf.if afterwards.
Differential Revision: https://reviews.llvm.org/D109568
Extend the signature of the tile loop nest region builder to take all operand values to use and not just the scf::For iterArgs. This change allows us to pass in all block arguments of TiledLoop and use them directly instead of replacing them after the loop generation.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D109569
Further enhance the set of operations that can be handled by the sparse compiler
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109413
Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce `unrealized_conversion_cast` operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up `unrealized_conversion_cast`s even though the IR had no standard
operations in it. The pass must have been also run the last among all to-LLVM
passes, in contradiction with the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on `unrealized_conversion_cast` as a means to indicate failure
to eliminate casts).
In fact, the need to eliminate A->B->A `unrealized_conversion_cast`s is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when `unrealized_conversion_cast`s could not have been
eliminated since it is the only reason why this pass can fail.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109507
OpenMP reductions need a neutral element, so we match some known reduction
kinds (integer add/mul/or/and/xor, float add/mul, integer and float min/max) to
define the neutral element and the atomic version when possible to express
using atomicrmw (everything except float mul). The SCF-to-OpenMP pass becomes a
module pass because it now needs to introduce new symbols for reduction
declarations in the module.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D107549
When tiling a LinalgOp, extract_slice/insert_slice pairs are inserted. To avoid going out-of-bounds when the tile size does not divide the shape size evenly (at the boundary), AffineMin ops are inserted. Some ops have assumptions regarding the dimensions of inputs/outputs. E.g., in a `A * B` matmul, `dim(A, 1) == dim(B, 0)`. However, loop bounds use either `dim(A, 1)` or `dim(B, 0)`.
With this change, AffineMin ops are expressed in terms of loop bounds instead of tensor sizes. (Both have the same runtime value.) This simplifies canonicalizations.
Differential Revision: https://reviews.llvm.org/D109267
This patch refactors the existing implementation of computing an explicit
representation of an identifier as a floordiv in terms of other identifiers and
exposes this computation as a public function.
The computation of this representation is required to support local identifiers
in PresburgerSet subtract, complement and isEqual.
Reviewed By: bondhugula, arjunp
Differential Revision: https://reviews.llvm.org/D106662
Add loop coalesce utility for affine.for. This expects loops to have
been normalized a-priori. This works for both constant as well non
constant upper bounds having single/multiple result upper bound affine
map.
With contributions from Arnab Dutta and Uday Bondhugula.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D108126
Create a gpu memset op and corresponding CUDA and ROCm wrappers.
Reviewed By: herhut, lorenrose1013
Differential Revision: https://reviews.llvm.org/D107548
This makes the IR more readable, in particular when this will be used on
the builtin func outside of the LLVM dialect.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D109209
The limitation on iter_args introduced with D108806 is too restricting. Changes of the runtime type should be allowed.
Extends the dim op canonicalization with a simple analysis to determine when it is safe to canonicalize.
Differential Revision: https://reviews.llvm.org/D109125
Add an operation omp.critical.declare to declare names/symbols of
critical sections. Named omp.critical operations should use symbols
declared by omp.critical.declare. Having a declare operation ensures
that the names of critical sections are global and unique. In the
lowering flow to LLVM IR, the OpenMP IRBuilder creates unique names
for critical sections.
Reviewed By: ftynse, jeanPerier
Differential Revision: https://reviews.llvm.org/D108713
This upstreams the Cpp emitter, initially presented with [1], from [2]
to MLIR core. Together with the previously upstreamed EmitC dialect [3],
the target allows to translate MLIR to C/C++.
[1] https://reviews.llvm.org/D76571
[2] https://github.com/iml130/mlir-emitc
[3] https://reviews.llvm.org/D103969
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Co-authored-by: Oliver Scherf <oliver.scherf@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D104632
Use the recently introduced OpenMPIRBuilder facility to transate OpenMP
workshare loops with reductions to LLVM IR calling OpenMP runtime. Most of the
heavy lifting is done at the OpenMPIRBuilder. When other OpenMP dialect
constructs grow support for reductions, the translation can be updated to
operate on, e.g., an operation interface for all reduction containers instead
of workshare loops specifically. Designing such a generic translation for the
single operation that currently supports reductions is premature since we don't
know how the reduction modeling itself will be generalized.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107343
Add method to get NameLoc. Treat null child location as unknown to avoid
needing to create UnknownLoc in C API where child loc is not needed.
Differential Revision: https://reviews.llvm.org/D108678
This patch is to add Image Operands in SPIR-V Dialect and also let ImageDrefGather to use Image Operands.
Image Operands are used in many image instructions. "Image Operands encodes what oprands follow, as per Image Operands". And ususally, they are optional to image instructions.
The format of image operands looks like:
%0 = spv.ImageXXXX %1, ... %3 : f32 ["Bias|Lod"](%4, %5 : f32, f32) -> ...
This patch doesn’t implement all operands (see Section 3.14 in SPIR-V Spec) but provides a skeleton of it. There is TODO in verifyImageOperands function.
Co-authored: Alan Liu <alanliu.yf@gmail.com>
Reviewed by: antiagainst
Differential Revision: https://reviews.llvm.org/D108501
In D104421, we changed the API for pass registration.
Before you would write:
void registerPass("my-pass", "My Pass Description.",
[] { return createMyPass(); });
while now you’d only write:
void registerPass([] { return createMyPass(); });
If you’re using TableGen to define your pass registration, you shouldn’t have anything to do. If you’re using directly the C++ API here are some changes.
Your project may also be broken even if you use TableGen and you call the
generated registration API in case your pass implementation didn’t inherit from
the MyPassBase class generated by TableGen.
If you don't use TableGen, the "my-pass" and "My Pass Description." fields must
be provided by overriding methods on the pass itself:
llvm::StringRef getArgument() const final { return "my-pass"; }
llvm::StringRef getDescription() const final {
return "My Pass Description.";
}
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104429
The output tensor was added for tiling purposes. With use of
`TilingInterface` for tiling pad operations, there is no need for an
explicit operand for the shape of result of `linalg.pad_tensor`
op. The interface allows the tiling pattern to query the value that
can be used for the "init" needed for tiling dynamically.
Differential Revision: https://reviews.llvm.org/D108613
Currently the builtin dialect is the default namespace used for parsing
and printing. As such module and func don't need to be prefixed.
In the case of some dialects that defines new regions for their own
purpose (like SpirV modules for example), it can be beneficial to
change the default dialect in order to improve readability.
Differential Revision: https://reviews.llvm.org/D107236
This aligns the printer with the parser contract: the operation isn't part of the user-controllable part of the syntax.
Differential Revision: https://reviews.llvm.org/D108804
This makes the hook return a printer if available, instead of using LogicalResult to
indicate if a printer was available (and invoked). This allows the caller to detect that
the dialect has a printer for a given operation without actually invoking the printer.
It'll be leveraged in a future revision to move printing the op name itself under control
of the ASMPrinter.
Differential Revision: https://reviews.llvm.org/D108803
An interface to allow for tiling of operations is introduced. The
tiling of the linalg.pad_tensor operation is modified to use this
interface.
Differential Revision: https://reviews.llvm.org/D108611
The StringAttr version doesn't need a context, so we can just use the
existing `SymbolRefAttr::get` form. The StringRef version isn't preferred
so we want to encourage people to use StringAttr.
There is an additional form of getSymbolRefAttr that takes a (SymbolTrait
implementing) operation. This should also be moved, but I'll do that as
a separate patch.
Differential Revision: https://reviews.llvm.org/D108922
* It is pretty clear that no one has tried this yet since it was both incomplete and broken.
* Fixes a symbol hiding issues keeping even the generic builder from constructing an operation with successors.
* Adds ODS support for successors.
* Adds CAPI `mlirBlockGetParentRegion`, `mlirRegionEqual` + tests (and missing test for `mlirBlockGetParentOperation`).
* Adds Python property: `Block.region`.
* Adds Python methods: `Block.create_before` and `Block.create_after`.
* Adds Python property: `InsertionPoint.block`.
* Adds new blocks.py test to verify a plausible CFG construction case.
Differential Revision: https://reviews.llvm.org/D108898
41d4aa7de6 introduced incorrect code in
extraTraitClassDeclaration: `this` refers to the trait class and not the
operation class so `->getContext()` is not valid. Use `$_op` instead.
SymbolRefAttr is fundamentally a base string plus a sequence
of nested references. Instead of storing the string data as
a copies StringRef, store it as an already-uniqued StringAttr.
This makes a lot of things simpler and more efficient because:
1) references to the symbol are already stored as StringAttr's:
there is no need to copy the string data into MLIRContext
multiple times.
2) This allows pointer comparisons instead of string
comparisons (or redundant uniquing) within SymbolTable.cpp.
3) This allows SymbolTable to hold a DenseMap instead of a
StringMap (which again copies the string data and slows
lookup).
This is a moderately invasive patch, so I kept a lot of
compatibility APIs around. It would be nice to explore changing
getName() to return a StringAttr for example (right now you have
to use getNameAttr()), and eliminate things like the StringRef
version of getSymbol.
Differential Revision: https://reviews.llvm.org/D108899
* Add `DimOfIterArgFolder`.
* Move existing cross-dialect canonicalization patterns to `LoopCanonicalization.cpp`.
* Rename `SCFAffineOpCanonicalization` pass to `SCFForLoopCanonicalization`.
* Expand documentaton of scf.for: The type of loop-carried variables may not change with iterations. (Not even the dynamic type.)
Differential Revision: https://reviews.llvm.org/D108806
* Add batched version of all `addId` variants, so that multiple IDs can be added at a time.
* Rename `addId` and variants to `insertId` and `appendId`. Most external users call `appendId`. Splitting `addId` into two functions also makes it possible to provide batched version for both. (Otherwise, the overloads are ambigious when calling `addId`.)
Differential Revision: https://reviews.llvm.org/D108532
This allows for using a different type when accessing a parameter than the
one used for storage. This allows for returning parameters by reference,
enables using more optimized/convient reference results, and more.
Differential Revision: https://reviews.llvm.org/D108593
This allows for parsing strings that have escape sequences, which require constructing
a string (as they can't be represented by looking at the Token contents directly).
Differential Revision: https://reviews.llvm.org/D108589
This allows for iterating and interacting with the uses of a specific subset of
results as opposed to just the full range.
Differential Revision: https://reviews.llvm.org/D108586
* Add support for affine.max ops to SCF loop peeling pattern.
* Add support for affine.max ops to `AffineMinSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalizationPattern` to `AffineOpSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalization` pass to `SCFAffineOpCanonicalization`.
Differential Revision: https://reviews.llvm.org/D108009
This canonicalization simplifies affine.min operations inside "for loop"-like operations (e.g., scf.for and scf.parallel) based on two invariants:
* iv >= lb
* iv < lb + step * ((ub - lb - 1) floorDiv step) + 1
This commit adds a new pass `canonicalize-scf-affine-min` (instead of being a canonicalization pattern) to avoid dependencies between the Affine dialect and the SCF dialect.
Differential Revision: https://reviews.llvm.org/D107731
Introduces new Ops to represent 1. alias.scope metadata in LLVM, and 2. domains for these scopes. These correspond to the metadata described in https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata. Lists of scopes are modeled the same way as access groups - as an ArrayAttr on the Op (added in https://reviews.llvm.org/D97944).
Lowering 'noalias' attributes on function parameters is already supported. However, lowering `noalias` metadata on individual Ops is not, which is added in this change. LLVM uses the same keyword for these, but this change introduces a separate attribute name 'noalias_scopes' to represent this distinct concept.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107870
I found myself typing this code several times at different places
by now, so time to make this a general utility instead. Given
a permutation, it returns the permuted position of the input,
for example (i,j,k) -> (k,i,j) yields position 1 for input 0.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D108347
This revision adds native ODS support for VariadicOfVariadic operand
groups. An example of this is the SwitchOp, which has a variadic number
of nested operand ranges for each of the case statements, where the
number of case statements is variadic. Builtin ODS support allows for
generating proper accessors for the nested operand ranges, builder
support, and declarative format support. VariadicOfVariadic operands
are supported by providing a segment attribute to use to store the
operand groups, mapping similarly to the AttrSizedOperand trait
(but with a user defined attribute name).
`build` methods for VariadicOfVariadic operand expect inputs of the
form `ArrayRef<ValueRange>`. Accessors for the variadic ranges
return a new `OperandRangeRange` type, which represents a
contiguous range of `OperandRange`. In the declarative assembly
format, VariadicOfVariadic operands and types are by default
formatted as a comma delimited list of value lists:
`(<value>, <value>), (), (<value>)`.
Differential Revision: https://reviews.llvm.org/D107774
This allows for inlining into an empty block or to the beginning of a block. NFC as the existing implementations now foward to this overload.
Differential Revision: https://reviews.llvm.org/D108572
Do not apply loop peeling to loops that are contained in the partial iteration of an already peeled loop. This is to avoid code explosion when dealing with large loop nests. Can be controlled with a new pass option `skip-partial`.
Differential Revision: https://reviews.llvm.org/D108542
* This is the native data layout for PyTorch and npcomp was using the prior version before cleanup.
Differential Revision: https://reviews.llvm.org/D108527
* Resolves a TODO by making this configurable by downstreams.
* This seems to be the last thing allowing full use of the Python bindings as a library within another project (i.e. be embedding them).
Differential Revision: https://reviews.llvm.org/D108523
Multiple operations were still defined as TC ops that had equivalent versions
as YAML operations. Reducing to a single compilation path guarantees that
frontends can lower to their equivalent operations without missing the
optimized fastpath.
Some operations are maintained purely for testing purposes (mainly conv{1,2,3}D
as they are included as sole tests in the vectorizaiton transforms.
Differential Revision: https://reviews.llvm.org/D108169
Tosa rescale can contain uint8 types. Added support for these types
using an unrealized conversion cast. Optimistically it would be better to
use bitcast however it does not support unsigned integers.
Differential Revision: https://reviews.llvm.org/D108427
Apply the "for loop peeling" pattern from SCF dialect transforms. This pattern splits scf.for loops into full and partial iterations. In the full iteration, all masked loads/stores are canonicalized to unmasked loads/stores.
Differential Revision: https://reviews.llvm.org/D107733
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
* Rename ids to values in FlatAffineValueConstraints.
* Overall cleanup of comments in FlatAffineConstraints and FlatAffineValueConstraints.
Differential Revision: https://reviews.llvm.org/D107947
* Extract "value" functionality of `FlatAffineConstraints` into a new derived `FlatAffineValueConstraints` class. Current users of `FlatAffineConstraints` can use `FlatAffineValueConstraints` without additional code changes, thus NFC.
* `FlatAffineConstraints` no longer associates dimensions with SSA Values. All functionality that requires this, is moved to `FlatAffineValueConstraints`.
* `FlatAffineConstraints` no longer makes assumptions about where Values associated with dimensions are coming from.
Differential Revision: https://reviews.llvm.org/D107725
This method bitcasts a DenseElementsAttr elementwise to one of the same
shape with a different element type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D107612
Reduction axis should come after all parallel axis to work with vectorization.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108005
These operations are not lowered to from any source dialect and are only
used for redundant tests. Removing these named ops, along with their
associated tests, will make migration to YAML operations much more
convenient.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D107993
Expand ParallelLoopTilingPass with an inbound_check mode.
In default mode, the upper bound of the inner loop is from the min op; in
inbound_check mode, the upper bound of the inner loop is the step of the outer
loop and an additional inbound check will be emitted inside of the inner loop.
This was 'FIXME' in the original codes and a typical usage is for GPU backends,
thus the outer loop and inner loop can be mapped to blocks/threads in seperate.
Differential Revision: https://reviews.llvm.org/D105455
While the changes are extensive, they basically fall into a few
categories:
1) Moving the TestDialect itself.
2) Updating C++ code in tablegen to explicitly use ::mlir, since it
will be put in a headers that shouldn't expect a 'using'.
3) Updating some generic MLIR Interface definitions to do the same thing.
4) Updating the Tablegen generator in a few places to be explicit about
namespaces
5) Doing the same thing for llvm references, since we no longer pick
up the definitions from mlir/Support/LLVM.h
Differential Revision: https://reviews.llvm.org/D88251
The approach for handling reductions in the outer most
dimension follows that for inner most dimensions, outlined
below
First, transpose to move reduction dims, if needed
Convert reduction from n-d to 2-d canonical form
Then, for outer reductions, we emit the appropriate op
(add/mul/min/max/or/and/xor) and combine the results.
Differential Revision: https://reviews.llvm.org/D107675
Move StaticVerifierFunctionEmitter to CodeGenHelper.h so that it can be
used for both ODS and DRR.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D106636
This can be useful when one needs to know which unrolled iteration an Op belongs to, for example, conveying noalias information among memory-affecting ops in parallel-access loops.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107789
Existing linalg.conv2d is not well optimized for performance. Changed to a
version that is more aligned for optimziation. Include the corresponding
transposes to use this optimized version.
This also splits the conv and depthwise conv into separate implementations
to avoid overly complex lowerings.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D107504
The constraint was checking that the type is not an LLVM structure or array
type, but was not checking that it is an LLVM-compatible type, making it accept
incorrect types. As a result, some LLVM dialect ops could process values that
are not compatible with the LLVM dialect leading to further issues with
conversions and translations that assume all values are LLVM-compatible. Make
LLVM_AnyNonAggregate only accept LLVM-compatible types.
Reviewed By: cota, akuegel
Differential Revision: https://reviews.llvm.org/D107889
Reimplement this function in terms of `composeMatchingMap`.
Also fix a bug in `composeMatchingMap` where local dims of `this` could be missing in `localCst`.
Differential Revision: https://reviews.llvm.org/D107813
This function overload is similar to the existing `FlatAffineConstraints::addLowerOrUpperBound`. It constrains a dimension based on an affine map. However, in contrast to the other overloading, it does not attempt to align dimensions/symbols of the affine map with the dimensions/symbols of the constraint set. Instead, dimensions/symbols are expected to already be aligned.
Differential Revision: https://reviews.llvm.org/D107727
This function aligns an affine map (and operands) with given dims and syms SSA values.
This is useful in conjunction with `FlatAffineConstraints::addLowerOrUpperBound`, which requires the `boundMap` to be aligned with the constraint set's dims and syms.
Differential Revision: https://reviews.llvm.org/D107728
Some folding cases are trivial to fold away, specifically no-op cases where
an operation's input and output are the same. Canonicalizing these away
removes unneeded operations.
The current version includes tensor cast operations to resolve shape
discreprencies that occur when an operation's result type differs from the
input type. These are resolved during a tosa shape propagation pass.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D107321
This patch enables normalizing memrefs with MemRef_ReinterpretCastOp by
adding MemRefsNormalizable trait in the Op definition.
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D107425
This enables querying shapes/values as shapes without mutating the IR
directly (e.g., towards enabling doing inference in analysis &
application steps, inferring function shape with constant from callsite,
...). Add a new ShapeAdaptor that abstracts over whether shape is from
Type or ShapedTypeComponents or DenseIntElementsAttribute. This adds new
accessors to ValueShapeRange to get Shape and value as shape, but
doesn't restrict or remove the previous way of accessing Type via the
Value for now, that does mean a less refined shape could be accidentally
queried and will be restricted in follow up.
Currently restricted Value query to what can be represented as Shape. So
only supports cases where constant subgraph evaluation's output is a
shape. I had considered making it more general, but without TBD extern
attribute concept or some such a user cannot today uniformly avoid
overhead.
Update TOSA ops and also the shape inference pass.
Differential Revision: https://reviews.llvm.org/D107768
The following constructor call (and others) used to be ambiguous:
```
FlatAffineConstraints constraints(0, 0, 0);
```
Differential Revision: https://reviews.llvm.org/D107726
Perform scalar constant propagation for FPTruncOp only if the resulting value can be represented without precision loss or rounding.
Example:
%cst = constant 1.000000e+00 : f32
%0 = fptrunc %cst : f32 to bf16
-->
%cst = constant 1.000000e+00 : bf16
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107518
Use new return type for `OpAsmDialectInterface::getAlias`:
* `AliasResult::NoAlias` if an alias was not provided.
* `AliasResult::OverridableAlias` if an alias was provided, but it might be overriden by other hook.
* `AliasResult::FinalAlias` if an alias was provided and it should be used (no other hooks will be checked).
In that case `AsmPrinter` will use either the first alias with `FinalAlias` result or
the last alias with `OverridableAlias` result (it depends on dialect array order).
Used `OverridableAlias` result for `BuiltinOpAsmDialectInterface`.
Use case: provide more informative alias for built-in attributes like `AffineMapAttr`
instead of generic "map<N>".
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D107437
The file in which `Region::viewGraph` is defined has changed. This should have been updated with D106342.
Differential Revision: https://reviews.llvm.org/D107517