1. Add support to vectorize induction variables of loops that are
not mapped to any vector dimension in SuperVectorize pass.
2. Fix a bug in getForInductionVarOwner.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D111370
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
The purpose of this revision is to make "write into non-writable memory" conflict detection easier to understand.
The main idea is that there is a conflict in the case of inplace bufferization if:
1. Someone writes to (an alias of) opOperand, opResult or the to-be-bufferized op writes itself.
2. And, opOperand or opResult aliases a non-writable buffer.
Differential Revision: https://reviews.llvm.org/D111379
This commit adds a pattern to perform constant folding on linalg
generic ops which are essentially transposes. We see real cases
where model importers may generate such patterns.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D110597
The convolution op is one of the remaining hard coded Linalg operations that have no region attached. It got obsolete due to the OpDSL convolution operations. Removing it allows us to delete specialized code and tests that are not needed for the OpDSL counterparts that rely on the standard code paths.
Test needed due to specialized implementations are removed. Tiling and fusion tests are replaced by variants using linalg.conv_2d.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111233
Currently Affine LICM checks iterOperands and does not hoist out any
instruction containing iterOperands. We should check iterArgs instead.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D111090
Add an interface for outlineable OpenMP operations.
This patch was initially done in fir-dev and is now needed
for the upstreaming.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D111310
The signature of this function was confusing. Check for hasKnownBufferizationAliasingBehavior separately when needed.
Differential Revision: https://reviews.llvm.org/D110916
It was bundling quite a lot of patterns that convert high-D
vector ops into low-D elementary ops. It might not be good
for all of the patterns to happen for a particular downstream
user. For example, `ShapeCastOpRewritePattern` rewrites
`vector.shape_cast` into data movement extract/insert ops.
Instead, split the entry point into multiple ones so users
can pull in patterns on demand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111225
Move getInplaceableOpResult() call into bufferizableInPlaceAnalysis.
Note: The only goal of this change is to make the signature of bufferizableInPlaceAnalysis smaller. (Fewer arguments.)
Differential Revision: https://reviews.llvm.org/D110915
ConstShapeOp has a constant shape, so its type can always be static.
We still allow it to have ShapeType though.
Differential Revision: https://reviews.llvm.org/D111139
Update OpDSL to support unsigned integers by adding unsigned min/max/cast signatures. Add tests in OpDSL and on the C++ side to verify the proper signed and unsigned operations are emitted.
The patch addresses an issue brought up in https://reviews.llvm.org/D111170.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D111230
Add folding rule for std.or op when an operand has all bits set.
or(x, <all bits set>) -> <all bits set>
Differential Revision: https://reviews.llvm.org/D111206
Create the Arithmetic dialect that contains basic integer and floating
point arithmetic operations. Ops that did not meet this criterion were
moved to the Math dialect.
First of two atomic patches to remove integer and floating point
operations from the standard dialect. Ops will be removed from the
standard dialect in a subsequent patch.
Reviewed By: ftynse, silvas
Differential Revision: https://reviews.llvm.org/D110200
For the type lattice, we (now) use the "less specialized or equal" partial
order, leading to the bottom representing the empty set, and the top
representing any type.
This naming is more in line with the generally used conventions, where the top
of the lattice is the full set, and the bottom of the lattice is the empty set.
A typical example is the powerset of a finite set: generally, meet would be the
intersection, and join would be the union.
```
top: {a,b,c}
/ | \
{a,b} {a,c} {b,c}
| X X |
{a} { b } {c}
\ | /
bottom: { }
```
This is in line with the examined lattice representations in LLVM:
* lattice for `BitTracker::BitValue` in `Hexagon/BitTracker.h`
* lattice for constant propagation in `HexagonConstPropagation.cpp`
* lattice in `VarLocBasedImpl.cpp`
* lattice for address space inference code in `InferAddressSpaces.cpp`
Reviewed By: silvas, jpienaar
Differential Revision: https://reviews.llvm.org/D110766
Instead just emit a warning that analysis failed and the result will be treated conservatively.
Differential Revision: https://reviews.llvm.org/D111217
Implement min and max using the newly introduced std operations instead of relying on compare and select.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D111170
This patch extends Linalg core vectorization with support for min/max reductions
in linalg.generic ops. It enables the reduction detection for min/max combiner ops.
It also renames MIN/MAX combining kinds to MINS/MAXS to make the sign explicit for
floating point and signed integer types. MINU/MAXU should be introduce din the future
for unsigned integer types.
Reviewed By: pifon2a, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D110854
This otherwise loses a lot of debugging info and results in a painful
debugging experience.
Reviewed By: mravishankar, stellaraccident
Differential Revision: https://reviews.llvm.org/D111107
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opague pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
This allows us to generate interfaces in a namespace,
following other TableGen'erated code.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108311
Move the generalization pattern to the other Linalg transforms to make it available to the codegen strategy.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110728
These are considered noops.
Buferization will still fail on scf.execute_region which yield values.
This is used to make comprehensive bufferization interoperate better with external clients.
Differential Revision: https://reviews.llvm.org/D111130
This change allows better interop with external clients of comprehensive bufferization functions
but is otherwise NFC for the MLIR pass itself.
Differential Revision: https://reviews.llvm.org/D111121
The discussion in https://reviews.llvm.org/D110425 demonstrated that "packing"
may be a confusing term to define the behavior of this op in presence of the
attribute. Instead, indicate the intended effect of preventing the folder from
being applied.
Reviewed By: nicolasvasilache, silvas
Differential Revision: https://reviews.llvm.org/D111046
This patch is mainly to propogate location attribute from spv.GlobalVariable to llvm.mlir.global.
It also contains three small changes.
1. Remove the restriction on UniformConstant In SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializering SPIR-V in Deserializer.cpp
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.
Co-authered: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>
Reviewed by:antiagainst
Differential revision: https://reviews.llvm.org/D110207
Tiling can create dim ops and those dim ops can take `InitTensorOp`
as input. Including it in the tiling canonicalization patterns
allows us to fold those dim ops away.
Also sorted the existing ops along the way.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D110876
The pooling ops are among the last remaining hard coded Linalg operations that have no region attached. They got obsolete due to the OpDSL pooling operations. Removing them allows us to delete specialized code and tests that are not needed for the OpDSL counterparts that rely on the standard code paths.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110909
Add support for dynamic shared memory for GPU launch ops: add an
optional operand to gpu.launch and gpu.launch_func ops to specify the
amount of "dynamic" shared memory to use. Update lowerings to connect
this operand to the GPU runtime.
Differential Revision: https://reviews.llvm.org/D110800
This revision exposes some minimal funcitonality to allow comprehensive
bufferization to interop with external projects.
Differential Revision: https://reviews.llvm.org/D110875
For convolution, the input window dimension's access affine map
is of the form `(d0 * s0 + d1)`, where `d0`/`d1` is the output/
filter window dimension, and `s0` is the stride.
When tiling, https://reviews.llvm.org/D109267 changed how the
way dimensions are acquired. Instead of directly querying using
`*.dim` ops on the original convolution op, we now get it by
applying the access affine map to the loop upper bounds. This
is fine for dimensions having single-dimension affine maps,
like matmul, but not for convolution input. It will cause
incorrect compuation and out of bound. A concrete example, say
we have 1x225x225x3 (NHWC) input, 3x3x3x32 (HWCF) filter, and
1x112x112x3 (NHWC) output with stride 2, (112 * 2 + 3) would be
227, which is different from the correct input window dimension
size 225.
Instead, we should first calculate the max indices for each loop,
and apply the affine map to them, and then plus one to get the
dimension size. Note this makes no difference for matmul-like
ops given they will have `d0 - 1 + 1` effectively.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110849
* This could have been removed some time ago as it only had one op left in it, which is redundant with the new approach.
* `matmul_i8_i8_i32` (the remaining op) can be trivially replaced by `matmul`, which natively supports mixed precision.
Differential Revision: https://reviews.llvm.org/D110792
The former is redundant because the later carries it as part of
its builder. Add a getContext() helper method to DialectAsmParser
to make this more convenient, and stop passing the context around
explicitly. This simplifies ODS generated parser hooks for attrs
and types.
This resolves PR51985
Recommit 4b32f8bac4 after fixing a dependency.
Differential Revision: https://reviews.llvm.org/D110796
The former is redundant because the later carries it as part of
its builder. Add a getContext() helper method to DialectAsmParser
to make this more convenient, and stop passing the context around
explicitly. This simplifies ODS generated parser hooks for attrs
and types.
This resolves PR51985
Differential Revision: https://reviews.llvm.org/D110796
Should have verified the perm length and input rank were the same before
inferring shape. Caused a crash with invalid IR.
Differential Revision: https://reviews.llvm.org/D110674
The lack of negi details leaked from merger class into codegen part.
Also, special case for vector code was not needed, the type can be used directly!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110677
Added interface implementations for AllocOp and CloneOp defined in the MemRef diallect.
Adapted the BufferDeallocation pass to be compatible with the interface introduced in this CL.
Differential Revision: https://reviews.llvm.org/D109350
This revision retires a good portion of the complexity of the codegen strategy and puts the logic behind pass logic.
Differential revision: https://reviews.llvm.org/D110678
Unroll-and-jam currently doesn't work when the loop being unroll-and-jammed
or any of its inner loops has iter_args. This patch modifies the
unroll-and-jam utility to support loops with iter_args.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110085
Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal padding is not desired.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110572
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110442
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.
Add tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110373
Let the calling pass or pattern replace the uses of the original root operation. Internally, the tileAndFuse still replaces uses and updates operands but only of newly created operations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110169
This revision adds a
```
FlatAffineValueConstraints(ValueRange ivs, ValueRange lbs, ValueRange ubs)
```
method and use it in hoist padding.
Differential Revision: https://reviews.llvm.org/D110427
This revision extracts padding hoisting in a new file and cleans it up in prevision of future improvements and extensions.
Differential Revision: https://reviews.llvm.org/D110414
When splitting with linalg.copy, cannot write into the destination alloc directly. Instead, write into a subview of the alloc.
Differential Revision: https://reviews.llvm.org/D110512
For such cases, the type of the constant DenseElementsAttr is
different from the transpose op return type.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110446
This patch introduces a generic reduction detection utility that works
across different dialecs. It is mostly a generalization of the reduction
detection algorithm in Affine. The reduction detection logic in Affine,
Linalg and SCFToOpenMP have been replaced with this new generic utility.
The utility takes some basic components of the potential reduction and
returns: 1) the reduced value, and 2) a list with the combiner operations.
The logic to match reductions involving multiple combiner operations disabled
until we can properly test it.
Reviewed By: ftynse, bondhugula, nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D110303
These are among the last operations still defined explicitly in C++. I've
tried to keep this commit as NFC as possible, but these ops
definitely need a non-NFC cleanup at some point.
Differential Revision: https://reviews.llvm.org/D110440
* If the input is a constant splat value, we just
need to reshape it.
* If the input is a general constant with one user,
we can also constant fold it, without bloating
the IR.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110439
This commits updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.
Differential Revision: https://reviews.llvm.org/D110360
This has been a TODO for a long time, and it brings about many advantages (namely nice accessors, and less fragile code). The existing overloads that accept ArrayRef are now treated as deprecated and will be removed in a followup (after a small grace period). Most of the upstream MLIR usages have been fixed by this commit, the rest will be handled in a followup.
Differential Revision: https://reviews.llvm.org/D110293
Initially, the padding transformation and the related operation were only used
to guarantee static shapes of subtensors in tiled operations. The
transformation would not insert the padding operation if the shapes were
already static, and the overall code generation would actively remove such
"noop" pads. However, this transformation can be also used to pack data into
smaller tensors and marshall them into faster memory, regardless of the size
mismatches. In context of expert-driven transformation, we should assume that,
if padding is requested, a potentially padded tensor must be always created.
Update the transformation accordingly. To do this, introduce an optional
`packing` attribute to the `pad_tensor` op that serves as an indication that
the padding is an intentional choice (as opposed to side effect of type
normalization) and should be left alone by cleanups.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110425
This canonicalization pattern complements the tensor.cast(pad_tensor) one in
propagating constant type information when possible. It contributes to the
feasibility of pad hoisting.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110343
* Do not discard static result type information that cannot be inferred from lower/upper padding.
* Add optional argument to `PadTensorOp::inferResultType` for specifying known result dimensions.
Differential Revision: https://reviews.llvm.org/D110380
When generating code to add an element to SparseTensorCOO (e.g., when doing dense=>sparse conversion), we used to check for nonzero values on the runtime side, whereas now we generate MLIR code to do that check.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110121
This test makes sure kernels map to efficient sparse code, i.e. all
compressed for-loops, no co-iterating while loops. In addition, this
revision removes the special constant folding inside the sparse
compiler in favor of Mahesh' new generic linalg folding. Thanks!
NOTE: relies on Mahesh fix, which needs to be rebased first
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110001
The current folder of constant -> generic op only handles splat
constants. The same logic holds for scalar constants. Teach the
pattern to handle such cases.
Differential Revision: https://reviews.llvm.org/D109982
Now not just SUM, but also PRODUCT, AND, OR, XOR. The reductions
MIN and MAX are still to be done (also depends on recognizing
these operations in cmp-select constructs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110203
If no interchange vector is given initialize it with the identity permutation from 0 to number of loops.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D110249
This change adds automatic wrapper functoins with emit_c_interface
to all methods in the sparse support library that deal with MEMREFs.
The wrappers will take care of passing MEMREFs by value internally
and by pointer externally, thereby avoiding ABI issues across platforms.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110219
Compute the tiled producer slice dimensions directly starting from the consumer not using the producer at all.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110147
Add a helper method to check if an index vector contains a permutation of its indices. Additionally, refactor applyPermutationToVector to take int64_t.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110135
It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection.
This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases.
The only test adversely affected is an artificial test which results in a returned memref: this pattern is not allowed by comprehensive bufferization in real scenarios anyway and the offending test is deleted.
Differential Revision: https://reviews.llvm.org/D110072
Previously, comprehensive bufferize would consider all aliasing reads and writes to
the result buffer and matching operand. This resulted in spurious dependences
being considered and resulted in too many unnecessary copies.
Instead, this revision revisits the gathering of read and write alias sets.
This results in fewer alloc and copies.
An exhaustive test cases is added that considers all possible permutations of
`matmul(extract_slice(fill), extract_slice(fill), ...)`.
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
the MLIR parser itself. Provides a standard interface for doing this that
is less error prone and less boilerplate.
While here, extend Delimiter to support <> and {} delimited sequences as
well (I have a use for <> in CIRCT specifically).
Differential Revision: https://reviews.llvm.org/D110122
Currently DenseElementsAttr only exposes the ability to get the full range of values for a given type T, but there are many situations where we just want the beginning/end iterator. This revision adds proper value_begin/value_end methods for all of the supported T types, and also cleans up a bit of the interface.
Differential Revision: https://reviews.llvm.org/D104173
For `memref.subview` operations, when there are more than one
unit-dimensions, the strides need to be used to figure out which of
the unit-dims are actually dropped.
Differential Revision: https://reviews.llvm.org/D109418
Add an interface that allows grouping together all covolution and
pooling ops within Linalg named ops. The interface currently
- the indexing map used for input/image access is valid
- the filter and output are accessed using projected permutations
- that all loops are charecterizable as one iterating over
- batch dimension,
- output image dimensions,
- filter convolved dimensions,
- output channel dimensions,
- input channel dimensions,
- depth multiplier (for depthwise convolutions)
Differential Revision: https://reviews.llvm.org/D109793
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
Add a new version of fusion on tensors that supports the following scenarios:
- support input and output operand fusion
- fuse a producer result passed in via tile loop iteration arguments (update the tile loop iteration arguments)
- supports only linalg operations on tensors
- supports only scf::for
- cannot add an output to the tile loop nest
The LinalgTileAndFuseOnTensors pass tiles the root operation and fuses its producers.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D109766
So far, the CF cost-model for detensoring was limited to discovering
pure CF structures. This means, if while discovering the CF component,
the cost-model found any op that is not detensorable, it gives up on
detensoring altogether. This patch makes it a bit more flexible by
cleaning-up the detensorable component from non-detensorable ops without
giving up entirely.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D109965
The discussion on forum:
https://llvm.discourse.group/t/bug-in-partial-dialect-conversion/4115
The `applyPartialConversion` didn't handle the operations, that were
marked as illegal inside dynamic legality callback.
Instead of reporting error, if such operation was not converted to legal set,
the method just added it to `unconvertedSet` in the same way as unknown operations.
This patch fixes that and handle dynamically illegal operations as well.
The patch includes 2 fixes for existing passes:
* `tensor-bufferize` - explicitly mark `std.return` as legal.
* `convert-parallel-loops-to-gpu` - ugly fix with marking visited operations
to avoid recursive legality checks.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108505
Add a constant propagator for gpu.launch op in cases where the
grid/thread IDs can be trivially determined to take a single constant
value of zero.
Differential Revision: https://reviews.llvm.org/D109994
Note that this revision adds a very tiny bit of constant folding in the
sparse compiler lattice construction. Although I am generally trying to
avoid such canonicalizations (and rely on other passes to fix this instead),
the benefits of avoiding a very expensive disjunction lattice construction
justify having this special code (at least for now).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109939
Even with all parallel loops reading the output value is still allowed so we
don't have to handle reduction loops differently.
Differential Revision: https://reviews.llvm.org/D109851
Add the addTileLoopIvsToIndexOpResults method to shift the IndexOp results after tiling.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109761
The pattern is returning success even if it does no work leading to pattern application running up to the max iteration count and failing.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D109791
TosaOp defintion had an artificial constraint that the input/output types
needed to be ranked to invoke the quantization builder. This is correct as an
unranked tensor could still be quantized.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D109863
This enables the sparsification of more kernels, such as convolutions
where there is a x(i+j) subscript. It also enables more tensor invariants
such as x(1) or other affine subscripts such as x(i+1). Currently, we
reject sparsity altogether for such tensors. Despite this restriction,
however, we can already handle a lot more kernels with compound subscripts
for dense access (viz. convolution with dense input and sparse filter).
Some unit tests and an integration test demonstrate new capability.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109783
There are two main versions of depthwise conv depending whether the multiplier
is 1 or not. In cases where m == 1 we should use the version without the
multiplier channel as it can perform greater optimization.
Add lowering for the quantized/float versions to have a multiplier of one.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D108959
This revision fixes a corner case that could appear due to incorrect insertion point behavior in comprehensive bufferization.
Differential Revision: https://reviews.llvm.org/D109830
AliasInfo can now use union-find for a much more efficient implementation.
This brings no functional changes but large performance gains on more complex examples.
Differential Revision: https://reviews.llvm.org/D109819
Add the makeComposedExtractSliceOp method that creates an ExtractSliceOp and folds chains of ExtractSliceOps by computing the sum of their offsets and by multiplying their strides.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109601
This tiling option scalarizes all dynamic dimensions, i.e., it tiles all dynamic dimensions by 1.
This option is useful for linalg ops with partly dynamic tensor dimensions. E.g., such ops can appear in the partial iteration after loop peeling. After scalarizing dynamic dims, those ops can be vectorized.
Differential Revision: https://reviews.llvm.org/D109268
Only scf.for loops are supported at the moment. linalg.tiled_loop support will be added in a subsequent commit.
Only static tensor sizes are supported. Loops for dynamic tensor sizes can be peeled, but the generated code is not optimal due to a missing canonicalization pattern.
Differential Revision: https://reviews.llvm.org/D109043
This revision allows hoisting static alloc/dealloc pairs as high as possible during ComprehensiveBufferization.
This also aligns such allocated buffers to 128B by default.
This change exhibited some issues wrt insertion points and a missing copy that are also fixed in this revision; tests are updated accordingly.
Differential Revision: https://reviews.llvm.org/D109684
Previously, we would insert a DimOp and rely on later canonicalizations.
Unfortunately, reifyShape kind of rewrites are not canonicalizations anymore.
This introduces undesirable pass dependencies.
Instead, immediately reify the result shape and avoid the DimOp altogether.
This is akin to a local folding, which avoids introducing more reliance on `-resolve-shaped-type-result-dims` (similar to compositions of `affine.apply` by construction to avoid chains of size > 1).
It does not completely get rid of the reliance on the pass as the process is merely local: calling the pass may still be necessary for global effects. Indeed, one of the tests still requires the pass.
Differential Revision: https://reviews.llvm.org/D109571
Tosa.while shape inference requires repeatedly running shape inference across
the body of the loop until the types become static as we do not know the number
of iterations required by the loop body. Once the least specific arguments are
known they are propagated to both regions.
To determine the final end type, the least restrictive types are determined
from all yields.
Differential Revision: https://reviews.llvm.org/D108801
The original version of the bufferization pattern for linalg.generic would
manually clone operations within the region to the bufferized clone of the
operation. This triggers legality requirements on those operations in the
conversion infra. Instead, this now uses the rewriter to inline the region
instead, avoiding those legality requirements.
Differential Revision: https://reviews.llvm.org/D109581
Generate an scf.for instead of an scf.if for the partial iteration. This is for consistency reasons: The peeling of linalg.tiled_loop also uses another loop for the partial iteration.
Note: Canonicalizations patterns may rewrite partial iterations to scf.if afterwards.
Differential Revision: https://reviews.llvm.org/D109568
Extend the signature of the tile loop nest region builder to take all operand values to use and not just the scf::For iterArgs. This change allows us to pass in all block arguments of TiledLoop and use them directly instead of replacing them after the loop generation.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D109569
This revision fixes the traversal order of extract_slice during the inplace analysis.
It was previously thought that such ops could be analyzed at the very end.
This is unfortunately not true as the AliasInfo for dependents of these ops need to be updated.
This change allows the aliases introduced by the bufferization of extract_slice to be properly propagated.
Differential Revision: https://reviews.llvm.org/D109519
This PR adds missing AtomicRMWKind::min/max cases which we would like to use for min/max reduction loop vectorizations.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104881
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
Further enhance the set of operations that can be handled by the sparse compiler
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109413
Fix extra space print for llvm global op when the 'unamed_addr'
attribute was empty. This led to two spaces being printed in the custom
form between non-whitespace chars. A round trip would add an extra space
to a typical spaced form. NFC.
Differential Revision: https://reviews.llvm.org/D109502
Fold dim ops of scf.for results to dim ops of the respective iter args if the loop is shape preserving.
Differential Revision: https://reviews.llvm.org/D109430
Fold dim ops of linalg.tiled_loop results to dim ops of the respective iter args if the loop is shape preserving.
Differential Revision: https://reviews.llvm.org/D109431
Run a small analysis to see if the runtime type of the iter_arg is changing. Fold only if the runtime type stays the same. (Same as `DimOfIterArgFolder` in SCF.)
Differential Revision: https://reviews.llvm.org/D109299
When tiling a LinalgOp, extract_slice/insert_slice pairs are inserted. To avoid going out-of-bounds when the tile size does not divide the shape size evenly (at the boundary), AffineMin ops are inserted. Some ops have assumptions regarding the dimensions of inputs/outputs. E.g., in a `A * B` matmul, `dim(A, 1) == dim(B, 0)`. However, loop bounds use either `dim(A, 1)` or `dim(B, 0)`.
With this change, AffineMin ops are expressed in terms of loop bounds instead of tensor sizes. (Both have the same runtime value.) This simplifies canonicalizations.
Differential Revision: https://reviews.llvm.org/D109267
It looks like it was a typo. Instead of `*maybeConstantIndex`,
`initTensorOp.getStaticSize(*maybeConstantIndex)` should be used to access the
dim size of the tensor. There is a test for that in `canonicalize.mlir`, but it
was working correctly because `ReplaceStaticShapeDims` was canonicalizing DimOp
before `FoldInitTensorWithDimOp`. So, to make the patterns more "orthogonal",
this case is disabled.
Differential Revision: https://reviews.llvm.org/D109247
Previously only await inside the async function (coroutine after lowering to async runtime) would check the error state
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D109229
Create a gpu memset op and corresponding CUDA and ROCm wrappers.
Reviewed By: herhut, lorenrose1013
Differential Revision: https://reviews.llvm.org/D107548
This makes the IR more readable, in particular when this will be used on
the builtin func outside of the LLVM dialect.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D109209
The sparse index order must always be satisfied, but this
may give a choice in topsorts for several cases. We broke
ties in favor of any dense index order, since this gives
good locality. However, breaking ties in favor of pushing
unrelated indices into sparse iteration spaces gives better
asymptotic complexity. This revision improves the heuristic.
Note that in the long run, we are really interested in using
ML for ML to find the best loop ordering as a replacement for
such heuristics.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109100
DialectAsmParser::parseKeyword is rejecting `'i' digit+` while it is
a valid identifier according to mlir/docs/LangRef.md.
Integer types actually used to be TOK_KEYWORD a while back before the
change: 6af866c58d.
This patch Modifies `isCurrentTokenAKeyword` to return true for tokens that
match integer types too.
The motivation for this change is the parsing of `!fir.type<{` `component-name: component-type,`+ `}>`
type in FIR that represent Fortran derived types. The component-names are
parsed as keywords, and can very well be i32 or any ixxx (which are
valid Fortran derived type component names).
The Quant dialect type parser had to be modified since it relied on `iw` not
being parsed as keywords.
Differential Revision: https://reviews.llvm.org/D108913
The limitation on iter_args introduced with D108806 is too restricting. Changes of the runtime type should be allowed.
Extends the dim op canonicalization with a simple analysis to determine when it is safe to canonicalize.
Differential Revision: https://reviews.llvm.org/D109125
Add an operation omp.critical.declare to declare names/symbols of
critical sections. Named omp.critical operations should use symbols
declared by omp.critical.declare. Having a declare operation ensures
that the names of critical sections are global and unique. In the
lowering flow to LLVM IR, the OpenMP IRBuilder creates unique names
for critical sections.
Reviewed By: ftynse, jeanPerier
Differential Revision: https://reviews.llvm.org/D108713
This patch is to add Image Operands in SPIR-V Dialect and also let ImageDrefGather to use Image Operands.
Image Operands are used in many image instructions. "Image Operands encodes what oprands follow, as per Image Operands". And ususally, they are optional to image instructions.
The format of image operands looks like:
%0 = spv.ImageXXXX %1, ... %3 : f32 ["Bias|Lod"](%4, %5 : f32, f32) -> ...
This patch doesn’t implement all operands (see Section 3.14 in SPIR-V Spec) but provides a skeleton of it. There is TODO in verifyImageOperands function.
Co-authored: Alan Liu <alanliu.yf@gmail.com>
Reviewed by: antiagainst
Differential Revision: https://reviews.llvm.org/D108501
The output tensor was added for tiling purposes. With use of
`TilingInterface` for tiling pad operations, there is no need for an
explicit operand for the shape of result of `linalg.pad_tensor`
op. The interface allows the tiling pattern to query the value that
can be used for the "init" needed for tiling dynamically.
Differential Revision: https://reviews.llvm.org/D108613
This aligns the printer with the parser contract: the operation isn't part of the user-controllable part of the syntax.
Differential Revision: https://reviews.llvm.org/D108804
Don't assert fail on strided memrefs when dropping unit dims.
Instead just leave them unchanged.
Differential Revision: https://reviews.llvm.org/D108205
An interface to allow for tiling of operations is introduced. The
tiling of the linalg.pad_tensor operation is modified to use this
interface.
Differential Revision: https://reviews.llvm.org/D108611
The StringAttr version doesn't need a context, so we can just use the
existing `SymbolRefAttr::get` form. The StringRef version isn't preferred
so we want to encourage people to use StringAttr.
There is an additional form of getSymbolRefAttr that takes a (SymbolTrait
implementing) operation. This should also be moved, but I'll do that as
a separate patch.
Differential Revision: https://reviews.llvm.org/D108922
SymbolRefAttr is fundamentally a base string plus a sequence
of nested references. Instead of storing the string data as
a copies StringRef, store it as an already-uniqued StringAttr.
This makes a lot of things simpler and more efficient because:
1) references to the symbol are already stored as StringAttr's:
there is no need to copy the string data into MLIRContext
multiple times.
2) This allows pointer comparisons instead of string
comparisons (or redundant uniquing) within SymbolTable.cpp.
3) This allows SymbolTable to hold a DenseMap instead of a
StringMap (which again copies the string data and slows
lookup).
This is a moderately invasive patch, so I kept a lot of
compatibility APIs around. It would be nice to explore changing
getName() to return a StringAttr for example (right now you have
to use getNameAttr()), and eliminate things like the StringRef
version of getSymbol.
Differential Revision: https://reviews.llvm.org/D108899
* Add `DimOfIterArgFolder`.
* Move existing cross-dialect canonicalization patterns to `LoopCanonicalization.cpp`.
* Rename `SCFAffineOpCanonicalization` pass to `SCFForLoopCanonicalization`.
* Expand documentaton of scf.for: The type of loop-carried variables may not change with iterations. (Not even the dynamic type.)
Differential Revision: https://reviews.llvm.org/D108806
* Add batched version of all `addId` variants, so that multiple IDs can be added at a time.
* Rename `addId` and variants to `insertId` and `appendId`. Most external users call `appendId`. Splitting `addId` into two functions also makes it possible to provide batched version for both. (Otherwise, the overloads are ambigious when calling `addId`.)
Differential Revision: https://reviews.llvm.org/D108532
This allows for parsing strings that have escape sequences, which require constructing
a string (as they can't be represented by looking at the Token contents directly).
Differential Revision: https://reviews.llvm.org/D108589
* Add support for affine.max ops to SCF loop peeling pattern.
* Add support for affine.max ops to `AffineMinSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalizationPattern` to `AffineOpSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalization` pass to `SCFAffineOpCanonicalization`.
Differential Revision: https://reviews.llvm.org/D108009
This canonicalization simplifies affine.min operations inside "for loop"-like operations (e.g., scf.for and scf.parallel) based on two invariants:
* iv >= lb
* iv < lb + step * ((ub - lb - 1) floorDiv step) + 1
This commit adds a new pass `canonicalize-scf-affine-min` (instead of being a canonicalization pattern) to avoid dependencies between the Affine dialect and the SCF dialect.
Differential Revision: https://reviews.llvm.org/D107731
Introduces new Ops to represent 1. alias.scope metadata in LLVM, and 2. domains for these scopes. These correspond to the metadata described in https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata. Lists of scopes are modeled the same way as access groups - as an ArrayAttr on the Op (added in https://reviews.llvm.org/D97944).
Lowering 'noalias' attributes on function parameters is already supported. However, lowering `noalias` metadata on individual Ops is not, which is added in this change. LLVM uses the same keyword for these, but this change introduces a separate attribute name 'noalias_scopes' to represent this distinct concept.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107870
If additional static type information can be deduced from a insert_slice's size operands, insert an explicit cast of the op's source operand.
This enables other canonicalization patterns that are matching for tensor_cast ops such as `ForOpTensorCastFolder` in SCF.
Differential Revision: https://reviews.llvm.org/D108617
Rationale:
Passing in a pointer to the memref data in order to implement the
dense to sparse conversion was a bit too low-level. This revision
improves upon that approach with a cleaner solution of generating
a loop nest in MLIR code itself that prepares the COO object before
passing it to our "swiss army knife" setup. This is much more
intuitive *and* now also allows for dynamic shapes.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108491
This revision adds native ODS support for VariadicOfVariadic operand
groups. An example of this is the SwitchOp, which has a variadic number
of nested operand ranges for each of the case statements, where the
number of case statements is variadic. Builtin ODS support allows for
generating proper accessors for the nested operand ranges, builder
support, and declarative format support. VariadicOfVariadic operands
are supported by providing a segment attribute to use to store the
operand groups, mapping similarly to the AttrSizedOperand trait
(but with a user defined attribute name).
`build` methods for VariadicOfVariadic operand expect inputs of the
form `ArrayRef<ValueRange>`. Accessors for the variadic ranges
return a new `OperandRangeRange` type, which represents a
contiguous range of `OperandRange`. In the declarative assembly
format, VariadicOfVariadic operands and types are by default
formatted as a comma delimited list of value lists:
`(<value>, <value>), (), (<value>)`.
Differential Revision: https://reviews.llvm.org/D107774
Do not apply loop peeling to loops that are contained in the partial iteration of an already peeled loop. This is to avoid code explosion when dealing with large loop nests. Can be controlled with a new pass option `skip-partial`.
Differential Revision: https://reviews.llvm.org/D108542
Multiple operations were still defined as TC ops that had equivalent versions
as YAML operations. Reducing to a single compilation path guarantees that
frontends can lower to their equivalent operations without missing the
optimized fastpath.
Some operations are maintained purely for testing purposes (mainly conv{1,2,3}D
as they are included as sole tests in the vectorizaiton transforms.
Differential Revision: https://reviews.llvm.org/D108169
Previously, ExecuteRegionOps with multiple return values would fail a round-trip test due to missing parenthesis around the types.
Differential Revision: https://reviews.llvm.org/D108402
Apply the "for loop peeling" pattern from SCF dialect transforms. This pattern splits scf.for loops into full and partial iterations. In the full iteration, all masked loads/stores are canonicalized to unmasked loads/stores.
Differential Revision: https://reviews.llvm.org/D107733
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
This shares more code with existing utilities. Also, to be consistent,
we moved dimension permutation on the DimOp to the tensor lowering phase.
This way, both pre-existing DimOps on sparse tensors (not likely but
possible) as well as compiler generated DimOps are handled consistently.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108309