Convert data operands from the acc.parallel operation using the same conversion pattern than D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103337
Implements better naming for results of spv.mlir.addressof ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103594
* Add hasUnitStride and hasZeroOffset to OffsetSizeAndStrideOpInterface. These functions are useful for various patterns. E.g., some vectorization patterns apply only for tensor ops with zero offsets and/or unit stride.
* Add getConstantIntValue and isEqualConstantInt helper functions, which are useful for implementing the two above functions, as well as various patterns.
Differential Revision: https://reviews.llvm.org/D103763
Controlled by a compiler option, if 32-bit indices can be handled
with zero/sign-extention alike (viz. no worries on non-negative
indices), scatter/gather operations can use the more efficient
32-bit SIMD version.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D103632
* Rename PadTensorOpVectorizationPattern to GenericPadTensorOpVectorizationPattern.
* Make GenericPadTensorOpVectorizationPattern a private pattern, to be instantiated via populatePadTensorOpVectorizationPatterns.
* Factor out parts of PadTensorOpVectorizationPattern into helper functions.
This commit prepares PadTensorOpVectorizationPattern for a series of subsequent commits that add more specialized PadTensorOp vectorization patterns.
Differential Revision: https://reviews.llvm.org/D103681
Convert data operands from the acc.data operation using the same conversion pattern than D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103332
Introduces a test pass that rewrites PadTensorOps with static shapes as a sequence of:
```
linalg.init_tensor // to create output
linalg.fill // to initialize with padding value
linalg.generic // to copy the original contents to the padded tensor
```
The pass can be triggered with:
- `--test-linalg-transform-patterns="test-transform-pad-tensor"`
Differential Revision: https://reviews.llvm.org/D102804
Replace the uses of deprecated Structured Op Interface methods in TestLinalgElementwiseFusion.cpp, TestLinalgFusionTransforms.cpp, and Transforms.cpp. The patch is based on https://reviews.llvm.org/D103394.
Differential Revision: https://reviews.llvm.org/D103528
Adding methods to access operand properties via OpOperands and mark outdated methods as deprecated.
Differential Revision: https://reviews.llvm.org/D103394
Implements better naming for results of `spv.Constant` ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103152
Depends On D103109
If any of the tokens/values added to the `!async.group` switches to the error state, than the group itself switches to the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103203
Depends On D103102
Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups
Will be addressed in the followup PRs
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103109
Support reference counted values implicitly passed (live) only to some of the successors.
Example: if branched to ^bb2 token will leak, unless `drop_ref` operation is properly created
```
^entry:
%token = async.runtime.create : !async.token
cond_br %cond, ^bb1, ^bb2
^bb1:
async.runtime.await %token
async.runtime.drop_ref %token
br ^bb2
^bb2:
return
```
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103102
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023
This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations.
A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow.
Differential revision: https://reviews.llvm.org/D102512
Allow support for specifying empty IVs in an `affine.parallel`.
For example:
```
affine.parallel () = () to () {
affine.yield
}
```
Reviewed By: bondhugula, jbruestle
Differential Revision: https://reviews.llvm.org/D102895
Prevent users of `iter_args` of an affine for loop from being hoisted
out of it. Otherwise, LICM leads to a violation of the SSA dominance
(as demonstrated in the added test case).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=50103
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D102984
This previously handled memref::SubviewOp, but this can be extended to
all ops implementing the interface.
Differential Revision: https://reviews.llvm.org/D103076
Fix inconsistent MLIR CMake variable names. Consistently name them as
MLIR_ENABLE_<feature>.
Eg: MLIR_CUDA_RUNNER_ENABLED -> MLIR_ENABLE_CUDA_RUNNER
MLIR follows (or has mostly followed) the convention of naming
cmake enabling variables in the from MLIR_ENABLE_... etc. Using a
convention here is easy and also important for convenience. A counter
pattern was started with variables named MLIR_..._ENABLED. This led to a
sequence of related counter patterns: MLIR_CUDA_RUNNER_ENABLED,
MLIR_ROCM_RUNNER_ENABLED, etc.. From a naming standpoint, the imperative
form is more meaningful. Additional discussion at:
https://llvm.discourse.group/t/mlir-cmake-enable-variable-naming-convention/3520
Switch all inconsistent ones to the ENABLE form. Keep the couple of old
mappings needed until buildbot config is migrated.
Differential Revision: https://reviews.llvm.org/D102976
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.
This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101921
This revision completes the "dimension ordering" feature
of sparse tensor types that enables the programmer to
define a preferred order on dimension access (other than
the default left-to-right order). This enables e.g. selection
of column-major over row-major storage for sparse matrices,
but generalized to any rank, as in:
dimOrdering = affine_map<(i,j,k,l,m,n,o,p) -> (p,o,j,k,i,l,m,n)>
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102856
The previous implementation did not handle casting behavior properly and
did not consider aliases.
Differential Revision: https://reviews.llvm.org/D102785
This pattern inlines operands to a linalg.generic operation that use a constant
index and hence are loop-invariant scalars. This reduces the number of
linalg.generic operands and unlocks some canonicalizations that rely on seeing
an explicit tensor.extract.
Differential Revision: https://reviews.llvm.org/D102682
Skip the sparsification pass for Linalg ops without annotated tensors
(or cases that are not properly handled yet).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102787
The patch extends the yaml code generation to support the following new OpDSL constructs:
- captures
- constants
- iteration index accesses
- predefined types
These changes have been introduced by revision
https://reviews.llvm.org/D101364.
Differential Revision: https://reviews.llvm.org/D102075
VectorTransferPermutationMapLoweringPatterns can be enabled via a pass option. These additional patterns lower permutation maps to minor identity maps with broadcasting, if possible, allowing for more efficient vector load/stores. The option is deactivated by default.
Differential Revision: https://reviews.llvm.org/D102593
LinalgOps that are all parallel do not use the value of `outs`
tensor. The semantics is that the `outs` tensor is fully
overwritten. Using anything other than `init_tensor` can add false
dependencies between operations, when the use is just for the shape of
the tensor. Adding a canonicalization to always use `init_tensor` in
such cases, breaks this dependence.
Differential Revision: https://reviews.llvm.org/D102561
Original interfaces are not safe to be called during dialect conversion.
This is because some ops (e.g. `dynamic_reshape(input, target_shape)`)
depend on the values of their operands to calculate the output shape.
However the operands may be out of reach during dialect conversion (e.g.
converting from tensor world to buffer world). This patch provides a new
kind of interface which accpets user-provided operands to solve this
problem.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D102317
- Enables inferring return type for ConstShape, takes into account valid return types;
- The compatible return type function could be reused, leaving that for next use refactoring;
Differential Revision: https://reviews.llvm.org/D102182
The experimental flag for "inplace" bufferization in the sparse
compiler can be replaced with the new inplace attribute. This gives
a uniform way of expressing the more efficient way of bufferization.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102538
Broadcast dimensions of vector transfer ops are always in-bounds. This is consistent with the fact that the starting position of a transfer is always in-bounds.
Differential Revision: https://reviews.llvm.org/D102566
Splitting the memref dialect lead to an introduction of several dependencies
to avoid compilation issues. The canonicalize pass also depends on the
memref dialect, but it shouldn't. This patch resolves the dependencies
and the unintuitive includes are removed. However, the dependency moves
to the constructor of the std dialect.
Differential Revision: https://reviews.llvm.org/D102060
Replace the templated linalgLowerOpToLoops method by three specialized methods linalgOpToLoops, LinalgOpToParallelLoops, and linalgOpToAffineLoops.
Differential Revision: https://reviews.llvm.org/D102324
Add TransferWritePermutationLowering, which replaces permutation maps of TransferWriteOps with vector.transpose.
Differential Revision: https://reviews.llvm.org/D102548
We are moving from just dense/compressed to more general dim level
types, so we need more than just an "i1" array for annotations.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102520
This change allows the SRC and DST of dma_start operations to be located in the
same memory space. This applies to both the Affine dialect and Memref dialect
versions of these Ops. The documention has been updated to reflect this by
explicitly stating overlapping memory locations are not supported (undefined
behavior).
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D102274
This covers the extremely common case of replacing all uses of a Value
with a new op that is itself a user of the original Value.
This should also be a little bit more efficient than the
`SmallPtrSet<Operation *, 1>{op}` idiom that was being used before.
Differential Revision: https://reviews.llvm.org/D102373
Support OpImageQuerySize in spirv dialect
co-authored-by: Alan Liu <alanliu.yf@gmail.com>
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D102029
Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.
This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.
Differential Revision: https://reviews.llvm.org/D101745
Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.
This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.
Differential Revision: https://reviews.llvm.org/D101745
The current static checker for linalg does not work on the decreasing
index cases well. So, this is to Update the current static bound checker
for linalg to cover decreasing index cases.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D102302
Add a conversion pass to convert higher-level type before translation.
This conversion extract meangingful information and pack it into a struct that
the translation (D101504) will be able to understand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D102170
First step in adding alignment as an attribute to MLIR global definitions. Alignment can be specified for global objects in LLVM IR. It can also be specified as a named attribute in the LLVMIR dialect of MLIR. However, this attribute has no standing and is discarded during translation from MLIR to LLVM IR. This patch does two things: First, it adds the attribute to the syntax of the llvm.mlir.global operation, and by doing this it also adds accessors and verifications. The syntax is "align=XX" (with XX being an integer), placed right after the value of the operation. Second, it allows transforming this operation to and from LLVM IR. It is checked whether the value is an integer power of 2.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D101492
This is actually necessary for correctness, as memref.reinterpret_cast
doesn't verify if the output shape doesn't match the static sizes.
Differential Revision: https://reviews.llvm.org/D102232
VectorTransfer split previously only split read xfer ops. This adds
the same logic to write ops. The resulting code involves 2
conditionals for write ops while read ops only needed 1, but the created
ops are built upon the same patterns, so pattern matching/expectations
are all consistent other than in regards to the if/else ops.
Differential Revision: https://reviews.llvm.org/D102157
All glue and clutter in the linalg ops has been replaced by proper
sparse tensor type encoding. This code is no longer needed. Thanks
to ntv@ for giving us a temporary home in linalg.
So long, and thanks for all the fish.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102098
A very elaborate, but also very fun revision because all
puzzle pieces are finally "falling in place".
1. replaces lingalg annotations + flags with proper sparse tensor types
2. add rigorous verification on sparse tensor type and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types
NOTE: next CL will remove *all* obsoleted sparse code in Linalg
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102095
According to the API contract, LinalgLoopDistributionOptions
expects to work on parallel iterators. When getting processor
information, only loop ranges for parallel dimensions should
be fed in. But right now after generating scf.for loop nests,
we feed in *all* loops, including the ones materialized for
reduction iterators. This can cause unexpected distribution
of reduction dimensions. This commit fixes it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D102079
In the buffer deallocation pass, unranked memref types are not properly supported.
After investigating this issue, it turns out that the Clone and Dealloc operation
does not support unranked memref types in the current implementation.
This patch adds the missing feature and enables the transformation of any memref
type.
This patch solves this bug: https://bugs.llvm.org/show_bug.cgi?id=48385
Differential Revision: https://reviews.llvm.org/D101760
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in-practice. The design requires building a string every time we lookup the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.
This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.
This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
Differential Revision: https://reviews.llvm.org/D102035
Replace all `linalg.indexed_generic` ops by `linalg.generic` ops that access the iteration indices using the `linalg.index` op.
Differential Revision: https://reviews.llvm.org/D101612
The pattern to convert subtensor ops to their rank-reduced versions
(by dropping unit-dims in the result) can also convert to a zero-rank
tensor. Handle that case.
This also fixes a OOB access bug in the existing pattern for such
cases.
Differential Revision: https://reviews.llvm.org/D101949
This expose a lambda control instead of just a boolean to control unit
dimension folding.
This however gives more control to user to pick a good heuristic.
Folding reshapes helps fusion opportunities but may generate sub-optimal
generic ops.
Differential Revision: https://reviews.llvm.org/D101917
Fixing a minor bug which lead to element type of the output being
modified when folding reshapes with generic op.
Differential Revision: https://reviews.llvm.org/D101942
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't contruct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.
This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
These instructions map to SVE-specific instrinsics that accept a
predicate operand to support control flow in vector code.
Differential Revision: https://reviews.llvm.org/D100982
This patch adds support for vectorizing loops with 'iter_args'
implementing known reductions along the vector dimension. Comparing to
the non-vector-dimension case, two additional things are done during
vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar
using `vector.reduce`.
- In some cases a mask is applied to the vector yielded at the end of
the loop to prevent garbage values from being written to the
accumulator.
Vectorization of reduction loops is disabled by default. To enable it, a
map from loops to array of reduction descriptors should be explicitly passed to
`vectorizeAffineLoops`, or `vectorize-reductions=true` should be passed
to the SuperVectorize pass.
Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100694
The old index op handling let the new index operations point back to the
producer block. As a result, after fusion some index operations in the
fused block had back references to the old producer block resulting in
illegal IR. The patch now relies on a block and value mapping to avoid
such back references.
Differential Revision: https://reviews.llvm.org/D101887
While we figure out how to best add Standard support for scalable
vectors, these instructions provide a workaround for basic arithmetic
between scalable vectors.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100837
This revision migrates more code from Linalg into the new permanent home of
SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811
TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering.
Differential Revision: https://reviews.llvm.org/D101808
Given the source and destination shapes, if they are static, or if the
expanded/collapsed dimensions are unit-extent, it is possible to
compute the reassociation maps that can be used to reshape one type
into another. Add a utility method to return the reassociation maps
when possible.
This utility function can be used to fuse a sequence of reshape ops,
given the type of the source of the producer and the final result
type. This pattern supercedes a more constrained folding pattern added
to DropUnitDims pass.
Differential Revision: https://reviews.llvm.org/D101343
Convert subtensor and subtensor_insert operations to use their
rank-reduced versions to drop unit dimensions.
Differential Revision: https://reviews.llvm.org/D101495
The current implementation had a bug as it was relying on the target vector
dimension sizes to calculate where to insert broadcast. If several dimensions
have the same size we may insert the broadcast on the wrong dimension. The
correct broadcast cannot be inferred from the type of the source and
destination vector.
Instead when we want to extend transfer ops we calculate an "inverse" map to the
projected permutation and insert broadcast in place of the projected dimensions.
Differential Revision: https://reviews.llvm.org/D101738
Move TransposeOp lowering in its own populate function as in some cases
it is better to keep it during ContractOp lowering to better
canonicalize it rather than emiting scalar insert/extract.
Differential Revision: https://reviews.llvm.org/D101647
Added canonicalization for vector_load and vector_store. An existing
pattern SimplifyAffineOp can be reused to compose maps that supplies
result into them. Added AffineVectorStoreOp and AffineVectorLoadOp
into static_assert of SimplifyAffineOp to allow operation to use it.
This fixes the bug filed: https://bugs.llvm.org/show_bug.cgi?id=50058
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101691
(1) migrates the encoding from TensorDialect into the new SparseTensorDialect
(2) replaces dictionary-based storage and builders with struct-like data
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D101669
Three patterns are added to convert into vector.multi_reduction into a
sequence of vector.reduction as the following:
- Transpose the inputs so inner most dimensions are always reduction.
- Reduce rank of vector.multi_reduction into 2d with inner most
reduction dim (get the 2d canical form)
- 2D canonical form is converted into a sequence of vector.reduction.
There are two things we might worth in a follow up diff:
- An scf.for (maybe optionally) around vector.reduction instead of unrolling it.
- Breakdown the vector.reduction into a sequence of vector.reduction
(e.g tree-based reduction) instead of relying on how downstream dialects
handle it.
Note: this will requires passing target-vector-length
Differential Revision: https://reviews.llvm.org/D101570
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Differential Revision: https://reviews.llvm.org/D101573
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101488
This enables to express more complex parallel loops in the affine framework,
for example, in cases of tiling by sizes not dividing loop trip counts perfectly
or inner wavefront parallelism, among others. One can't use affine.max/min
and supply values to the nested loop bounds since the results of such
affine.max/min operations aren't valid symbols. Making them valid symbols
isn't an option since they would introduce selection trees into memref
subscript arithmetic as an unintended and undesired consequence. Also
add support for converting such loops to SCF. Drop some API that isn't used in
the core repo from AffineParallelOp since its semantics becomes ambiguous in
presence of max/min bounds. Loop normalization is currently unavailable for
such loops.
Depends On D101171
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101172
Introduce a basic support for parallelizing affine loops with reductions
expressed using iteration arguments. Affine parallelism detector now has a flag
to assume such reductions are parallel. The transformation handles a subset of
parallel reductions that are can be expressed using affine.parallel:
integer/float addition and multiplication. This requires to detect the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.
Reviewed By: chelini, kumasento, bondhugula
Differential Revision: https://reviews.llvm.org/D101171
FillOp allows complex ops, and filling a properly sized buffer with
a default zero complex number is implemented.
Differential Revision: https://reviews.llvm.org/D99939
This revision adds support for vectorizing more general linalg operations with projected permutation maps.
This is achieved by eagerly broadcasting the intermediate vector to the common size
of the iteration domain of the linalg op. This allows a much more natural expression of
generalized vectorization but may introduce additional computations until all the
proper canonicalizations are implemented.
This generalization modifies the vector.transfer_read/write permutation logic and
exposes the fact that the logic employed in vector.contract was too ad-hoc.
As a consequence, changes occur in the permutation / transposition logic for contraction. In turn this prompts supporting more cases in the lowering of contract
to matrix intrinsics, which is required to make the corresponding tests pass.
Differential revision: https://reviews.llvm.org/D101165
Canonicalizations for subtensor operations defaulted to use the
rank-reduced version of the operation, but the cast inserted to get
back the original type would be illegal if the rank was actually
reduced. Instead make the canonicalization not reduce the rank of the
operation.
Differential Revision: https://reviews.llvm.org/D101258
The current canonicalization did not remap operation results correctly
and attempted to erase tiledLoop, which is incorrect if not all tensor
results are folded.
Tensor inputs, if not used in the body of TiledLoopOp, can be removed.
memref::CastOp can be folded into TiledLoopOp as well.
Differential Revision: https://reviews.llvm.org/D101445
Both, `shape.broadcast` and `shape.cstr_broadcastable` accept dynamic and static
extent tensors. If their operands are casted, we can use the original value
instead.
Differential Revision: https://reviews.llvm.org/D101376
Empty extent tensor operands were only removed when they were defined as a
constant. Additionally, we can remove them if they are known to be empty by
their type `tensor<0xindex>`.
Differential Revision: https://reviews.llvm.org/D101351
Splat constant folding was limited to `std.constant` operations. Instead, use
the constant matcher and apply splat constant folding to any constant-like
operation that holds a splat attribute.
Differential Revision: https://reviews.llvm.org/D101301
The new "encoding" field in tensor types so far had no meaning. This revision introduces:
1. an encoding attribute interface in IR: for verification between tensors and encodings in general
2. an attribute in Tensor dialect; #tensor.sparse<dict> + concrete sparse tensors API
Active discussion:
https://llvm.discourse.group/t/rfc-introduce-a-sparse-tensor-type-to-core-mlir/2944/
Reviewed By: silvas, penpornk, bixia
Differential Revision: https://reviews.llvm.org/D101008
Add two canoncalizations for scf.if.
1) A canonicalization that allows users of a condition within an if to assume the condition
is true if in the true region, etc.
2) A canonicalization that removes yielded statements that are equivalent to the condition
or its negation
Differential Revision: https://reviews.llvm.org/D101012
Ensure to preserve the correct type during when folding and canonicalization.
`shape.broadcast` of of a single operand can only be folded away if the argument
type is correct.
Differential Revision: https://reviews.llvm.org/D101158
Eliminate empty shapes from the operands, partially fold all constant shape
operands, and fix normal folding.
Differential Revision: https://reviews.llvm.org/D100634
The interchange option attached to the linalg to loop lowering affects only the loops and does not update the memory accesses generated in to body of the operation. Instead of performing the interchange during the loop lowering use the interchange pattern.
Differential Revision: https://reviews.llvm.org/D100758
Example:
```
%0 = linalg.init_tensor : tensor<...>
%1 = linalg.generic ... outs(%0: tensor<...>)
%2 = linalg.generic ... outs(%0: tensor<...>)
```
Memref allocated as a result of `init_tensor` bufferization can be incorrectly overwritten by the second linalg.generic operation
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D100921
This will prevent fusion that spains all dims and generates
(d0, d1, ...) -> () reshape that isn't legal
Differential Revision: https://reviews.llvm.org/D100805
Break up the dependency between SCF ops and substituteMin helper and make a
more generic version of AffineMinSCFCanonicalization. This reduce dependencies
between linalg and SCF and will allow the logic to be used with other kind of
ops. (Like ID ops).
Differential Revision: https://reviews.llvm.org/D100321
Previously, any terminator without ReturnLike and BranchOpInterface traits (e.g. scf.condition) were causing pass to fail.
Differential Revision: https://reviews.llvm.org/D100832
Instead of always running the region builder check if the generalized op has a region attached. If yes inline the existing region instead of calling the region builder. This change circumvents a problem with named operations that have a region builder taking captures and the generalization pass not knowing about this captures.
Differential Revision: https://reviews.llvm.org/D100880
The current implementation allows for TransferWriteOps with broadcasts that do not make sense. E.g., a broadcast could write a vector into a single (scalar) memory location, which is effectively the same as writing only the last element of the vector.
Differential Revision: https://reviews.llvm.org/D100842
The patch extends the vectorization pass to lower linalg index operations to vector code. It allocates constant 1d vectors that enumerate the indexes along the iteration dimensions and broadcasts/transposes these 1d vectors to the iteration space.
Differential Revision: https://reviews.llvm.org/D100373
This patch extends the control-flow cost-model for detensoring by
implementing a forward-looking pass on block arguments that should be
detensored. This makes sure that if a (to-be-detensored) block argument
"escapes" its block through the terminator, then the successor arguments
are also detensored.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D100457
The patch replaces the index operations in the body of fused producers and linearizes the indices after expansion.
Differential Revision: https://reviews.llvm.org/D100479
Update the dimensions of the index operations to account for dropped dimensions and replace the index operations of dropped dimensions by zero.
Differential Revision: https://reviews.llvm.org/D100395
This patch add the UnnamedAddr attribute for the GlobalOp in the LLVM
dialect. The attribute is also handled to and from LLVM IR.
This is meant to be used in a follow up patch to lower OpenACC/OpenMP ops to
call to kmp and tgt runtime calls (D100678).
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D100677
Instead of interchanging loops during the loop lowering this pass performs the interchange by permuting the indexing maps. It also updates the iterator types and the index accesses in the body of the operation.
Differential Revision: https://reviews.llvm.org/D100627
Move the existing optimization for transfer op on tensor to folder and
canonicalization. This handles the write after write case and read after write
and also add write after read case.
Differential Revision: https://reviews.llvm.org/D100597
ArmSVE dialect is behind the recent changes in how the Vector dialect
interacts with backend vector dialects and the MLIR -> LLVM IR
translation module. This patch cleans up ArmSVE initialization within
Vector and removes the need for an LLVMArmSVE dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D100171
In the long run, we want to unify the dot product codegen solutions between
all target architectures, but this intrinsic enables experimenting with AVX
specific implementations in the meantime.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100593
Rationale:
Now that vector<?xindex> is allowed, the restriction on vectorization
of index types in the sparse compiler can be removed. Also needs
generalization of scatter/gather index types.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D100522
This patch collects operations that have users in a for loop and uses
them when loop invariant operations are detected and hoisted.
Reviewed By: bondhugula, vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D99761
Per the SPIR-V spec "2.16.2. Validation Rules for Shader Capabilities":
Composite objects in the StorageBuffer, PhysicalStorageBuffer,
Uniform, and PushConstant Storage Classes must be explicitly
laid out.
For other cases we don't need to attach the struct offsets.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D100386
The patch updates the tiling pass to add the tile offsets to the indices returned by the linalg operations.
Differential Revision: https://reviews.llvm.org/D100379
The patch extends the linalg to loop lowering pass to replace all linalg index operations by the induction variables of the generated loop nests.
Differential Revision: https://reviews.llvm.org/D100364
This patch introduces the neccessary infrastructure changes to implement
cost-modelling for detensoring. In particular, it introduces the
following changes:
- An extension to the dialect conversion framework to selectively
convert sub-set of non-entry BB arguments.
- An extension to branch conversion pattern to selectively convert
sub-set of a branche's operands.
- An interface for detensoring cost-modelling.
- 2 simple implementations of 2 different cost models.
This sets the stage to explose cost-modelling for detessoring in an
easier way. We still need to come up with better cost models.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D99945
Depends On D95311
Previous automatic-ref-counting pass worked with high level async operations (e.g. async.execute), however async values reference counting is a runtime implementation detail.
New pass mostly relies on the save liveness analysis to place drop_ref operations, and does better verification of CFG with different liveIn sets in block successors.
This is almost NFC change. No new reference counting ideas, just a cleanup of the previous version.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D95390
This is similar to the definition of llvm.switch, providing
unstructured branch-based control flow. It differs from the LLVM
operation in that it accepts any signless integer (not only an i32),
takes no branch weights (the same as the Branch and CondBranch ops),
and has a slightly different syntax for the default case that includes
it in the list of cases with an explicit `default` keyword.
Also included are several canonicalizers.
See https://llvm.discourse.group/t/rfc-add-std-switch-and-scf-switch/3090
Reviewed By: rriddle, bondhugula
Differential Revision: https://reviews.llvm.org/D99925
The stride should be calculated with the converted array element
type, not the original input type.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D100337
These patterns have been used as a prerequisite step for lowering
to SPIR-V. But they don't involve SPIR-V dialect ops; they are
pure memref/vector op transformations. Given now we have a dedicated
MemRef dialect, moving them to Memref/Transforms/, which is a more
suitable place to host them, to allow used by others.
This commit just moves code around and renames patterns/passes
accordingly. CMakeLists.txt for existing MemRef libraries are
also improved along the way.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D100326
We will soon be adding non-AVX512 operations to MLIR, such as AVX's rsqrt. In https://reviews.llvm.org/D99818 several possibilities were discussed, namely to (1) add non-AVX512 ops to the AVX512 dialect, (2) add more dialects (e.g. AVX dialect for AVX rsqrt), and (3) expand the scope of the AVX512 to include these SIMD x86 ops, thereby renaming the dialect to something more accurate such as X86Vector.
Consensus was reached on option (3), which this patch implements.
Reviewed By: aartbik, ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100119
Fusing a constant with a linalg.generic operation can result in the
fused operation being illegal since the loop bound computation
fails. Avoid such fusions.
Differential Revision: https://reviews.llvm.org/D100272
The `linalg.index` operation provides access to the iteration indexes of immediately enclosing linalg operations. It takes a dimension `dim` attribute and returns the iteration index in the given dimension. Having `linalg.index` allows us to unify `linalg.generic` and `linalg.indexed_generic` and also enables index access in named operations.
Differential Revision: https://reviews.llvm.org/D100292
Recent change enable dropping unit-trip loops of "reduction" iterator
type as well. This is fine as long as there is one other "reduction"
iterator in the operation. Without this the initialized value (value
of `out`) is not read which leads to a correctness issue.
Also fix a bug in the `fill` -> `tensor_reshape` folding. The `out`
operand of the `fill` needs to be reshaped to get the `out` operand of
the generated `fill` operation.
Differential Revision: https://reviews.llvm.org/D100145
This patch doesn't support the optional operands of ImageDrefGather. The support of optional operands will be implemented later.
co-authered-by: Alan Liu <alanliu.yf@gmail.com>
Differential Revision: https://reviews.llvm.org/D100128
This patch unconditionally converts i1 types to i8 types on memrefs. If the
extensions or capabilities are not met, they will be converted to i32. Hence the
logic in IntLoadPattern and IntStorePattern are also updated.
Also added the implementation of SPIRVTypeConverter::getOptions().
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D99724
Non-32-bit scalar types require special hardware support that may not
exist on all GPUs. This is reflected in SPIR-V as that non-32-bit scalar
types require special capabilities or extensions.
Previously when there is a non-32-bit type and no native support, we
unconditionally emulate it with 32-bit ones. This isn't good given that
it can have implications over ABI and data layout consistency.
This commit introduces an option to control whether to use 32-bit
types to emulate.
Differential Revision: https://reviews.llvm.org/D100059
Per the TypeConverter API contract, returning `llvm:None` means
other conversion rules should be tried. But we only have one
rule per input type. So there is no need to try others and we can
just directly fail, which should return `nullptr`. This avoids
unnecessary checks.
Differential Revision: https://reviews.llvm.org/D100058
The patch enables the use of index type in vectors. It is a prerequisite to support vectorization for indexed Linalg operations. This refactoring became possible due to the newly introduced data layout infrastructure. The data layout of a module defines the bitwidth of the index type needed to verify bitcasts and similar vector operations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D99948
When allocLikeOp is updated in alloc constant folding,
alighnment attribute was ignored. This patch fixes it.
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D99882
Some sparse matrices operate on integral values (in contrast with the common
f32 and f64 values). This CL expands the compiler and runtime support to deal
with several common type combinations.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D99999
Also factors out out-of-bounds mask generation from vector.transfer_read/write into a new MaterializeTransferMask pattern.
Differential Revision: https://reviews.llvm.org/D100001
Linalg fusion on tensors has mismatching assumptions on the operand side than on the region bbArg side.
Relax the behavior on the operand/indexing map side so that we better support output operands that may also be read from.
Differential revision: https://reviews.llvm.org/D99499
Right now Elementwise operations fusion in Linalg fuses everything it
can. This can run up against resource limits of the target hardware
without some checks. This patch adds a callback function that clients
can use to implement a cost function. When two elementwise operations
are deemed structurally fusable, the callback can be used to control
if the fusion applies.
Differential Revision: https://reviews.llvm.org/D99820
The moved `populate` methods are only relevant to Linalg
operations. So they are better of in `linalg` namespace. Also rename
`populateLinalgTensorOpsFusionPatterns` to
`populateElementwiseOpsFusionPatterns`. This makes the scope of these
patterns explicit and disambiguates it with fusion on tensors using
tile + fuse.
Differential Revision: https://reviews.llvm.org/D99819
This commit add utility functions for creating push constant
storage variable and loading values from it.
Along the way, performs some clean up:
* Deleted `setABIAttrs`, which is just a 4-liner function
with one user.
* Moved `SPIRVConverstionTarget` into `mlir` namespace,
to be consistent with `SPIRVTypeConverter` and
`LLVMConversionTarget`.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D99725
Rationale:
Small indices and values, when allowed by the required range of the
input tensors, can reduce the memory footprint of sparse tensors
even more. Note, however, that we must be careful zero extending
the values (since sparse tensors never use negatives for indexing),
but LLVM treats the index type as signed in most memory operations
(like the scatter and gather). This CL dots all the i's in this regard.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D99777
This is in preparation for adding a new "mask" operand. The existing "masked" attribute was used to specify dimensions that may be out-of-bounds. Such transfers can be lowered to masked load/stores. The new "in_bounds" attribute is used to specify dimensions that are guaranteed to be within bounds. (Semantics is inverted.)
Differential Revision: https://reviews.llvm.org/D99639
This revision adds support to properly add the body of registered
builtin named linalg ops.
At this time, indexing_map and iterator_type support is still
missing so the op is not executable yet.
Differential Revision: https://reviews.llvm.org/D99578
This verification is to check if the indices for static shaped operands
on linalgOps access out of bound memory or not. For dynamic shaped
operands, we would be able to check it on runtime stage.
Found several invalid Linalg ops testcases, and fixed them.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D98390
A new `InterfaceMethod` is added to `InferShapedTypeOpInterface` that
allows an operation to return the `Value`s for each dim of its
results. It is intended for the case where the `Value` returned for
each dim is computed using the operands and operation attributes. This
interface method is for cases where the result dim of an operation can
be computed independently, and it avoids the need to aggregate all
dims of a result into a single shape value. This also implies that
this is not suitable for cases where the result type is unranked (for
which the existing interface methods is to be used).
Also added is a canonicalization pattern that uses this interface and
resolves the shapes of the output in terms of the shapes of the
inputs. Moving Linalg ops to use this interface, so that many
canonicalization patterns implemented for individual linalg ops to
achieve the same result can be removed in favor of the added
canonicalization pattern.
Differential Revision: https://reviews.llvm.org/D97887
Subtensor operations that are taking a slice out of a tensor that is
unit-extent along a dimension can be rewritten to drop that dimension.
Differential Revision: https://reviews.llvm.org/D99226
Drop usage of `emitRemark` and use `notifyMatchFailure` instead to
avoid unnecessary spew during compilation.
Differential Revision: https://reviews.llvm.org/D99485
Convert transfer_read ops with permutation maps into simpler
transfer_read with minority map + vector.braodcast and vector.transpose.
And transfer_read with leading dimensions broacast into transfer_read of
lower rank.
Differential Revision: https://reviews.llvm.org/D99019
Add a new clone operation to the memref dialect. This operation implicitly
copies data from a source buffer to a new buffer. In contrast to the linalg.copy
operation, this operation does not accept a target buffer as an argument.
Instead, this operation performs a conceptual allocation which does not need to
be performed manually.
Furthermore, this operation resolves the dependency from the linalg-dialect
in the BufferDeallocation pass. In addition, we also extended the canonicalization
patterns to fold clone operations. The copy removal pass has been removed.
Differential Revision: https://reviews.llvm.org/D99172
In particular for Graph Regions, the terminator needs is just a
historical artifact of the generalization of MLIR from CFG region.
Operations like Module don't need a terminator, and before Module
migrated to be an operation with region there wasn't any needed.
To validate the feature, the ModuleOp is migrated to use this trait and
the ModuleTerminator operation is deleted.
This patch is likely to break clients, if you're in this case:
- you may iterate on a ModuleOp with `getBody()->without_terminator()`,
the solution is simple: just remove the ->without_terminator!
- you created a builder with `Builder::atBlockTerminator(module_body)`,
just use `Builder::atBlockEnd(module_body)` instead.
- you were handling ModuleTerminator: it isn't needed anymore.
- for generic code, a `Block::mayNotHaveTerminator()` may be used.
Differential Revision: https://reviews.llvm.org/D98468
For such op chains, we can create new linalg.fill ops
with the result type of the linalg.tensor_reshape op.
Differential Revision: https://reviews.llvm.org/D99116
init tensor operands also has indexing map and generally follow
the same constraints we expect for non-init-tensor operands.
Differential Revision: https://reviews.llvm.org/D99115
This commit exposes an option to the pattern
FoldWithProducerReshapeOpByExpansion to allow
folding unit dim reshapes. This gives callers
more fine-grained controls.
Differential Revision: https://reviews.llvm.org/D99114
This identifies a pattern where the producer affine min/max op
is bound to a dimension/symbol that is used as a standalone
expression in the consumer affine op's map. In that case the
producer affine min/max op can be merged into its consumer.
For example, a pattern like the following:
```
%0 = affine.min affine_map<()[s0] -> (s0 + 16, s0 * 8)> ()[%sym1]
%1 = affine.min affine_map<(d0)[s0] -> (s0 + 4, d0)> (%0)[%sym2]
```
Can be turned into:
```
%1 = affine.min affine_map<
()[s0, s1] -> (s0 + 4, s1 + 16, s1 * 8)> ()[%sym2, %sym1]
```
Differential Revision: https://reviews.llvm.org/D99016
If there are multiple identical expressions in an affine
min/max op's map, we can just keep one.
Differential Revision: https://reviews.llvm.org/D99015
Until now Linalg fusion only allow fusing producers whose operands
are all permutation indexing maps. It's easier to deduce the
subtensor/subview but it is an unnecessary constraint, as in tiling
we have more advanced logic to deduce the subranges even when the
operand is not of permutation indexing maps, e.g., the input operand
for convolution ops.
This patch uses the logic on tiling side to deduce subranges for
fusion. This enables fusing convolution with its consumer ops
when possible.
Along the way, we are now generating proper affine.min ops to guard
against size boundaries, if we cannot be certain they won't be
out of bounds.
Differential Revision: https://reviews.llvm.org/D99014
This is a preparation step to reuse makeTiledShapes in tensor
fusion. Along the way, did some lightweight cleanups.
Differential Revision: https://reviews.llvm.org/D99013
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
Index type is an integer type of target-specific bitwidth present in many MLIR
operations (loops, memory accesses). Converting values of this type to
fixed-size integers has always been problematic. Introduce a data layout entry
to specify the bitwidth of `index` in a given layout scope, defaulting to 64
bits, which is a commonly used assumption, e.g., in constants.
Port builtin-to-LLVM type conversion to use this data layout entry when
converting `index` type and untie it from pointer size. This is particularly
relevant for GPU targets. Keep a possibility to forcibly override the index
type in lowerings.
Depends On D98525
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98937
ModuleOp is a natural place to provide scoped data layout information. However,
it is undesirable for ModuleOp to implement the entirety of
DataLayoutOpInterface because that would require either pushing the interface
inside the IR library instead of a separate library, or putting the default
implementation of the interface as inline functions in headers leading to
binary bloat. Instead, ModuleOp accepts an arbitrary data layout spec attribute
and has a dedicated hook to extract it, and DataLayout is modified to know
about ModuleOp particularities.
Reviewed By: herhut, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98500
Fix the BlockAndValueMapping update that was missing entries for scf.for op's blockIterArgs.
Skip cloning subtensors of the padded tensor as the logic for these is separate.
Add a filter to drop side-effecting ops.
Tests are beefed up to verify the IR is sound in all hoisting configurations for 2-level 3-D tiled matmul.
Differential Revision: https://reviews.llvm.org/D99255
Use new `MemRefType::getMemorySpace` method with generic Attribute
in cases, where there is no specific logic around the memory space.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D99154
To match an interface or trait, users currently have to use the `MatchAny` tag. This tag can be quite problematic for compile time for things like the canonicalizer, as the `MatchAny` patterns may get applied to *every* operation. This revision adds better support by bucketing interface/trait patterns based on which registered operations have them registered. This means that moving forward we will only attempt to match these patterns to operations that have this interface registered. Two simplify defining patterns that match traits and interfaces, two new utility classes have been added: OpTraitRewritePattern and OpInterfaceRewritePattern.
Differential Revision: https://reviews.llvm.org/D98986
This provides a simplified way to implement 'matchAndRewrite' style
canonicalization patterns for ops that don't need the full power of
RewritePatterns. Using this style, you can implement a static method
with a signature like:
```
LogicalResult AssertOp::canonicalize(AssertOp op, PatternRewriter &rewriter) {
return success();
}
```
instead of dealing with defining RewritePattern subclasses. This also
adopts this for a few canonicalization patterns in the std dialect to
show how it works.
Differential Revision: https://reviews.llvm.org/D99143
This revision introduces proper backward slice computation during the hoisting of
PadTensorOp. This allows hoisting padding even across multiple levels of tiling.
Such hoisting requires the proper handling of loop bounds that may depend on enclosing
loop variables.
Differential revision: https://reviews.llvm.org/D98965
This nicely aligns the naming with RewritePatternSet. This type isn't
as widely used, but we keep a using declaration in to help with
downstream consumption of this change.
Differential Revision: https://reviews.llvm.org/D99131
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This maintains the old name to have minimal source impact on downstream codes, and
does not do the huge mechanical patch. I expect the huge mechanical patch to land
sometime this week, but we can keep around the old names for a couple weeks to reduce
impact on downstream projects.
Differential Revision: https://reviews.llvm.org/D99119
This allows adding a C function pointer as a matchAndRewrite style pattern, which
is a very common case. This adopts it in ExpandTanh to show how it reduces a level
of nesting.
We could allow C++ lambdas here, but that doesn't work as well with type inference
in the common case. Instead of:
patterns.insert(convertTanhOp);
you need to specify:
patterns.insert<math::TanhOp>(convertTanhOp);
which is boilerplate'y. Capturing state like this is very uncommon, so we choose
to require clients to define their own structs and use the non-convenience method
when they need to do so.
Differential Revision: https://reviews.llvm.org/D99039
- Drop unnecessary occurrences of rewriter.eraseOp: dead linalg ops on tensors should be cleaned up by DCE.
- reimplement the part of Linalg on fusion that constructs the body and block arguments: the previous implementation had too much magic. Instead this spells out all cases explicitly and asserts / introduces TODOs for incorrect cases.
As a consequence, we can use the default traversal order for this pattern.
Differential Revision: https://reviews.llvm.org/D99070
GreedyPatternRewriteDriver was changed from bottom-up traversal to top-down traversal. Not all passes work yet with that change for traversal order. To give some time for fixing, add an option to allow to switch back to bottom-up traversal. Use this option in FusionOfTensorOpsPass which fails otherwise.
Differential Revision: https://reviews.llvm.org/D99059
mlir/lib/Dialect/Shape/IR/Shape.cpp:573:26: warning: loop variable 'shape' is always a copy because the range of type '::mlir::Operation::operand_range' (aka 'mlir::OperandRange') does not return a reference [-Wrange-loop-analysis]
for (const auto &shape : shapes()) {
^
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
* Fold SelectOp when both true and false args are same SSA value
* Fold some cmp + select patterns
Differential Revision: https://reviews.llvm.org/D98576
* Do we need a threshold on maximum number of Yeild arguments processed (maximum number of SelectOp's to be generated)?
* Had to modify some old IfOp tests to not get optimized by this pattern
Differential Revision: https://reviews.llvm.org/D98592
This makes the annotation tied to the operand and the use of a keyword
more explicit/readable on what it means.
Differential Revision: https://reviews.llvm.org/D99001
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447
This reverts commit 32a744ab20.
CI is broken:
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
^
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp` and it fails while reading operand sizes attribute.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98671
This covers cases that are not folded away because the extent tensor type
becomes more concrete in the process.
Differential Revision: https://reviews.llvm.org/D98782
Add a feature to `EnumAttr` definition to generate
specialized Attribute class for the particular enumeration.
This class will inherit `StringAttr` or `IntegerAttr` and
will override `classof` and `getValue` methods.
With this class the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`) and it will return the typed enumeration
directly instead of raw string/integer.
Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97836
'ForOpIterArgsFolder' can now remove iterator arguments (and corresponding
results) with no use.
Example:
```
%cst = constant 32 : i32
%0:2 = scf.for %arg1 = %lb to %ub step %step iter_args(%arg2 = %arg0, %arg3 = %cst)
-> (i32, i32) {
%1 = addu %arg2, %cst : i32
scf.yield %1, %1 : i32, i32
}
use(%0#0)
```
%arg3 is not used in the block, and its corresponding result `%0#1` has no use,
thus remove the iter argument.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98711
This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl_interp.check_types : Compare a range of types with a known range.
* pdl_interp.create_types : Create a constant range of types.
* pdl_interp.get_operands : Get a range of operands from an operation.
* pdl_interp.get_results : Get a range of results from an operation.
* pdl_interp.switch_types : Switch on a range of types.
This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision.
Differential Revision: https://reviews.llvm.org/D95722
This revision extends the PDL dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl.operands : Define a range of input operands.
* pdl.results : Extract a result group from an operation.
* pdl.types : Define a handle to a range of types.
Support for these in the pdl interpreter dialect and byte code will be added in followup revisions.
Differential Revision: https://reviews.llvm.org/D95721
This has a numerous amount of benefits, given the overly clunky nature of CreateNativeOp:
* Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling for more of the pattern to be in PDL.
* Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users.
This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue.
Differential Revision: https://reviews.llvm.org/D95720
Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs(e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.:
```
%root = pdl.operation ...
%result = pdl.result 0 of %root
```
Differential Revision: https://reviews.llvm.org/D95719
Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a
scf::ForOp. If the block argument corresponding to the given iterator has no
use and the yielded value equals the input, we fold it away.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98503
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470
The patch in question broke the build with shared libraries due to
missing dependencies, one of which would have been circular between
MLIRStandard and MLIRMemRef if added. Fix this by moving more code
around and swapping the dependency direction. MLIRMemRef now depends on
MLIRStandard, but MLIRStandard does _not_ depend on MLIRMemRef.
Arguably, this is the right direction anyway since numerous libraries
depend on MLIRStandard and don't necessarily need to depend on
MLIRMemref.
Other otable changes include:
- some EDSC code is moved inline to MemRef/EDSC/Intrinsics.h because it
creates MemRef dialect operations;
- a utility function related to shape moved to BuiltinTypes.h/cpp
because it only realtes to shaped types and not any particular dialect
(standard dialect is erroneously believed to contain MemRefType);
- a Python test for the standard dialect is disabled completely because
the ops it tests moved to the new MemRef dialect, but it is not
exposed to Python bindings, and the change for that is non-trivial.
This is a temporary work-around to get our all-annotations-all-flags
stress testing effort run clean. In the long run, we want to provide
efficient implementations of strided loads and stores though
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D98563
Functions used only in `assert` cause warnings in release mode
Reviewed By: mehdi_amini, dcaballe, ftynse
Differential Revision: https://reviews.llvm.org/D98476
This restricts the attributes to integers for constants of type
IndexType. So far an attribute like StringAttr as in
%c1 = constant "" : index
is valid.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98216
This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97822
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97892
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type.
Differential Revision: https://reviews.llvm.org/D98311
This patch fixes a heap-use-after-free introduced by the recent changes
in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f
The problem is due to the way candidate loops are visited. All candidate loops
are pattern-matched beforehand using the 'NestedMatch' utility. These matches may
intersect with each other so it may happen that we try to vectorize a loop that
was previously vectorized. The new vectorization algorithm replaces the original
loops that are vectorized with new loops and, therefore, any reference to the
original loops in the pre-computed matches becomes invalid.
This patch fixes the problem by classifying the candidate matches into buckets
before vectorization. Each bucket contains all the matches that intersect. The
vectorizer uses these buckets to make sure that we only vectorize *one* match from
each bucket, at most.
Differential Revision: https://reviews.llvm.org/D98382
Data layout information allows to answer questions about the size and alignment
properties of a type. It enables, among others, the generation of various
linear memory addressing schemes for containers of abstract types and deeper
reasoning about vectors. This introduces the subsystem for modeling data
layouts in MLIR.
The data layout subsystem is designed to scale to MLIR's open type and
operation system. At the top level, it consists of attribute interfaces that
can be implemented by concrete data layout specifications; type interfaces that
should be implemented by types subject to data layout; operation interfaces
that must be implemented by operations that can serve as data layout scopes
(e.g., modules); and dialect interfaces for data layout properties unrelated to
specific types. Built-in types are handled specially to decrease the overall
query cost.
A concrete default implementation of these interfaces is provided in the new
Target dialect. Defaults for built-in types that match the current behavior are
also provided.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97067
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203