Summary:
This revision introduces a helper function to allow applying rewrite patterns, interleaved with more global transformations, in a staged fashion:
1. the first stage consists of an OwningRewritePatternList. The RewritePattern in this list are applied once and in order.
2. the second stage consists of a single OwningRewritePattern that is applied greedily until convergence.
3. the third stage consists of applying a lambda, generally used for non-local transformation effects.
This allows creating custom fused transformations where patterns can be ordered and applied at a finer granularity than a sequence of traditional compiler passes.
A test that exercises these behaviors is added.
Differential Revision: https://reviews.llvm.org/D79518
This [discussion](https://llvm.discourse.group/t/viewop-isnt-expressive-enough/991/2) raised some concerns with ViewOp.
In particular, the handling of offsets is incorrect and does not match the op description.
Note that with an elemental type change, offsets cannot be part of the type in general because sizeof(srcType) != sizeof(dstType).
Howerver, offset is a poorly chosen term for this purpose and is renamed to byte_shift.
Additionally, for all intended purposes, trying to support non-identity layouts for this op does not bring expressive power but rather increases code complexity.
This revision simplifies the existing semantics and implementation.
This simplification effort is voluntarily restrictive and acts as a stepping stone towards supporting richer semantics: treat the non-common cases as YAGNI for now and reevaluate based on concrete use cases once a round of simplification occurred.
Differential revision: https://reviews.llvm.org/D79541
Summary: This revision introduces LinalgPromotionOptions to more easily control the application of promotion patterns. It also simplifies the different entry points into Promotion in preparation for some behavior change in subsequent revisions.
Differential Revision: https://reviews.llvm.org/D79489
This dialect contains various structured control flow operaitons, not only
loops, reflect this in the name. Drop the Ops suffix for consistency with other
dialects.
Note that this only moves the files and changes the C++ namespace from 'loop'
to 'scf'. The visible IR prefix remains the same and will be updated
separately. The conversions will also be updated separately.
Differential Revision: https://reviews.llvm.org/D79578
Summary:
Cast from a value interpreted as floating-point to the corresponding signed
integer value. Similar to an element-wise `static_cast` in C++, performs an
element-wise conversion operation.
Differential Revision: https://reviews.llvm.org/D79373
Complex addition and substraction are the first two binary operations on complex
numbers.
Remaining operations will follow the same pattern.
Differential Revision: https://reviews.llvm.org/D79479
The SideEffect interface definitions currently use string expressions to
reference custom resource objects. This CL introduces Resource objects in
tablegen definitions to simplify linking of resource reference to resource
objects.
Differential Revision: https://reviews.llvm.org/D78917
This is a wrapper around vector of NamedAttributes that keeps track of whether sorted and does some minimal effort to remain sorted (doing more, e.g., appending attributes in sorted order, could be done in follow up). It contains whether sorted and if a DictionaryAttr is queried, it caches the returned DictionaryAttr along with whether sorted.
Change MutableDictionaryAttr to always return a non-null Attribute even when empty (reserve null cases for errors). To this end change the getter to take a context as input so that the empty DictionaryAttr could be queried. Also create one instance of the empty dictionary attribute that could be reused without needing to lock context etc.
Update infer type op interface to use DictionaryAttr and use NamedAttrList to avoid incurring multiple conversion costs.
Fix bug in sorting helper function.
Differential Revision: https://reviews.llvm.org/D79463
Originally, these operations were folded only if all expressions in their
affine maps could be folded to a constant expression that can be then subject
to numeric min/max computation. This introduces a more advanced version that
partially folds the affine map by lifting individual constant expression in it
even if some of the expressions remain variable. The folding can update the
operation in place to use a simpler map. Note that this is not as powerful as
canonicalization, in particular this does not remove dimensions or symbols that
became useless. This allows for better composition of Linalg tiling and
promotion transformation, where the latter can handle some canonical forms of
affine.min that the folding can now produce.
Differential Revision: https://reviews.llvm.org/D79502
The list of destination load ops while evaluating producer-consumer
fusion wasn't being maintained as a set, and as such, duplicate load ops
were being added to it. Although this is harmless correctness-wise, it's
a killer efficiency-wise and it prevents interesting/useful fusions
(including for eg. reshapes into a matmul). The reason the latter
fusions would be missed is that a slice union would be unnecessarily
needed due to the duplicate load ops on a memref added to the 'dst
loads' list. Since slice union is unimplemented for the local var case,
a single destination load op that leads to local vars (like a floordiv /
mod producing fusion), a common case, would not get fused due to an
unnecessary union being tried with itself. (The union would actually be
the same thing but we would bail out.)
Besides the above, this would also significantly speed up fusion as all
the unnecessary slice computations / unions, checks, etc. due to the
duplicates go away.
Differential Revision: https://reviews.llvm.org/D79547
When the folding is performed in place, the `::fold` function does not populate
its `results` argument to indicate that. (In the folding hook for single-result
operations, the result of the original operation is expected to be returned,
but it is then ignored by the wrapper.) `OperationFolder::create` would
erronously rely on the _operation_ having zero results instead of on the
_folding_ producing zero new results to populate the list of results with those
of the original operation. This would lead to a crash for single-result ops
with in-place folds where the first result is accessed uncondtionally because
the list of results was not properly populated. Use the list of values produced
by the folding instead.
Differential Revision: https://reviews.llvm.org/D79497
The types of forward references are checked that they match with other
uses, but they do not check they match with the definition.
func @forward_reference_type_check() -> (i8) {
br ^bb2
^bb1:
return %1 : i8
^bb2:
%1 = "bar"() : () -> (f32)
br ^bb1
}
Would be parsed and the use site of '%1' would be silently changed to
'f32'.
This commit adds a test for this case, and a check during parsing for
the types to match.
Patch by Matthew Parkinson <mattpark@microsoft.com>
Closes D79317.
Summary:
This revision adds a conservative canonicalization pattern for MemRefCastOp that are typically inserted during ViewOp and SubViewOp canonicalization.
Ideally such canonicalizations would propagate the type to consumers but this is not a local behavior. As a consequence MemRefCastOp are introduced to keep type compatibility but need to be cleaned up later, in the case where more dynamic behavior than necessary is introduced.
Differential Revision: https://reviews.llvm.org/D79438
This revision allows for creating DenseElementsAttrs and accessing elements using std::complex<APInt>/std::complex<APFloat>. This allows for opaquely accessing and transforming complex values. This is used by the printer/parser to provide pretty printing for complex values. The form for complex values matches that of std::complex, i.e.:
```
// `(` element `,` element `)`
dense<(10,10)> : tensor<complex<i64>>
```
Differential Revision: https://reviews.llvm.org/D79296
This revision adds support for storing ComplexType elements inside of a DenseElementsAttr. We store complex objects as an array of two elements, matching the definition of std::complex. There is no current attribute storage for ComplexType, but DenseElementsAttr provides API for access/creation using std::complex<>. Given that the internal implementation of DenseElementsAttr is already fairly opaque, the only real complexity here is in the printing/parsing. This revision keeps it simple for now and always uses hex when printing complex elements. A followup will add prettier syntax for this.
Differential Revision: https://reviews.llvm.org/D79281
DMA operation classes in the Standard dialect (`DmaStartOp` and `DmaWaitOp`)
provide helper functions that make numerous assumptions about the number and
order of operands, and about their types. However, these assumptions were not
checked in the verifier, leading to assertion failures or crashes when helper
functions were used on ill-formed ops. Some of the assuptions were checked in
the custom parser (and thus could not check assumption violations in ops
constructed programmatically, e.g., during rewrites) and others were not
checked at all. Introduce the verifiers for all these assumptions and drop
unnecessary checks in the parser that are now covered by the verifier.
Addresses PR45560.
Differential Revision: https://reviews.llvm.org/D79408
Summary:
Adds the loop unroll transformation for loop::ForOp.
Adds support for promoting the body of single-iteration loop::ForOps into its containing block.
Adds check tests for loop::ForOps with dynamic and static lower/upper bounds and step.
Care was taken to share code (where possible) with the AffineForOp unroll transformation to ease maintenance and potential future transition to a LoopLike construct on which loop transformations for different loop types can implemented.
Reviewers: ftynse, nicolasvasilache
Reviewed By: ftynse
Subscribers: bondhugula, mgorny, zzheng, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, grosul1, frgossen, Kayjukh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79184
Adding this pattern reduces code duplication. There is no need to have a
custom implementation for lowering to llvm.cmpxchg.
Differential Revision: https://reviews.llvm.org/D78753
This revision adds support for merging identical blocks, or those with the same operations that branch to the same successors. Operands that mismatch between the different blocks are replaced with new block arguments added to the merged block.
Differential Revision: https://reviews.llvm.org/D79134
Summary:
In the particular case of an insertion in a block without a terminator, the BlockBuilder insertion point should be block->end().
Adding a unit test to exercise this.
Differential Revision: https://reviews.llvm.org/D79363
Summary:
As D78974, this patch implements the emulation for store op. The emulation is
done with atomic operations. E.g., if the storing value is i8, rewrite the
StoreOp to:
1) load a 32-bit integer
2) clear 8 bits in the loading value
3) store 32-bit value back
4) load a 32-bit integer
5) modify 8 bits in the loading value
6) store 32-bit value back
The step 1 to step 3 are done by AtomicAnd as one atomic step, and the step 4
to step 6 are done by AtomicOr as another atomic step.
Differential Revision: https://reviews.llvm.org/D79272
- Exports MLIR targets to be used out-of-tree.
- mimicks `add_clang_library` and `add_flang_library`.
- Fixes libMLIR.so
After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing
any object files. We originally had a cludge there that made it work with
the static initalizers and when switchting away from that to the way the
clang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target,
and doesn't export it's targets to `lib/cmake/mlir`.
This is due to MLIR using `add_llvm_library` under the hood, which adds
the target to `llvmexports`.
Differential Revision: https://reviews.llvm.org/D78773
[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB
Primarily, this patch moves all mlir references to LLVM libraries into
either LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic in
the llvm cmake files to automatically replace reference to LLVM components
with references to libLLVM.so when necessary. Among other things, this
completes fixing libMLIR.so, which has been broken for some configurations
since D77515.
Unlike previously, the pattern is now that mlir libraries should almost
always use add_mlir_library. Previously, some libraries still used
add_llvm_library. However, this confuses the export of targets for use
out of tree because libraries specified with add_llvm_library are exported
by LLVM. Instead users which don't need/can't be linked into libMLIR.so
can specify EXCLUDE_FROM_LIBMLIR
A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS.
This almost always results in symbol confusion or multiply defined options
in LLVM when the same object file is included as a static library and
as part of libLLVM.so. To catch these errors more directly, there's now
mlir_check_all_link_libraries.
To simplify usage of add_mlir_library, we assume that all mlir
libraries depend on LLVMSupport, so it's not necessary to separately specify
it.
tested with:
BUILD_SHARED_LIBS=on,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79067
[MLIR] Move from using target_link_libraries to LINK_LIBS
This allows us to correctly generate dependencies for derived targets,
such as targets which are created for object libraries.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79243
Three commits have been squashed to avoid intermediate build breakage.
Linalg transformations are currently exposed as DRRs.
Unfortunately RewriterGen does not play well with the line of work on named linalg ops which require variadic operands and results.
Additionally, DRR is arguably not the right abstraction to expose compositions of such patterns that don't rely on SSA use-def semantics.
This revision abandons DRRs and exposes manually written C++ patterns.
Refactorings and cleanups are performed to uniformize APIs.
This refactoring will allow replacing the currently manually specified Linalg named ops.
A collateral victim of this refactoring is the `tileAndFuse` DRR, and the one associated test, which will be revived at a later time.
Lastly, the following 2 tests do not add value and are altered:
- a dot_perm tile + interchange test does not test anything new and is removed
- a dot tile + lower to loops does not need 2-D tiling and is trimmed.
Add `CreateComplexOp`, `ReOp`, and `ImOp` to the standard dialect.
This is the first step to support complex numbers.
Differential Revision: https://reviews.llvm.org/D79159
This is useful for several reasons:
* In some situations the user can guarantee that thread-safety isn't necessary and don't want to pay the cost of synchronization, e.g., when parsing a very large module.
* For things like logging threading is not desirable as the output is not guaranteed to be in stable order.
This flag also subsumes the pass manager flag for multi-threading.
Differential Revision: https://reviews.llvm.org/D79266
These libraries are distinct from other things in Analysis in that they
operate only on core IR concepts. This also simplifies dependencies
so that Dialect -> Analysis -> Parser -> IR. Previously, the parser depended
on portions of the the Analysis directory as well, which sometimes
caused issues with the way the cmake makefile generator discovers
dependencies on generated files during compilation.
Differential Revision: https://reviews.llvm.org/D79240
Summary:
This is an initial version, currently supports OpString and OpLine
for autogenerated operations during (de)serialization.
Differential Revision: https://reviews.llvm.org/D79091
Summary:
Maps ZeroExtendIOp and TruncateIOp to spirv::UConvertOp and spirv::SConvertOp.
Depends On D78974
Differential Revision: https://reviews.llvm.org/D79143
Summary:
The current implementation in SPIRVTypeConverter just unconditionally turns
everything into 32-bit if it doesn't meet the requirements of extensions or
capabilities. In this case, we can load a 32-bit value and then do bit
extraction to get the value.
Differential Revision: https://reviews.llvm.org/D78974
- Extract common logic between -convert-gpu-to-nvvm and -convert-gpu-to-rocdl.
- Cope with the fact that alloca operates on different addrspaces between NVVM
and ROCDL.
- Modernize unit tests for ROCDL dialect.
Differential Revision: https://reviews.llvm.org/D79021
Summary:
This revision cleans up a layer of complexity in ScopedContext and uses InsertGuard instead of previously manual bookkeeping.
The method `getBuilder` is renamed to `getBuilderRef` and spurious copies of OpBuilder are tracked.
This results in some canonicalizations not happening anymore in the Linalg matmul to vector test. This test is retired because relying on DRRs for this has been shaky at best. The solution will be better support to write fused passes in C++ with more idiomatic pattern composition and application.
Differential Revision: https://reviews.llvm.org/D79208
This revision adds support to allow named ops to lower to loops.
Linalg.batch_matmul successfully lowers to loops and to LLVM.
In the process, this test also activates linalg to affine loops.
However padded convolutions to not lower to affine.load atm so this revision overrides the type of underlying load / store operation.
Differential Revision: https://reviews.llvm.org/D79135
There are three op conversion modes: Partial, Full, and Analysis. This change modifies the Partial mode to optionally take a set of non-legalizable ops. If this parameter is specified, all ops that are not legalizable (i.e. would cause full conversion to fail) are tracked throughout the partial legalization.
Differential Revision: https://reviews.llvm.org/D78788
This commit marks AllocLikeOp as MemAlloc in StandardOps.
Also in Linalg dependency analysis use memory effect to detect
allocation. This allows the dependency analysis to be more
general and recognize other allocation-like operations.
Differential Revision: https://reviews.llvm.org/D78705
This revision allows masked vector transfers with m-D buffers and n-D vectors to
progressively lower to m-D buffer and 1-D vector transfers.
For a vector.transfer_read, assuming a `memref<(leading_dims) x (major_dims) x (minor_dims) x type>` and a `vector<(minor_dims) x type>` are involved in the transfer, this generates pseudo-IR resembling:
```
if (any_of(%ivs_major + %offsets, <, major_dims)) {
%v = vector_transfer_read(
{%offsets_leading, %ivs_major + %offsets_major, %offsets_minor},
%ivs_minor):
memref<(leading_dims) x (major_dims) x (minor_dims) x type>,
vector<(minor_dims) x type>;
} else {
%v = splat(vector<(minor_dims) x type>, %fill)
}
```
Differential Revision: https://reviews.llvm.org/D79062
This range allows for performing many different operations on successor operands, including erasing/adding/setting. This removes the need for the explicit canEraseSuccessorOperand and eraseSuccessorOperand methods.
Differential Revision: https://reviews.llvm.org/D79077
Currently a declaration won't be generated if the method has a default implementation. Meaning that operations that wan't to override the default have to explicitly declare the method in the extraClassDeclarations. This revision adds an optional list parameter to DeclareOpInterfaceMethods to allow for specifying a set of methods that should always have the declarations generated, even if there is a default.
Differential Revision: https://reviews.llvm.org/D79030
This class allows for mutating an operand range in-place, and provides vector like API for adding/erasing/setting. ODS now uses this class to generate mutable wrappers for named operands, with the name `MutableOperandRange <operand-name>Mutable()`
Differential Revision: https://reviews.llvm.org/D78892
This revision adds a mode to the crash reproducer generator to attempt to generate a more local reproducer. This will attempt to generate a reproducer right before the offending pass that fails. This is useful for the majority of failures that are specific to a single pass, and situations where some passes in the pipeline are not registered with a specific tool.
Differential Revision: https://reviews.llvm.org/D78314
type operands.
The instructions used to convert std.cmpi cannot have i1 types
according to SPIR-V specification. A different set of operations are
specified in the SPIR-V spec for comparing boolean types. Enhance the
StandardToSPIRV lowering to target these instructions when operands to
std.cmpi operation are of i1 type.
Differential Revision: https://reviews.llvm.org/D79049
On certain targets std.subview should be able to take memrefs from non-zero
addrspaces. Improve lowering logic to llvm dialect and amend the tests.
Differential Revision: https://reviews.llvm.org/D79024
Enhance lowering logic and tests so vector.transfer_read and
vector.transfer_write take memrefs on non-zero addrspaces.
Differential Revision: https://reviews.llvm.org/D79023
(A previous version of this, dd2c639c3c, was
reverted.)
Introduce op trait PolyhedralScope for ops to define a new scope for
polyhedral optimization / affine dialect purposes, thus generalizing
such scopes beyond FuncOp. Ops to which this trait is attached will
define a new scope for the consideration of SSA values as valid symbols
for the purposes of polyhedral analysis and optimization. Update methods
that check for dim/symbol validity to work based on this trait.
Differential Revision: https://reviews.llvm.org/D79060
OperationHandle mostly existed to mirror the behavior of ValueHandle.
This has become unnecessary and can be retired.
Differential Revision: https://reviews.llvm.org/D78692
Previously, they would only only verify `isa<DictionaryAttr>` on such attrs
which resulted in crashes down the line from code assuming that the
verifier was doing the more thorough check introduced in this patch.
The key change here is for StructAttr to use
`CPred<"$_self.isa<" # name # ">()">` instead of `isa<DictionaryAttr>`.
To test this, introduce struct attrs to the test dialect. Previously,
StructAttr was only being tested by unittests/, which didn't verify how
StructAttr interacted with ODS.
Differential Revision: https://reviews.llvm.org/D78975
Summary:
When creating an operation with
* `AttrSizedOperandSegments` trait
* Variadic operands of only non-buildable types
* assemblyFormat to automatically generate the parser
the `builder` local variable is used, but never declared.
This adds a fix as well as a test for this case as existing ones use buildable types only.
Reviewers: rriddle, Kayjukh, grosser
Reviewed By: Kayjukh
Subscribers: mehdi_amini, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, frgossen, llvm-commits
Tags: #mlir, #llvm
Differential Revision: https://reviews.llvm.org/D79004
Summary:
This change results in tests also being changed to prevent dead
affine.load operations from being folded away during rewrites.
Also move AffineStoreOp and AffineLoadOp to an ODS file.
Differential Revision: https://reviews.llvm.org/D78930
As we start defining more complex Ops, we increasingly see the need for
Ops-with-regions to be able to construct Ops within their regions in
their ::build methods. However, these methods only have access to
Builder, and not OpBuilder. Creating a local instance of OpBuilder
inside ::build and using it fails to trigger the operation creation
hooks in derived builders (e.g., ConversionPatternRewriter). In this
case, we risk breaking the logic of the derived builder. At the same
time, OpBuilder::create, which is by far the largest user of ::build
already passes "this" as the first argument, so an OpBuilder instance is
already available.
Update all ::build methods in all Ops in MLIR and Flang to take
"OpBuilder &" instead of "Builder *". Note the change from pointer and
to reference to comply with the common style in MLIR, this also ensures
all other users must change their ::build methods.
Differential Revision: https://reviews.llvm.org/D78713
We have provided a generic buffer assignment transformation ported from
TensorFlow. This generic transformation pass automatically analyzes the values
and their aliases (also in other blocks) and returns the valid positions for
Alloc and Dealloc operations. To find these positions, the algorithm uses the
block Dominator and Post-Dominator analyses. In our proposed algorithm, we have
considered aliasing, liveness, nested regions, branches, conditional branches,
critical edges, and independency to custom block terminators. This
implementation doesn't support block loops. However, we have considered this in
our design. For this purpose, it is only required to have a loop analysis to
insert Alloc and Dealloc operations outside of these loops in some special
cases.
Differential Revision: https://reviews.llvm.org/D78484
Introduce op trait `PolyhedralScope` for ops to define a new scope for
polyhedral optimization / affine dialect purposes, thus generalizing
such scopes beyond FuncOp. Ops to which this trait is attached will
define a new scope for the consideration of SSA values as valid symbols
for the purposes of polyhedral analysis and optimization. Update methods
that check for dim/symbol validity to work based on this trait.
Differential Revision: https://reviews.llvm.org/D78863
- Adds a folder for integer division by one with the `divi_signed` and `divi_unsigned` ops.
- Creates tests for scalar and tensor versions of these ops.
- Modifies the test in `parallel-loop-collapsing.mlir` so that it doesn't assume division by one will be in the output.
Differential Revision: https://reviews.llvm.org/D78518
This revision adds support for propagating constants across symbol-based callgraph edges. It uses the existing Call/CallableOpInterfaces to detect the dataflow edges, and propagates constants through arguments and out of returns.
Differential Revision: https://reviews.llvm.org/D78592
This provides a much cleaner interface into Symbols, and allows for users to start injecting op-specific information. For example, derived op can now inject when a symbol can be discarded if use_empty. This would let us drop unused external functions, which generally have public visibility.
This revision also adds a new `extraTraitClassDeclaration` field to ODS OpInterface to allow for injecting declarations into the trait class that gets attached to the operations.
Differential Revision: https://reviews.llvm.org/D78522
Summary: This revision extends the lowering of vector transfers to work with n-D memref and 1-D vector where the permutation map is an identity on the most minor dimensions (1 for now).
Differential Revision: https://reviews.llvm.org/D78925
Summary:
Previously operations like std.load created methods for obtaining their
effects but did not inherit from the SideEffect interfaces when their
parameters were decorated with the information. The resulting situation
was that passes had no information on the SideEffects of std.load/store
and had to treat them more cautiously. This adds the inheritance
information when creating the methods.
As a side effect, many tests are modified, as they were using std.load
for testing and this oepration would be folded away as part of pattern
rewriting. Tests are modified to use store or to reutn the result of the
std.load.
Reviewers: mravishankar, antiagainst, nicolasvasilache, herhut, aartbik, ftynse!
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, Kayjukh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78802
This revision refactors the structure of the operand storage such that there is no additional memory cost for resizable operand lists until it is required. This is done by using two different internal representations for the operand storage:
* One using trailing operands
* One using a dynamically allocated std::vector<OpOperand>
This allows for removing the resizable operand list bit, and will free up APIs from needing to workaround non-resizable operand lists.
Differential Revision: https://reviews.llvm.org/D78875
`addArgument()` is not undoable and should not be used in
ConversionPattern, therefore replacing `splitBlock()` with
`createBlock()`, that creates a block with specified args.
Differential Revision: https://reviews.llvm.org/D78731
Summary: Added support for sparse strings elements. This is a follow up from the original DenseStringElements.
Differential Revision: https://reviews.llvm.org/D78844
- Implement a first constant fold for shape.shape_of (more ops coming in subsequent patches)
- Implement the right builder interfaces for ShapeType and other types
- Splits shape.constant into shape.const_size and shape.const_shape which plays better with dyn_cast and building vs one polymorphic op.
Also, fix the RUN line in ops.mlir to properly verify round-tripping.
The current implementation of this method performs the replacement directly, and thus doesn't support proper back tracking.
Differential Revision: https://reviews.llvm.org/D78790
Ensure that `gpu.func` is only used within the dedicated `gpu.module`.
Implement the constraint to the GPU dialect and adopt test cases.
Differential Revision: https://reviews.llvm.org/D78541
Summary:
Implemented a DenseStringsElements attr for handling arrays / tensors of strings. This includes the
necessary logic for parsing and printing the attribute from MLIR's text format.
To store the attribute we perform a single allocation that includes all wrapped string data tightly packed.
This means no padding characters and no null terminators (as they could be present in the string). This
buffer includes a first chunk of data that represents an array of StringRefs, that contain address pointers
into the string data, with the length of each string wrapped. At this point there is no Sparse representation
however strings are not typically represented sparsely.
Differential Revision: https://reviews.llvm.org/D78600
367229e100 retired ValueHandle but
mistakenly removed the implementation for `negate` which was not
tested and would result in linking errors.
This revision adds the implementation back and provides a test.
The current Liveness analysis does not support operations with nested regions.
This causes issues when querying liveness information about blocks nested within
operations. Furthermore, the live-in and live-out sets are not computed properly
in these cases.
Differential Revision: https://reviews.llvm.org/D77714
It currently requires that the condition match the shape of the selected value, but this is only really useful for things like masks. This revision allows for the use of i1 to mean that all of the vector/tensor is selected. This also matches the behavior of LLVM select. A benefit of this change is that transformations that want to generate selects, like those on the CFG, don't have to special case vector/tensor. Previously the only way to generate a select from an i1 was to use a splat, but that doesn't support dynamically shaped/unranked tensors.
Differential Revision: https://reviews.llvm.org/D78690
This revision adds support for canonicalizing the following:
```
br ^bb1
^bb1
br ^bbN(...)
br ^bbN(...)
```
Differential Revision: https://reviews.llvm.org/D78683
This revision adds support for canonicalizing the following:
```
cond_br %cond, ^bb1(A, ..., N), ^bb1(A, ..., N)
br ^bb1(A, ..., N)
```
If the operands to the successor are different and the cond_br is the only predecessor, we emit selects for the branch operands.
```
cond_br %cond, ^bb1(A), ^bb1(B)
%select = select %cond, A, B
br ^bb1(%select)
```
Differential Revision: https://reviews.llvm.org/D78682
Summary:
This test is in a different file because it contains a literal NUL
character, which causes various tools to treat it as a binary file.
Hence it is useful to have this test kept in a separate, rarely-changing
file.
Differential Revision: https://reviews.llvm.org/D78689
Summary:
Use a nested symbol to identify the kernel to be invoked by a `LaunchFuncOp` in the GPU dialect.
This replaces the two attributes that were used to identify the kernel module and the kernel within seperately.
Differential Revision: https://reviews.llvm.org/D78551
Summary:
Use the shortcu `kernel` for the `gpu.kernel` attribute of `gpu.func`.
The parser supports this and test cases are easier to read.
Differential Revision: https://reviews.llvm.org/D78542
Summary:
Fix a broken test case in the `invalid.mlir` lit test case.
`expect` was missing its `e`.
Differential Revision: https://reviews.llvm.org/D78540
The buffer allocated by a promotion can be subject to other transformations afterward. For example it could be vectorized, in which case it is needed to ensure that this buffer is memory-aligned.
Differential Revision: https://reviews.llvm.org/D78556
This revision is the first in a set of improvements that aim at allowing
more generalized named Linalg op generation from a mathematical
specification.
This revision allows creating a new op and checks that the parser,
printer and verifier are hooked up properly.
This opened up a few design points that will be addressed in the future:
1. A named linalg op has a static region builder instead of an
explicitly parsed region. This is not currently compatible with
assemblyFormat so a custom parser / printer are needed.
2. The convention for structured ops and tensor return values needs to
evolve to allow tensor-land and buffer land specifications to agree
3. ReferenceIndexingMaps and referenceIterators will need to become
static to allow building attributes at parse time.
4. Error messages will be improved once we have 3. and we pretty print
in custom form.
Differential Revision: https://reviews.llvm.org/D78327
Unfortunately FileCheck ignores directives with whitespace between the directive and the colon (`CHECK :` for example), thus most of the directives of this test were ignored.
Differential Revision: https://reviews.llvm.org/D78548
This is possible by adding two new ControlFlowInterface additions:
- A new interface, RegionBranchOpInterface
This interface allows for region holding operations to describe how control flows between regions. This interface initially contains two methods:
* getSuccessorEntryOperands
Returns the operands of this operation used as the entry arguments when entering the region at `index`, which was specified as a successor by `getSuccessorRegions`. when entering. These operands should correspond 1-1 with the successor inputs specified in `getSuccessorRegions`, and may be a subset of the entry arguments for that region.
* getSuccessorRegions
Returns the viable successors of a region, or the possible successor when branching from the parent op. This allows for describing which regions may be executed when entering an operation, and which regions are executed after having executed another region of the parent op. For example, a structured loop operation may always enter into the loop body region. The loop body region may branch back to itself, or exit to the operation.
- A trait, ReturnLike
This trait signals that a terminator exits a region and forwards all of its operands as "exiting" values.
These additions allow for performing more general dataflow analysis in the presence of region holding operations.
Differential Revision: https://reviews.llvm.org/D78447
This revision adds the initial pass for performing SCCP generically in MLIR. SCCP is an algorithm for propagating constants across control flow, and optimistically assumes all values to be constant unless proven otherwise. It currently supports branching control, with support for regions and inter-procedural propagation being added in followups.
Differential Revision: https://reviews.llvm.org/D78397
The promotion transformation is promoting all input and output buffers of the transformed op. The user might want to only promote some of these buffers.
Differential Revision: https://reviews.llvm.org/D78498
The previous code result a mismatch between block argument types and
predecessor successor args when a type conversion was needed in a
multiblock case. It was assuming the replaced result types matched the
region result types.
Also, slighly improve the debug output from the inliner.
Differential Revision: https://reviews.llvm.org/D78415
Summary:
Generate method to generate a DictionaryAttr with attribute values of
derived attribute. If a conversion back from the derived attribute C++
type to Attribute is not defined, then attempting to materialize such an
op's derived attributes would result in runtime failure.
This allows to treat derived attributes and attributes of an op in more
uniform manner where needed. The derived attributes are not added to the
operation but returned as new attribute instead.
Differential Revision: https://reviews.llvm.org/D78302
Fix intra-tile upper bound setting in a scenario where the tile size was
larger than the trip count.
Differential Revision: https://reviews.llvm.org/D78505
memref types with dynamic dimensions do not have a compile-time
known size. They should be mapped to SPIR-V runtime array types.
Differential Revision: https://reviews.llvm.org/D78197
Summary:
Workgroup size is written into the kernel. So to properly modelling
vulkan launch, we have to skip local workgroup size for vulkan launch
call op.
Differential Revision: https://reviews.llvm.org/D78307
Summary:
The tests referred to in Chapter 3 of the tutorial were missing from the tutorial test
directory; this adds those missing tests. This also cleans up some stale directory paths and code
snippets used throughout the tutorial.
Differential Revision: https://reviews.llvm.org/D76809
Summary:
The tests referred to in Chapter 3 of the tutorial were missing from the tutorial test
directory; this adds those missing tests. This also cleans up some stale directory paths and code
snippets used throughout the tutorial.
Differential Revision: https://reviews.llvm.org/D76809
Summary:
The tests referred to in Chapter 3 of the tutorial were missing from the tutorial test
directory; this adds those missing tests. This also cleans up some stale directory paths and code
snippets used throughout the tutorial.
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76809
Summary:
Rather than having a full, recursive, lowering of vector.broadcast
to LLVM IR, it is much more elegant to have a progressive lowering
of each vector.broadcast into a lower dimensional vector.broadcast,
until only elementary vector operations remain. This results
in more elegant, step-wise code, that is easier to understand.
Also makes some optimizations in the generated code.
Reviewers: nicolasvasilache, mehdi_amini, andydavis1, grosul1
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78071
MLIR supports operations with resizable operand lists, but this property must
be indicated during the construction of such operations. It can be done
programmatically by calling a function on OperationState. Introduce an
ODS-internal trait `ResizableOperandList` to indicate such operations are use
it when generating the bodies of various `build` functions as well as the
`parse` function when the declarative assembly format is used.
Differential Revision: https://reviews.llvm.org/D78292
There were some unused CMakeFiles for Affine/IR and Affine/EDSC.
This change builds separate MLIRAffineOps and MLIRAffineEDSC libraries
using those CMakeFiles. This combination replaces the old MLIRAffine
library.
Differential Revision: https://reviews.llvm.org/D78317
The function attribute in generic ops is not paying for itself.
A region is the more standardized way of specifying a custom computation.
If needed this region can call a function directly.
This is deemed more natural than managing a dedicated function attribute.
This also simplifies named ops generation by trimming unnecessary complexity.
Differential Revision: https://reviews.llvm.org/D78266
OpBase.td defined attributes kind for all integer types expect index. This
commit fixes that by adding an IndexAttr attribute kind. Update the
respective tests.
Differential Revision: https://reviews.llvm.org/D78195
This change makes the ModuleTranslation threadsafe by locking on the
LLVMContext. Furthermore, we now clone the llvm module into a new
context when compiling to PTX similar to what the OrcJit does.
Differential Revision: https://reviews.llvm.org/D78207
Summary:
OpBase.td defined attributes kind for all integer types expect index. This
commit fixes that by adding an IndexAttr attribute kind.
Differential Revision: https://reviews.llvm.org/D78195
Summary:
Modified AffineMap::get to remove support for the overload which allowed
an ArrayRef of AffineExpr but no context (and gathered the context from a
presumed first entry, resulting in bugs when there were 0 results).
Instead, we support only a ArrayRef and a context, and a version which
takes a single AffineExpr.
Additionally, removed some now needless case logic which previously
special cased which call to AffineMap::get to use.
Reviewers: flaub, bondhugula, rriddle!, nicolasvasilache, ftynse, ulysseB, mravishankar, antiagainst, aartbik
Subscribers: mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78226
This revision introduces a utility to unswitch affine.for/parallel loops
by hoisting affine.if operations past surrounding affine.for/parallel.
The hoisting works for both perfect/imperfect nests and in the presence
of else blocks. The hoisting is currently to as outermost a level as
possible. Uses a test pass to test the utility.
Add convenience method Operation::getParentWithTrait<Trait>.
Depends on D77487.
Differential Revision: https://reviews.llvm.org/D77870
Similarly to actual LLVM IR, and to `llvm.mlir.func`, allow the custom syntax
of `llvm.mlir.global` to omit the linkage keyword. If omitted, the linkage is
assumed to be external. This makes the modeling of globals in the LLVM dialect
more consistent, both within the dialect and with LLVM IR.
Differential Revision: https://reviews.llvm.org/D78096
Introduce mlir::applyOpPatternsAndFold which applies patterns as well as
any folding only on a specified op (in contrast to
applyPatternsAndFoldGreedily which applies patterns only on the regions
of an op isolated from above). The caller is made aware of the op being
folded away or erased.
Depends on D77485.
Differential Revision: https://reviews.llvm.org/D77487
This class implements a switch-like dispatch statement for a value of 'T' using dyn_cast functionality. Each `Case<T>` takes a callable to be invoked if the root value isa<T>, the callable is invoked with the result of dyn_cast<T>() as a parameter.
Differential Revision: https://reviews.llvm.org/D78070
These have proved incredibly useful for interleaving values between a range w.r.t to streams. After this revision, the mlir/Support/STLExtras.h is empty. A followup revision will remove it from the tree.
Differential Revision: https://reviews.llvm.org/D78067
The inversePermutation method returns a null map on failure. Update
uses of this method within Linalg to handle this. In LinalgToLoops the
null return value was used to emit scalar code. Modify that to return
failure, and emit scalar implementation when affine map is "empty",
i.e. 1 dims, 0 symbols and no result exprs.
Differential Revision: https://reviews.llvm.org/D77964
Summary:
This operation occurs during collapseParallelLoops, so we constant fold
them also to allow more situations of determining a loop invariant upper
bound when lowering to the GPU dialect from the Loop dialect.
Differential Revision: https://reviews.llvm.org/D77723
Summary: Functional.h contains many different methods that have a direct, and more efficient, equivalent in LLVM. This revision replaces all usages with the LLVM equivalent, and removes the header. This is part of larger cleanup, pr45513, merging MLIR support facilities into LLVM.
Differential Revision: https://reviews.llvm.org/D78053
The invertPermutation method does not return a nullptr anymore, but
rather returns an empty map for the scalar case. Update the check in
LinalgToLoops to reflect this.
Also add test case for generating scalar code.
The outer parallel loops of a linalg operation is lowered to
loop.parallel, with the other loops lowered to loop.for. This gets the
lowering to loop.parallel on par with the loop.for lowering. In future
the reduction loop could also be lowered to loop.parallel.
Also add a utility function that returns the loops that are
created.
Differential Revision: https://reviews.llvm.org/D77678
NFC clean up for simplify-affine-structures test cases. Rename sets
better; avoid suffix numbers; move outlined definitions close to use.
This is in preparation for other functionality updates.
Differential Revision: https://reviews.llvm.org/D78017
This commit added stride support in runtime array types. It also
adjusted the assembly form for the stride from `[N]` to `stride=N`.
This makes the IR more readable, especially for the cases where
one mix array types and struct types.
Differential Revision: https://reviews.llvm.org/D78034
Summary: This revision makes the registration of command line options for these two files manual with `registerMLIRContextCLOptions` and `registerAsmPrinterCLOptions` methods. This removes the last remaining static constructors within lib/.
Differential Revision: https://reviews.llvm.org/D77960
Summary: This revision adds support for specifying operands or results as "optional". This is a special case of variadic where the number of elements is either 0 or 1. Operands and results of this kind will have accessors generated using Value instead of the range types, making it more natural to interface with.
Differential Revision: https://reviews.llvm.org/D77863
Summary:
The string in the location is used to provide metadata for the fused location
or create a NamedLoc. This allows tagging individual locations to convey
additional rewrite information.
Differential Revision: https://reviews.llvm.org/D77840
Summary:
This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).
While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.
Some implementation details and short-term decisions taken for the purpose of bootstrapping and that are not set in stone include:
1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
2. implicit and eager discovery of dims and symbols when parsing
3. using EDSC ops to specify the computation (e.g. std_addf, std_mul_f, ...)
A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.
For the following "Tensor Comprehension-inspired" string:
```
def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
With -gen-ods-decl=1, this emits (modulo formatting):
```
def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> {
let arguments = (ins Variadic<LinalgOperand>:$views);
let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
let extraClassDeclaration = [{
llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
void regionBuilder(ArrayRef<BlockArgument> args);
}];
let hasFolder = 1;
}
```
With -gen-ods-impl, this emits (modulo formatting):
```
llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
{
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
Differential Revision: https://reviews.llvm.org/D77067
This patch adds support for taskwait and taskyield operations in OpenMP dialect and translation of the these constructs to LLVM IR. The OpenMP IRBuilder is used for this translation.
The patch includes code changes and a testcase modifications.
Differential Revision: https://reviews.llvm.org/D77634
Rename mlir::applyPatternsGreedily -> applyPatternsAndFoldGreedily. The
new name is a more accurate description of the method - it performs
both, application of the specified patterns and folding of all ops in
the op's region irrespective of whether any patterns have been supplied.
Differential Revision: https://reviews.llvm.org/D77478
Introduce a new operation property / trait (AutomaticAllocationScope)
for operations with regions that define a new scope for automatic allocations;
such allocations (typically realized on stack) are automatically freed when
control leaves such ops' regions. std.alloca's are freed at the closest
surrounding op that has this trait. All FunctionLike operations should normally
have this trait.
Differential Revision: https://reviews.llvm.org/D77787
Summary:
LLVM matrix intrinsics recently introduced an option to support row-major mode.
This matches the MLIR vector model, this revision switches to row-major.
A corner case related to degenerate sizes was also fixed upstream.
This revision removes the guard against this corner case.
A bug was uncovered on the output vector construction which this revision also fixes.
Lastly, this has been tested on a small size and benchmarked independently: no visible performance regression is observed.
In the future, when matrix intrinsics support per op attribute, we can more aggressively translate to that and avoid inserting MLIR-level transposes.
This has been tested independently to work on small matrices.
Differential Revision: https://reviews.llvm.org/D77761
Summary:
This revision adds support to lower 1-D vector transfers to LLVM.
A mask of the vector length is created that compares the base offset + linear index to the dim of the vector.
In each position where this does not overflow (i.e. offset + vector index < dim), the mask is set to 1.
A notable fact is that the lowering uses llvm.dialect_cast to allow writing code in the simplest form by targeting the simplest mix of vector and LLVM dialects and
letting other conversions kick in.
Differential Revision: https://reviews.llvm.org/D77703
The "cblas" lib under mlir/test is meant as a simple integration demonstration.
However it is installed and ends up conflicting with external projects who want to
define the real cblas.
Rename to avoid conflicts.
Differential revision: https://reviews.llvm.org/D76615
Summary: Some pattern rewriters, like dialect conversion, prohibit the unbounded recursion(or reapplication) of patterns on generated IR. Most patterns are not written with recursive application in mind, so will generally explode the stack if uncaught. This revision adds a hook to RewritePattern, `hasBoundedRewriteRecursion`, to signal that the pattern can safely be applied to the generated IR of a previous application of the same pattern. This allows for establishing a contract between the pattern and rewriter that the pattern knows and can handle the potential recursive application.
Differential Revision: https://reviews.llvm.org/D77782
This revision builds a simple "fused pass" consisting of 2 levels of tiling, memory promotion and vectorization using linalg transformations written as composable pattern rewrites.
Summary: Pass options are a better choice for various reasons and avoid the need for static constructors.
Differential Revision: https://reviews.llvm.org/D77707
Summary:
Update ShapeCastOp folder to use producer-consumer value forwarding.
Support is added for tracking sub-vectors through trivial shape cast operations,
where the sub-vector shape is preserved across shape cast operations and only
leading ones are added or removed.
Support is preserved for cancelling shape cast operations.
One unit test is added and two are updated.
Reviewers: aartbik, nicolasvasilache
Reviewed By: aartbik, nicolasvasilache
Subscribers: frgossen, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77253
Support to recognize and deal with aligned_alloc was recently added to
LLVM's TLI/MemoryBuiltins and its various optimization passes. This
revision adds support for generation of aligned_alloc's when lowering
AllocOp from std to LLVM. Setting 'use-aligned_alloc=1' will lead to
aligned_alloc being used for all heap allocations. An alignment and size
that works with the constraints of aligned_alloc is chosen.
Using aligned_alloc is preferable to "using malloc and adjusting the
allocated pointer to align for indexing" because the pointer access
arithmetic done for the latter only makes it harder for LLVM passes to
deal with for analysis, optimization, attribute deduction, and rewrites.
Differential Revision: https://reviews.llvm.org/D77528
This revision removes the reliance of Promotion on `linalg.slice` which is meant
for the rank-reducing case.
Differential Revision: https://reviews.llvm.org/D77676
Invoke `keep()` on the output file of `mlir-opt` in case the invocation of `MlirOptMain` was successful, to make sure the output file is not deleted on exit from `mlir-opt`.
Fixes a similar problem in `standalone-opt` from the example for an out-of-tree, standalone MLIR dialect.
This revision also adds a missing parameter to the invocation of `MlirOptMain` in `standalone-opt`.
Differential Revision: https://reviews.llvm.org/D77643
Error messages for the custom assembly format are difficult to understand
because there are no line numbers. This happens because the assembly format
is parsed as a standalone line, separate from it's parent file, with no useful
location information. Fixing this properly probably requires quite a bit
of invasive plumbing through the SourceMgr, similar to how included files
are handled
This proposal is a less invasive short term solution. When generating an
error message we generate an additional note which at least properly describes
the operation definition the error occured in, if not the actual line number
of the assemblyFormat definition.
A typical message is like:
error: type of operand #0, named 'operand', is not buildable and a buildable type cannot be inferred
$operand type($result) attr-dict
^
/src/llvm-project/mlir/test/mlir-tblgen/op-format-spec.td:296:1: note: in custom assembly format for this operation
def ZCoverageInvalidC : TestFormat_Op<"variable_invalid_c", [{
^
note: suggest adding a type constraint to the operation or adding a 'type($operand)' directive to the custom assembly format
$operand type($result) attr-dict
^
Differential Revision: https://reviews.llvm.org/D77488
The messages are somewhat cryptic, since they are not complete sentences,
include lots of ambiguous words, like 'format' which are hard to parse,
and include names from the users code which may, or may not make sense in
the context of the message. Start to clean this up and provide some
guidance for fixes.
Also, add a test for one of the messages which didn't have a test at all.
Differential Revision: https://reviews.llvm.org/D77449
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
ModulePass doesn't provide any special utilities and thus doesn't give enough benefit to warrant a special pass class. This revision replaces all usages with the more general OperationPass.
Differential Revision: https://reviews.llvm.org/D77339
Summary:
Add directive to indicate the location to give to op being created. This
directive is optional and if unused the location will still be the fused
location of all source operations.
Currently this directive only works with other op locations, reusing an
existing op location or a fusion of op locations. But doesn't yet support
supplying metadata for the FusedLoc.
Based off initial revision by antiagainst@ and effectively mirrors GlobalIsel
debug_locations directive.
Differential Revision: https://reviews.llvm.org/D77649
Summary:
* Removal of FxpMathOps was discussed on the mailing list.
* Will send a courtesy note about also removing the Quantizer (which had some dependencies on FxpMathOps).
* These were only ever used for experimental purposes and we know how to get them back from history as needed.
* There is a new proposal for more generalized quantization tooling, so moving these older experiments out of the way helps clean things up.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77479
If we have two back-to-back loops with block arguments, the OpPhi
instructions generated for the second loop's block arguments should
have use the merge block of the first SPIR-V loop structure as
their incoming parent block.
Differential Revision: https://reviews.llvm.org/D77543
Introduce the alloca op for stack memory allocation. When converting to the
LLVM dialect, this is lowered to an llvm.alloca. Refactor the std to
llvm conversion for alloc op to reuse with alloca. Drop useAlloca option
with alloc op lowering.
Differential Revision: https://reviews.llvm.org/D76602
Fix point-wise copy generation to work with bounds that have max/min.
Change structure of copy loop nest to use absolute loop indices and
subtracting base from the indexes of the fast buffers. Update supporting
utilities: Fix FlatAffineConstraints::getLowerAndUpperBound to look at
equalities as well and for a missing division. Update unionBoundingBox
to not discard common constraints (leads to a tighter system). Update
MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension
constraints. Run removeTrivialRedundancy at the end of
MemRefRegion::compute. Run single iteration loop promotion and
load/store canonicalization after affine data copy (in its test pass as
well).
Differential Revision: https://reviews.llvm.org/D77320
Summary:
This revision adds a tensor_reshape operation that operates on tensors.
In the tensor world the constraints are less stringent and we can allow more
arbitrary dynamic reshapes, as long as they are contractions.
The expansion of a dynamic dimension into multiple dynamic dimensions is under-specified and is punted on for now.
Differential Revision: https://reviews.llvm.org/D77360
Add a pattern rewriter utility to erase blocks (while notifying the
pattern rewriting driver of the erased ops). Use this to remove trivial
else blocks in affine.if ops.
Differential Revision: https://reviews.llvm.org/D77083
Summary: This revision adds support for marking the last region as variadic in the ODS region list with the VariadicRegion directive.
Differential Revision: https://reviews.llvm.org/D77455
Summary: The attribute grammar includes an optional trailing colon type, so for attributes without a constant buildable type this will generally lead to unexpected and undesired behavior. Given that, it's better to just error out on these cases.
Differential Revision: https://reviews.llvm.org/D77293
Summary:
A recent extension allowed the `loop.if` operation to return results yielded by
its regions. However, such operations could not be lowered to a CFG of standard
operations because it would have required to modify the argument list of a
block, which is not allowed in a conversion pattern. Now that the conversion
infrastructure supports block creation, use it to create a block with an
argument list that dominates the operations following the `loop.if` and forward
the results as arguments of this block.
Depends On D77416
Differential Revision: https://reviews.llvm.org/D77418
Summary:
Linalg makes it possible to interface codegen with externally precompiled HPC libraries. The mechanism to allow such interop uses a normalized ABI and the emission of C interface wrappers.
The mechanism controlling these C interface emission is too aggressive and makes it very easy to obtained undefined symbols for external function (e.g. the ones coming from libm).
This revision uses the newly introduced llvm.emit_c_interface function attribute which allows controlling this behavior at a function granularity. As a consequence LinalgToLLVM does not need to activate the C wrapper emission when adding the StdToLLVM patterns.
Differential Revision: https://reviews.llvm.org/D77364
PatternRewriter and derived classes provide a set of virtual methods to
manipulate blocks, which ConversionPatternRewriter overrides to keep track of
the manipulations and undo them in case the conversion fails. However, one can
currently create a block only by splitting another block into two. This not
only makes the API inconsistent (`splitBlock` is allowed in conversion
patterns, but `createBlock` is not), but it also make it impossible for one to
create blocks with argument lists different from those of already existing
blocks since in-place block updates are not supported either. Such
functionality precludes dialect conversion infrastructure from being used more
extensively on region-containing ops, for example, for value-returning "if"
operations. At the same time, ConversionPatternRewriter already allows one to
undo block creation as block creation is one of the primitive operations in
already supported region inlining.
Support block creation in conversion patterns by hooking `createBlock` on the
block action undo mechanism. This requires to make `Builder::createBlock`
virtual, similarly to Op insertion. This is a minimal change to the Builder
infrastructure that will later help support additional use cases such as block
signature changes. `createBlock` now additionally takes the types of the block
arguments that are added immediately so as to avoid in-place argument list
manipulation that would be illegal in conversion patterns.
Two back-to-back transpose operations are combined into a single transpose, which uses a combination of their permutation vectors.
Differential Revision: https://reviews.llvm.org/D77331
A certain number of EDSCs have a named form (e.g. `linalg.matmul`) and a generic form (e.g. `linalg.generic` with matmul traits).
Despite living in different namespaces, using the same name is confusiong in clients.
Rename them as `linalg_matmul` and `linalg_generic_matmul` respectively.
C interface emission is controlled by a flag and has coarse granularity.
With this coarse control, interfaces are emitted for all external functions.
This makes is easy to get undefined symbols.
This revision adds support for controlling per-function emission with an "emit_c_interface" attribute.
Summary:
LLVM IR functions can have arbitrary attributes attached to them, some of which
affect may affect code transformations. Until we can model all attributes
consistently, provide a pass-through mechanism that forwards attributes from
the LLVMFuncOp in MLIR to LLVM IR functions during translation. This mechanism
relies on LLVM IR being able to recognize string representations of the
attributes and performs some additional checking to avoid hitting assertions
within LLVM code.
Differential Revision: https://reviews.llvm.org/D77072
Add a method that given an affine map returns another with just its unique
results. Use this to drop redundant bounds in max/min for affine.for. Update
affine.for's canonicalization pattern and createCanonicalizedForOp to use
this.
Differential Revision: https://reviews.llvm.org/D77237
Summary:
This is to allow optimizations like loop invariant code motion to work
on the ParallelOp.
Additional small cleanup on the ForOp implementation of
LoopLikeInterface and the test file of loop-invariant-code-motion.
Differential Revision: https://reviews.llvm.org/D77128
Summary:
The RAW fusion happens only if the produecer block dominates the consumer block.
The WAW pattern also works with the precondition. I.e., if a producer can
dominate the consumer, they can fairly fuse together.
Since they are all tilable, we can think the pattern like this way:
Input:
```
linalg_op1 view
tile_loop
subview_2
linalg_op2 subview_2
```
Tile the first Linalg op as same as the second Linalg.
```
tile_loop
subview_1
linalg_op1 subview_1
tile_loop
subview_2
liangl_op2 subview_2
```
Since the first Linalg op is tilable in the same way and the computation are
independently, it's fair to fuse it with the second Linalg op.
```
tile_loop
subview_1
linalg_op1 subview_1
linalg_op2 subview_2
```
In short, this patch includes:
- Handling both RAW and WAW pattern.
- Adding a interface method to get input and output buffers.
- Exposing a method to get a StringRef of a dependency type.
- Fixing existing WAW tests and add one more use case: initialize the buffer
before conv op.
Differential Revision: https://reviews.llvm.org/D76897
Summary:
Performs an N-D pooling operation similarly to the description in the TF
documentation:
https://www.tensorflow.org/api_docs/python/tf/nn/pool
Different from the description, this operation doesn't perform on batch and
channel. It only takes tensors of rank `N`.
```
output[x[0], ..., x[N-1]] =
REDUCE_{z[0], ..., z[N-1]}
input[
x[0] * strides[0] - pad_before[0] + dilation_rate[0]*z[0],
...
x[N-1]*strides[N-1] - pad_before[N-1] + dilation_rate[N-1]*z[N-1]
],
```
The required optional arguments are:
- strides: an i64 array specifying the stride (i.e. step) for window
loops.
- dilations: an i64 array specifying the filter upsampling/input
downsampling rate
- padding: an i64 array of pairs (low, high) specifying the number of
elements to pad along a dimension.
If strides or dilations attributes are missing then the default value is
one for each of the input dimensions. Similarly, padding values are zero
for both low and high in each of the dimensions, if not specified.
Differential Revision: https://reviews.llvm.org/D76414
Move lower-affine.mlir from test/Transforms to
test/Conversion/AffineToStandard/. Other related NFC.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77008
Existing tiling implementation of Linalg would still work for tiling
the batch dimensions of the convolution op.
Differential Revision: https://reviews.llvm.org/D76637
Summary:
Add support for TupleGetOp folding through InsertSlicesOp and ExtractSlicesOp.
Vector-to-vector transformations for unrolling and lowering to hardware vectors
can generate chains of structured vector operations (InsertSlicesOp,
ExtractSlicesOp and ShapeCastOp) between the producer of a hardware vector
value and its consumer. Because InsertSlicesOp, ExtractSlicesOp and ShapeCastOp
are structured, we can track the location (tuple index and vector offsets) of
the consumer vector value through the chain of structured operations to the
producer, enabling a much more powerful producer-consumer fowarding of values
through structured ops and tuple, which in turn enables a more powerful
TupleGetOp folding transformation.
Reviewers: nicolasvasilache, aartbik
Reviewed By: aartbik
Subscribers: grosul1, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76889
Rewrite mlir::permuteLoops (affine loop permutation utility) to fix
incorrect approach. Avoiding using sinkLoops entirely - use single move
approach. Add test pass.
This fixes https://bugs.llvm.org/show_bug.cgi?id=45328
Depends on D77003.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77004
Summary: This revision updates the SourceMgrDiagnosticHandler to not print the source location of a note if it is the same location as the previously printed diagnostic. This helps avoid redundancy, and potential confusion, when looking at the diagnostic output.
Differential Revision: https://reviews.llvm.org/D76787
Move test/lib/TestDialect to test/lib/Dialect/Test - makes the dir
structure more uniform.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76677
This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests where trip counts are unknown or
where tile sizes don't divide trip counts. A conditional guard is
generated to separate out the full tile (with constant trip count loops)
into the then block of an 'affine.if' and the partial tile to the else
block. The separation allows the 'then' block (which has constant trip
count loops) to be optimized better subsequently: for eg. for
unroll-and-jam, register tiling, vectorization without leading to
cleanup code, or to offload to accelerators. Among techniques from the
literature, the if/else based separation leads to the most compact
cleanup code for multi-dimensional cases (because a single version is
used to model all partial tiles).
INPUT
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
"foo"() : () -> ()
}
}
OUTPUT AFTER TILING W/O SEPARATION
map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
affine.for %arg2 = 0 to %M step 32 {
affine.for %arg3 = 0 to %N step 32 {
affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
"foo"() : () -> ()
}
}
}
}
OUTPUT AFTER TILING WITH SEPARATION
map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0) -> (d0 + 32)>
map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
#set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>
affine.for %arg2 = 0 to %M step 32 {
affine.for %arg3 = 0 to %N step 32 {
affine.if #set0(%arg2, %arg3)[%M, %N] {
// Full tile.
affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
"foo"() : () -> ()
}
}
} else {
// Partial tile.
affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
"foo"() : () -> ()
}
}
}
}
}
The separation is tested via a cmd line flag on the loop tiling pass.
The utility itself allows one to pass in any band of contiguously nested
loops, and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76700
The Dominance analysis currently misses a utility function to find the nearest common dominator of two given blocks. This is required for a huge variety of different control-flow analyses and transformations. This commit adds this function and moves the getNode function from DominanceInfo to DominanceInfoBase, as it also works for post dominators.
Differential Revision: https://reviews.llvm.org/D75507
This change adds a new option to the StandardToLLVM lowering to configure
the bitwidth of the index type independently of the target architecture's
pointer size.
Differential revision: https://reviews.llvm.org/D76353
It's common in many dialects to use tensors to themselves hold tensor shapes (for example, the shape is itself the result of some non-trivial calculation). Currently, such dialects have to use `tensor<?xi64>` or worse (like allowing either i32 or i64 tensors to represent shapes). `tensor<?xindex>` is the natural type to represent this, but is currently disallowed. This patch allows it.
Differential Revision: https://reviews.llvm.org/D76726
This allows conversion of a ParallelLoop from N induction variables to
some nuber of induction variables less than N.
The first intended use of this is for the GPUDialect to convert
ParallelLoops to iterate over 3 dimensions so they can be launched as
GPU Kernels.
To implement this:
- Normalize each iteration space of the ParallelLoop
- Use the same induction variable in a new ParallelLoop for multiple
original iterations.
- Split the new induction variable back into the original set of values
inside the body of the ParallelLoop.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76363
- add method to get back an integer set from flat affine constraints;
this allows a round trip
- use this to complete the simplification of integer sets in
-simplify-affine-structures
- update FlatAffineConstraints::removeTrivialRedundancy to also do GCD
tightening and normalize by GCD (while still keeping it linear time).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Summary:
The test performs add on vector<16384xf32> with
number of workgroups = (128, 1, 1)
local workgroup size = (128, 1, 1)
On a NVIDIA Quadro P1000, I see the following results:
Command buffer submit time: 13us
Compute shader execution time: 19.616us
Differential Revision: https://reviews.llvm.org/D76799
Summary:
The attribute parser fails to correctly parse unsigned 64 bit
attributes as the check `isNegative ? (int64_t)-val.getValue() >= 0
: (int64_t)val.getValue() < 0` will falsely detect an overflow for
unsigned values larger than 2^63-1.
This patch reworks the overflow logic to instead of doing arithmetic
on int64_t use APInt::isSignBitSet() and knowledge of the attribute
type.
Test-cases which verify the de-facto behavior of the parser and
triggered the previous faulty handing of unsigned 64 bit attrbutes are
also added.
Differential Revision: https://reviews.llvm.org/D76493
Summary: The current ConvertStandardToLLVM phase lowers the standard TanHOp to function calls to external tanh symbols. However, this leads to misunderstandings since these external symbols are not defined anywhere. This commit removes the TanHOp lowering functionality from ConvertStandardToLLVM, adapts the LowerGpuOpsToNVVMOps and LowerGpuOpsToROCDLOps passes and adjusts the affected test cases.
Reviewers: mravishankar, herhut
Subscribers: jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75509
gpu.launch
Current implementation of lowering from loop.parallel to gpu.launch
uses a DictionaryAttr to specify the mapping. Moving this attribute to
be auto-generated from specification as a StructAttr. This simplifies
a lot the logic of looking up and creating this attribute.
Differential Revision: https://reviews.llvm.org/D76165
Summary:
While here, simplify the lexer a bit by eliminating the unneeded 'operator'
classification of certain sigils, they can just be treated as 'punctuation'.
Reviewers: rriddle!
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76647
Summary:
This allows the custom parser/printer hooks to do interesting things with
the SSA names. This patch:
- Adds a new 'getResultName' method to OpAsmParser that allows a parser
implementation to get information about its result names, along with
a getNumResults() method that allows op parser impls to know how many
results are expected.
- Adds a OpAsmPrinter::printOperand overload that takes an explicit stream.
- Adds a test.string_attr_pretty_name operation that uses these hooks to
do fancy things with the result name.
Reviewers: rriddle!
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76205
Move some of the affine transforms and their test cases to their
respective dialect directory. This patch does not complete the move, but
takes care of a good part.
Renames: prefix 'affine' to affine loop tiling cl options,
vectorize -> super-vectorize
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76565
Summary:
This removes the static pass registration, and also cleans up some lingering technical debt.
Differential Revision: https://reviews.llvm.org/D76554
Summary:
This file only contains references to test passes, and was never removed when the test passes were moved to the test/ directory.
Differential Revision: https://reviews.llvm.org/D76553
Summary:
Change AffineOps Dialect structure to better group both IR and Tranforms. This included extracting transforms directly related to AffineOps. Also move AffineOps to Affine.
Differential Revision: https://reviews.llvm.org/D76161
The Vector Dialect [document](https://mlir.llvm.org/docs/Dialects/Vector/) discusses the vector abstractions that MLIR supports and the various tradeoffs involved.
One of the layer that is missing in OSS atm is the Hardware Vector Ops (HWV) level.
This revision proposes an AVX512-specific to add a new Dialect/Targets/AVX512 Dialect that would directly target AVX512-specific intrinsics.
Atm, we rely too much on LLVM’s peephole optimizer to do a good job from small insertelement/extractelement/shufflevector. In the future, when possible, generic abstractions such as VP intrinsics should be preferred.
The revision will allow trading off HW-specific vs generic abstractions in MLIR.
Differential Revision: https://reviews.llvm.org/D75987
Summary: This patch add tests when lowering multiple `gpu.all_reduce` operations in the same kernel. This was previously failing.
Differential Revision: https://reviews.llvm.org/D75930
Summary:
These are not supported by any of the code using `type_cast`. In the general
case, such casting would require memrefs to handle a non-contiguous vector
representation or misaligned vectors (e.g., if the offset of the source memref
is not divisible by vector size, since offset in the target memref is expressed
in the number of elements).
Differential Revision: https://reviews.llvm.org/D76349
Summary:
Utility to perform CallOp Dialect conversion, specifically handling cases where
an argument type has changed and the corresponding CallOp needs to be updated.
Differential Revision: https://reviews.llvm.org/D76326
This commit merges the DRR pattern for std.constant to spv.constant
conversion into the C++ OpConversionPattern. This allows us to have
remove the DRR pattern file. Along the way, this commit enhanced
std.constant to spv.constant conversion to consider type conversions,
which means converting the underlying attributes if necessary.
Differential Revision: https://reviews.llvm.org/D76246