This patch adds support for producer-consumer fusion scenarios with
multiple producer stores to the AffineLoopFusion pass. The patch
introduces some changes to the producer-consumer algorithm, including:
* For a given consumer loop, producer-consumer fusion iterates over its
producer candidates until a fixed point is reached.
* Producer candidates are gathered beforehand for each iteration of the
consumer loop and visited in reverse program order (not strictly guaranteed)
to maximize the number of loops fused per iteration.
In general, these changes were needed to simplify the multi-store producer
support and remove some of the workarounds that were introduced in the past
to support more fusion cases under the single-store producer limitation.
This patch also preserves the existing functionality of AffineLoopFusion with
one minor change in behavior. Producer-consumer fusion didn't fuse scenarios
with escaping memrefs and multiple outgoing edges (from a single store).
Multi-store producer scenarios will usually (always?) have multiple outgoing
edges so we couldn't fuse any with escaping memrefs, which would greatly limit
the applicability of this new feature. Therefore, the patch enables fusion for
these scenarios. Please, see modified tests for specific details.
Reviewed By: andydavis1, bondhugula
Differential Revision: https://reviews.llvm.org/D92876
Add a check if regions do not implement the RegionBranchOpInterface. This is not
allowed in the current deallocation steps. Furthermore, we handle edge-cases,
where a single region is attached and the parent operation has no results.
This fixes: https://bugs.llvm.org/show_bug.cgi?id=48575
Differential Revision: https://reviews.llvm.org/D94586
The runtime-wrappers depend on LLVMSupport, pulling in static initialization code (e.g. command line arguments). Dynamically loading multiple such libraries results in ODR violoations.
So far this has not been an issue, but in D94421, I would like to load both the async-runtime and the cuda-runtime-wrappers as part of a cuda-runner integration test. When doing this, code that asserts that an option category is only registered once fails (note that I've only experienced this in Google's bazel where the async-runtime depends on LLVMSupport, but a similar issue would happen in cmake if more than one runtime-wrapper starts to depend on LLVMSupport).
The underlying issue is that we have a mix of static and dynamic linking. If all dependencies were loaded as shared objects (i.e. if LLVMSupport was linked dynamically to the runtime wrappers), each dependency would only get loaded once. However, linking dependencies dynamically would require special attention to paths (one could dynamically load the dependencies first given explicit paths). The simpler approach seems to be to link all dependencies statically into a single shared object.
This change basically applies the same logic that we have in the c_runner_utils: we have a shared object target that can be loaded dynamically, and we have a static library target that can be linked to other runtime-wrapper shared object targets.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D94399
Use cases with 16- or even 8-bit pointer/index structures have been identified.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D95015
* Matches how all of the other shaped types are declared.
* No super principled reason fro this ordering beyond that it makes the one that was different be like the rest.
* Also matches ordering of things like ndarray, et al.
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94812
* This isn't exclusive with other mechanisms for more ODS centric op definitions, but based on discussions, we feel that we will always benefit from a python escape hatch, and that is the most natural way to write things that don't fit the mold.
* I suspect this facility needs further tweaking, and once it settles, I'll document it and add more tests.
* Added extensions for linalg, since it is unusable without them and continued to evolve my e2e example.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94752
* This allows us to hoist trait level information for regions and sized-variadic to class level attributes (_ODS_REGIONS, _ODS_OPERAND_SEGMENTS, _ODS_RESULT_SEGMENTS).
* Eliminates some splicey python generated code in favor of a native helper for it.
* Makes it possible to implement custom, variadic and region based builders with one line of python, without needing to manually code access to the segment attributes.
* Needs follow-on work for region based callbacks and support for SingleBlockImplicitTerminator.
* A follow-up will actually add ODS support for generating custom Python builders that delegate to this new method.
* Also includes the start of an e2e sample for constructing linalg ops where this limitation was discovered (working progressively through this example and cleaning up as I go).
Differential Revision: https://reviews.llvm.org/D94738
This commit adds a new trait that can be attached to ops that have
signed semantics.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D94896
cmake_minimum_required(VERSION) calls cmake_policy(VERSION),
which sets all policies up to VERSION to NEW.
LLVM started requiring CMake 3.13 last year, so we can remove
a bunch of code setting policies prior to 3.13 to NEW as it
no longer has any effect.
Reviewed By: phosek, #libunwind, #libc, #libc_abi, ldionne
Differential Revision: https://reviews.llvm.org/D94374
In prehistorical times, AffineApplyOp was allowed to produce multiple values.
This allowed the creation of intricate SSA use-def chains.
AffineApplyNormalizer was originally introduced as a means of reusing the AffineMap::compose method to write SSA use-def chains.
Unfortunately, symbols that were produced by an AffineApplyOp needed to be promoted to dims and reordered for the mathematical composition to be valid.
Since then, single result AffineApplyOp became the law of the land but the original assumptions were not revisited.
This revision revisits these assumptions and retires AffineApplyNormalizer.
Differential Revision: https://reviews.llvm.org/D94920
* Development setup recommendations.
* Test updates to match what we actually do.
* Update cmake variable `PYTHON_EXECUTABLE` -> `Python3_EXECUTABLE` to match the upgrade to python3 repo wide.
This patch adds support for checking if two PresburgerSets are equal. In particular, one can check if two FlatAffineConstraints are equal by constructing PrebsurgerSets from them and comparing these.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94915
Use cross-compilation approach for `mlir-linalg-ods-gen` application
similar to TblGen tools.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94598
Added the ability to read (an extended version of) the FROSTT
file format, so that we can now read in sparse tensors of arbitrary
rank. Generalized the API to deal with more than two dimensions.
Also added the ability to sort the indices of sparse tensors
lexicographically. This is an important step towards supporting
auto gen of initialization code, since sparse storage formats
are easier to initialize if the indices are sorted. Since most
external formats don't enforce such properties, it is convenient
to have this ability in our runtime support library.
Lastly, the re-entrant problem of the original implementation
is fixed by passing an opaque object around (rather than having
a single static variable, ugh!).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94852
The operantion is an identity if the values yielded by the operation
is the argument of the basic block of that operation. Add this missing check.
Differential Revision: https://reviews.llvm.org/D94819
This commit adds support to generate an additional builder for
each named op that has attributes. This gives better experience
when creating the named ops.
Along the way adds support for i64.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D94733
This is a very minor improvement during iteration graph construction.
If the first attempt considering the dimension order of all tensors fails,
a second attempt is made using the constraints of sparse tensors only.
Dense tensors prefer dimension order (locality) but provide random access
if needed, enabling the compilation of more sparse kernels.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D94709
With the recent changes to linalg on tensor semantics, the tiling
operations works out-of-the-box for generic operations. Add a test to
verify that and some minor refactoring.
Differential Revision: https://reviews.llvm.org/D93077
Add canonicalization to replace use of the result of a linalg
operation on tensors in a dim operation, to use one of the operands of
the linalg operations instead. This allows the linalg op itself to be
deleted when all its non-dim uses are removed (say through tiling, etc.)
Differential Revision: https://reviews.llvm.org/D93076
linalg.generic/indexed_generic operations on tensors whose body is
just yielding the (non-induction variable) arguments of the operation
can be canonicalized by replacing uses of the result with the
corresponding arguments.
Differential Revision: https://reviews.llvm.org/D94581
The standard and gpu dialect both have `alloc` operations which use the
memory effect `MemAlloc`. In both cases, it is specified on both the
operation itself and on the result. This results in two memory effects
being created for these operations. When `MemAlloc` is defined on an
operation, it represents some background effect which the compiler
cannot reason about, and inhibits the ability of the compiler to
remove dead `std.alloc` operations. This change removes the uneeded
`MemAlloc` effect from these operations and leaves the effect on the
result, which allows dead allocs to be erased.
There is the same problem, but to a lesser extent, with MemFree, MemRead
and MemWrite. Over-specifying these traits is not currently inhibiting
any optimization.
Differential Revision: https://reviews.llvm.org/D94662
This spilts out BufferDeallocationInternals.md, since buffer
deallocation is not part of bufferization per se.
Differential Revision: https://reviews.llvm.org/D94351
TosaToLinalg was depending on its header file indirectly through
Passes.h rather than directly. This removes that indirection.
Differential Revision: https://reviews.llvm.org/D94706
This revision adds a new `replaceOpWithIf` hook that replaces uses of an operation that satisfy a given functor. If all uses are replaced, the operation gets erased in a similar manner to `replaceOp`. DialectConversion support will be added in a followup as this requires adjusting how replacements are tracked there.
Differential Revision: https://reviews.llvm.org/D94632
In the overwhelmingly common case, enum attribute case strings represent valid identifiers in MLIR syntax. This revision updates the format generator to format as a keyword in these cases, removing the need to wrap values in a string. The parser still retains the ability to parse the string form, but the printer will use the keyword form when applicable.
Differential Revision: https://reviews.llvm.org/D94575
This is a variant of TypesMatchWith that provides support for variadic arguments. This is necessary because ranges generally can't use the default operator== comparators for checking equality.
Differential Revision: https://reviews.llvm.org/D94574
Initial commit to add support for lowering from TOSA to Linalg. The focus is on
the essential infrastructure for these lowerings and integration with existing
passes.
Includes lowerings for a subset of operations including:
abs, add, sub, pow, and, or, xor, left shift, right shift, tanh
Lit tests are used to validate correctness.
Differential Revision: https://reviews.llvm.org/D94247
With this, we have complete support for emptiness checks. This also paves the way for future support to check if two FlatAffineConstraints are equal.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94272
Similar to the parallelization strategies, the vectorization strategies
provide control on what loops should be vectorize. Unlike the parallel
strategies, only innermost loops are considered, but including reductions,
with the control of vectorizing dense loops only or dense and sparse loops.
The vectorized loops are always controlled by a vector mask to avoid
overrunning the iterations, but subsequent vector operation folding removes
redundant masks and replaces the operations with more efficient counterparts.
Similarly, we will rely on subsequent loop optimizations to further optimize
masking, e.g. using an unconditional full vector loop and scalar cleanup loop.
The current strategy already demonstrates a nice interaction between the
sparse compiler and all prior optimizations that went into the vector dialect.
Ongoing discussion at:
https://llvm.discourse.group/t/mlir-support-for-sparse-tensors/2020/10
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D94551
This corrects the last 2 issues caught by tests when causing dialect
conversion rollbacks to occur.
Differential Revision: https://reviews.llvm.org/D94623
This commit adds support for parsing attribute uses in indexing
maps. These attribute uses are represented as affine symbols in
the resultant indexing maps because we can only know their
concrete value (which are coming from op attributes and are
constants) for specific op instances. The `indxing_maps()`
calls are synthesized to read these attributes and create affine
constants to replace the placeholder affine symbols and simplify.
Depends on D94240
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94335
An invalid permutation will trigger a C++ assertion when attempting to create an AffineMap from the permutation.
This patch adds an `isPermutation` function to check the given permutation before creating the AffineMap.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94492
This avoids large source files and gives a better structure. It also
allows leveraging compilation parallelism.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D94360
With this, now we can specify a list of attributes on named ops
generated from the spec. The format is defined as
```
attr-id ::= bare-id (`?`)?
attr-typedef ::= type (`[` `]`)?
attr-def ::= attr-id `:` attr-typedef
tc-attr-def ::= `attr` `(` attr-def-list `)`
tc-def ::= `def` bare-id
`(`tensor-def-list`)` `->` `(` tensor-def-list`)`
(tc-attr-def)?
```
For example,
```
ods_def<SomeCppOp>
def some_op(...) -> (...)
attr(
f32_attr: f32,
i32_attr: i32,
array_attr : f32[],
optional_attr? : f32
)
```
where `?` means optional attribute and `[]` means array type.
Reviewed By: hanchung, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94240
This commit moves dangling ops in the main ops.td file to the proper
file matching their categories. This makes ops.td as purely including
all category files.
Differential Revision: https://reviews.llvm.org/D94413
Use TableGen and information in ACC.td for the Default enum in the OpenACC dialect.
This patch generalize what was done for OpenMP for directives.
Follow up patch after D93576
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D93710
This revision uniformizes fusion APIs to allow passing OpOperand, OpResult and adds a finer level of control fusion.
Differential Revision: https://reviews.llvm.org/D94493
Continue the convergence between LLVM dialect and built-in types by using the
built-in vector type whenever possible, that is for fixed vectors of built-in
integers and built-in floats. LLVM dialect vector type is still in use for
pointers, less frequent floating point types that do not have a built-in
equivalent, and scalable vectors. However, the top-level `LLVMVectorType` class
has been removed in favor of free functions capable of inspecting both built-in
and LLVM dialect vector types: `LLVM::getVectorElementType`,
`LLVM::getNumVectorElements` and `LLVM::getFixedVectorType`. Additional work is
necessary to design an implemented the extensions to built-in types so as to
remove the `LLVMFixedVectorType` entirely.
Note that the default output format for the built-in vectors does not have
whitespace around the `x` separator, e.g., `vector<4xf32>` as opposed to the
LLVM dialect vector type format that does, e.g., `!llvm.vec<4 x fp128>`. This
required changing the FileCheck patterns in several tests.
Reviewed By: mehdi_amini, silvas
Differential Revision: https://reviews.llvm.org/D94405
getDynOperands behavior is commonly used in a number of passes. Refactored to
use a helper function and avoid code reuse.
Differential Revision: https://reviews.llvm.org/D94340
Based on the comments in lib/Parser/TypeParser.cpp on the
parseMemRefType and parseTensorType functions.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94262
This ensures the memref base + indices expression is well-formed
Reviewed By: ThomasRaoux, ftynse
Differential Revision: https://reviews.llvm.org/D94441
This reverts commit df86f15f0c.
The gcc-5 build was broken by this change:
mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1275:77: required from here
/usr/include/c++/5/ext/new_allocator.h:120:4: error: no matching function for call to 'std::pair<const std::__cxx11::basic_string<char>, {anonymous}::TCParser::RegisteredAttr>::pair(llvm::StringRef&, {anonymous}::TCParser::RegisteredAttr'
* Registers a small set of sample dialects.
* NFC with respect to existing C-API symbols but some headers have been moved down a level to the Dialect/ sub-directory.
* Adds an additional entry point per dialect that is needed for dynamic discovery/loading.
* See discussion: https://llvm.discourse.group/t/dialects-and-the-c-api/2306/16
Differential Revision: https://reviews.llvm.org/D94370
* We've got significant missing features in order to use most of these effectively (i.e. custom builders, region-based builders).
* We presently also lack a mechanism for actually registering these dialects but they can be use with contexts that allow unregistered dialects for further prototyping.
Differential Revision: https://reviews.llvm.org/D94368
The type tablegen backend now has enough support to represent these types well enough, so we can now move them to be declaratively defined.
Differential Revision: https://reviews.llvm.org/D94275
This allows for specifying additional get/getChecked methods that should be generated on the type, and acts similarly to how OpBuilders work. TypeBuilders have two additional components though:
* InferredContextParam
- Bit indicating that the context parameter of a get method is inferred from one of the builder parameters
* checkedBody
- A code block representing the body of the equivalent getChecked method.
Differential Revision: https://reviews.llvm.org/D94274
This removes the need for OpDefinitionsGen to use raw tablegen API, and will also
simplify adding builders to TypeDefs as well.
Differential Revision: https://reviews.llvm.org/D94273
Lowering of async dialect uses a fixed type converter and therefore does not support lowering non-standard types.
This revision adds a structural conversion so that non-standard types in `!async.value`s can be lowered to LLVM before lowering the async dialect itself.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D94404
This wasn't possible before because there was no support for affine expressions
as maps. Now that this support is available, provide the mechanism for
constructing maps with a layout and inspecting it.
Rework the `get` method on MemRefType in Python to avoid needing an explicit
memory space or layout map. Remove the `get_num_maps`, it is too low-level,
using the length of the now-avaiable pseudo-list of layout maps is more
pythonic.
Depends On D94297
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94302
Now that the bindings for AffineExpr have been added, add more bindings for
constructing and inspecting AffineMap that consists of AffineExprs.
Depends On D94225
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94297
This adds the Python bindings for AffineExpr and a couple of utility functions
to the C API. AffineExpr is a top-level context-owned object and is modeled
similarly to attributes and types. It is required, e.g., to build layout maps
of the built-in memref type.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94225
When fusing tensor_reshape ops with generic/indexed_Generic op, new
linalg.init_tensor operations were created for the `outs` of the fused
op. While correct (technically) it is better to just reshape the
original `outs` operands and rely on canonicalization of init_tensor
-> tensor_reshape to achieve the same effect.
Differential Revision: https://reviews.llvm.org/D93774
This allow more accurate modeling of the side effects and allow dead code
elimination to remove dead transfer ops.
Differential Revision: https://reviews.llvm.org/D94318
Linalg ops are perfect loop nests. When materializing the concrete
loop nest, the default order specified by the Linalg op's iterators
may not be the best for further CodeGen: targets frequently need
to plan the loop order in order to gain better data access. And
different targets can have different preferences. So there should
exist a way to control the order.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91795
With this, now we can specify a list of attributes on named ops
generated from the spec. The format is defined as
```
attr-id ::= bare-id (`?`)?
attr-typedef ::= type (`[` `]`)?
attr-def ::= attr-id `:` attr-typedef
tc-attr-def ::= `attr` `(` attr-def-list `)`
tc-def ::= `def` bare-id
`(`tensor-def-list`)` `->` `(` tensor-def-list`)`
(tc-attr-def)?
```
For example,
```
ods_def<SomeCppOp>
def some_op(...) -> (...)
attr(
f32_attr: f32,
i32_attr: i32,
array_attr : f32[],
optional_attr? : f32
)
```
where `?` means optional attribute and `[]` means array type.
Reviewed By: hanchung, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94240
Do not cache gpu.async.token type so that the pass can be created before the GPU dialect is registered.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94397
This commit adds support for (de-)serializing SpecConstantOpeation. One
thing worth noting is that during deserialization, we assign a fake ID to
enclosed ops inside SpecConstantOpeation. We need to do this in order
for deserialization logic to properly update ID to value map and to
later reference the created value from the sibling 'spv::YieldOp'.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D93591
This assertion is an old remnant from earlier days when only affine functions existed.
It is not the place of affine map composition to check whether orthogonal considerations
on what is allowed to be a symbol under the AffineScope trait.
mlir-opt supports many more test cases. All available dialects supported by latest mlir-opt should be coverted in the test case.
Reviewed By: aartbik, stephenneuendorffer, mehdi_amini
Differential Revision: https://reviews.llvm.org/D93821
This change makes the scatter/gather syntax more consistent with
the syntax of all the other memory operations in the Vector dialect
(order of types, use of [] for index, etc.). This will make the MLIR
code easier to read. In addition, the pass_thru parameter of the
gather has been made mandatory (there is very little benefit in
using the implicit "undefined" values).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94352
The dialect conversion framework was enhanced to handle type
conversion automatically. OpConversionPattern already contains
a pointer to the TypeConverter. There is no need to duplicate it
in a separate subclass. This removes the only reason for a
SPIRVOpLowering subclass. It adapts to use core infrastructure
and simplifies the code.
Also added a utility function to OpConversionPattern for getting
TypeConverter as a certain subclass.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D94080
Large integers are generated in Circt commonly which exceed 4kbits. This aligns the maximum bitwidth in MLIR and LLVM.
Reviewed By: rriddle, lattner, mehdi_amini
Differential Revision: https://reviews.llvm.org/D94116
Adding the ability to index the base address brings these operations closer
to the transfer read and write semantics (with lowering advantages), ensures
more consistent use in vector MLIR code (easier to read), and reduces the
amount of code duplication to lower memrefs into base addresses considerably
(making codegen less error-prone).
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D94278
This revision adds a new `initialize(MLIRContext *)` hook to passes that allows for them to initialize any heavy state before the first execution of the pass. A concrete use case of this is with patterns that rely on PDL, given that PDL is compiled at run time it is imperative that compilation results are cached as much as possible. The first use of this hook is in the Canonicalizer, which has the added benefit of reducing the number of expensive accesses to the context when collecting patterns.
Differential Revision: https://reviews.llvm.org/D93147
Use custom mlir runner init/destroy functions to safely init and destroy shared libraries loaded by the JitRunner.
This mechanism is ignored for Windows builds (for now) because init/destroy functions are not exported, and library unloading relies on static destructors.
Re-submit https://reviews.llvm.org/D94270 with a temporary workaround for windows
Differential Revision: https://reviews.llvm.org/D94312
Change the implementation of LinalgOp with TensorReshapeOp by
expansion to be more modular and easier to follow.
Differential Revision: https://reviews.llvm.org/D93748