Add folder for the case where ExtractStridedSliceOp source comes from a chain
of InsertStridedSliceOp. Also add a folder for the trivial case where the
ExtractStridedSliceOp is a no-op.
Differential Revision: https://reviews.llvm.org/D89850
This patch provides C API for MLIR affine expression.
- Implement C API for methods of AffineExpr class.
- Implement C API for methods of derived classes (AffineBinaryOpExpr, AffineDimExpr, AffineSymbolExpr, and AffineConstantExpr).
Differential Revision: https://reviews.llvm.org/D89856
Added optimization pass to convert heap-based allocs to stack-based allocas in
buffer placement. Added the corresponding test file.
Differential Revision: https://reviews.llvm.org/D89688
Before this change, we would run `maxIterations` if the first iteration changed the op.
After this change, we exit the loop as soon as an iteration hasn't changed the op.
Assuming that we have reached a fixed point when an iteration doesn't change the op, this doesn't affect correctness.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D89981
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions converted to coroutine suspension points. Await operation outside of a coroutine converted to the blocking wait operations.
Implement simple runtime to support concurrent execution of coroutines.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292
Forward missing attributes when creating the new transfer op otherwise the
builder would use default values.
Differential Revision: https://reviews.llvm.org/D89907
* Adds a new MlirOpPrintingFlags type and supporting accessors.
* Adds a new mlirOperationPrintWithFlags function.
* Adds a full featured python Operation.print method with all options and the ability to print directly to files/stdout in text or binary.
* Adds an Operation.get_asm which delegates to print and returns a str or bytes.
* Reworks Operation.__str__ to be based on get_asm.
Differential Revision: https://reviews.llvm.org/D89848
A "structural" type conversion is one where the underlying ops are
completely agnostic to the actual types involved and simply need to update
their types. An example of this is shape.assuming -- the shape.assuming op
and the corresponding shape.assuming_yield op need to update their types
accordingly to the TypeConverter, but otherwise don't care what type
conversions are happening.
Also, the previous conversion code would not correctly materialize
conversions for the shape.assuming_yield op. This should have caused a
verification failure, but shape.assuming's verifier wasn't calling
RegionBranchOpInterface::verifyTypes (which for reasons can't be called
automatically as part of the trait verification, and requires being
called manually). This patch also adds that verification.
Differential Revision: https://reviews.llvm.org/D89833
A "structural" type conversion is one where the underlying ops are
completely agnostic to the actual types involved and simply need to update
their types. An example of this is scf.if -- the scf.if op and the
corresponding scf.yield ops need to update their types accordingly to the
TypeConverter, but otherwise don't care what type conversions are happening.
To test the structural type conversions, it is convenient to define a
bufferize pass for a dialect, which exercises them nicely.
Differential Revision: https://reviews.llvm.org/D89757
Docstrings for `__str__` method in many classes was recycling the constant
string defined for `Type`, without being types themselves. Use proper
docstrings instead. Since they are succint, use string literals instead of
top-level constants to avoid further mistakes.
Differential Revision: https://reviews.llvm.org/D89780
The pybind class typedef for concrete attribute classes was erroneously
deriving all of them from PyAttribute instead of the provided base class. This
has not been triggering any error because only one level of the hierarchy is
currently exposed.
Differential Revision: https://reviews.llvm.org/D89779
Values are ubiquitous in the IR, in particular block argument and operation
results are Values. Define Python classes for BlockArgument, OpResult and their
common ancestor Value. Define pseudo-container classes for lists of block
arguments and operation results, and use these containers to access the
corresponding values in blocks and operations.
Differential Revision: https://reviews.llvm.org/D89778
This still satisfies the constraints required by the affine dialect and
gives more flexibility in what iteration bounds can be used when
loewring to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D89782
The Value hierarchy consists of BlockArgument and OpResult, both of which
derive Value. Introduce IsA functions and functions specific to each class,
similarly to other class hierarchies. Also, introduce functions for
pointer-comparison of Block and Operation that are necessary for testing and
are generally useful.
Reviewed By: stellaraccident, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89714
* Interops with Python buffers/numpy arrays to create.
* Also cleans up 'get' factory methods on some types to be consistent.
* Adds mlirAttributeGetType() to C-API to facilitate error handling and other uses.
* Punts on a lot of features of the ElementsAttribute hierarchy for now.
* Does not yet support bool or string attributes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D89363
Now, convert-shape-to-std doesn't internally create memrefs, which was
previously a bit of a layering violation. The conversion to memrefs
should logically happen as part of bufferization.
Differential Revision: https://reviews.llvm.org/D89669
It's unfortunate that this requires adding a dependency on scf dialect
to std bufferization (and hence all of std transforms). This is a bit
perilous. We might want a lib/Transforms/Bufferize/ with a separate
bufferization library per dialect?
Differential Revision: https://reviews.llvm.org/D89667
The current BufferPlacement transformation contains several concepts for
hoisting allocations. However, more advanced hoisting techniques should not be
integrated into the BufferPlacement transformation. Hence, this CL refactors the
current BufferPlacement pass into three separate pieces: BufferDeallocation and
BufferAllocation(Loop)Hoisting. Moreover, it extends the hoisting functionality
by allowing to move allocations out of loops.
Differential Revision: https://reviews.llvm.org/D87756
Usage of nested parallel regions were not working correctly and leading
to assertion failures. Fix contains the following changes,
1) Don't set the insertion point in the body callback.
2) Save the continuation IP in a stack and set the branch to
continuationIP at the terminator.
Reviewed By: SouraVX, jdoerfert, ftynse
Differential Revision: https://reviews.llvm.org/D88720
AllReduceLowering is currently the only GPU rewrite pattern, but more are coming. This is a preparation change.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89370
This transforms the symbol lookups to O(1) from O(NM), greatly speeding up both passes. For a large MLIR module this shaved seconds off of the compilation time.
Differential Revision: https://reviews.llvm.org/D89522
The initial goal of this interface is to fix the current problems with verifying symbol user operations, but can extend beyond that in the future. The current problems with the verification of symbol uses are:
* Extremely inefficient:
Most current symbol users perform the symbol lookup using the slow O(N) string compare methods, which can lead to extremely long verification times in large modules.
* Invalid/break the constraints of verification pass
If the symbol reference is not-flat(and even if it is flat in some cases) a verifier for an operation is not permitted to touch the referenced operation because it may be in the process of being mutated by a different thread within the pass manager.
The new SymbolUserOpInterface exposes a method `verifySymbolUses` that will be invoked from the parent symbol table to allow for verifying the constraints of any referenced symbols. This method is passed a `SymbolTableCollection` to allow for O(1) lookups of any necessary symbol operation.
Differential Revision: https://reviews.llvm.org/D89512
This revision contains two optimizations related to symbol checking:
* Optimize SymbolOpInterface to only check for a name attribute if the operation is an optional symbol.
This removes an otherwise unnecessary attribute lookup from a majority of symbols.
* Add a new SymbolTableCollection class to represent a collection of SymbolTables.
This allows for perfoming non-flat symbol lookups in O(1) time by caching SymbolTables for symbol table operations. This class is very useful for algorithms that operate on multiple symbol tables, either recursively or not.
Differential Revision: https://reviews.llvm.org/D89505
(Note: This is a reland of D82597)
This class allows for defining thread local objects that have a set non-static lifetime. This internals of the cache use a static thread_local map between the various different non-static objects and the desired value type. When a non-static object destructs, it simply nulls out the entry in the static map. This will leave an entry in the map, but erase any of the data for the associated value. The current use cases for this are in the MLIRContext, meaning that the number of items in the static map is ~1-2 which aren't particularly costly enough to warrant the complexity of pruning. If a use case arises that requires pruning of the map, the functionality can be added.
This is especially useful in the context of MLIR for implementing thread-local caching of context level objects that would otherwise have very high lock contention. This revision adds a thread local cache in the MLIRContext for attributes, identifiers, and types to reduce some of the locking burden. This led to a speedup of several seconds when compiling a somewhat large mlir module.
Differential Revision: https://reviews.llvm.org/D89504
This trait simply adds a fold of f(f(x)) = f(x) when an operation is labelled as idempotent
Reviewed By: rriddle, andyly
Differential Revision: https://reviews.llvm.org/D89421
* Also fixes the const-ness of the various DenseElementsAttr construction functions.
* Both issues identified when trying to use the DenseElementsAttr functions.
Differential Revision: https://reviews.llvm.org/D89517
Added an underlying matcher for generic constant ops. This
included a rewriter of RewriterGen to make variable use more
clear.
Differential Revision: https://reviews.llvm.org/D89161