* Finishes support for Context, InsertionPoint and Location to be carried by the thread using context managers.
* Introduces type casters and utilities so that DefaultPyMlirContext and DefaultPyLocation in method signatures does the right thing (allows explicit or gets from the thread context).
* Extend the rules for the thread context stack to handle nesting, appropriately inheriting and clearing depending on whether the context is the same.
* Refactors all method signatures to follow the new convention on trailing parameters for defaulting parameters (loc, ip, context). When the objects are carried in the thread context, this allows most explicit uses of these values to be elided.
* Removes the style guide section on putting accessors to construct global objects on the PyMlirContext: this style fails to make good use of the new facility since it is often the only thing remaining needing an MlirContext.
* Moves Module parse/creation from mlir.ir.Context to static methods on mlir.ir.Module.
* Moves Context.create_operation to a static Operation.create method.
* Moves Type parsing from mlir.ir.Context to static methods on mlir.ir.Type.
* Moves Attribute parsing from mlir.ir.Context to static methods on mlir.ir.Attribute.
* Move Location factory methods from mlir.ir.Context to static methods on mlir.ir.Location.
* Refactors the std dialect fake "ODS" generated code to take advantage of the new scheme.
Differential Revision: https://reviews.llvm.org/D90547
This pass allows removing getResultConversionKind from
BufferizeTypeConverter. This pass replaces the AppendToArgumentsList
functionality. As far as I could tell, the only use of this functionlity
is to perform the transformation that is implemented in this pass.
Future patches will remove the getResultConversionKind machinery from
BufferizeTypeConverter, but sending this patch for individual review for
clarity.
Differential Revision: https://reviews.llvm.org/D90071
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90477
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90477
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90022
This op returns a boolean value indicating whether 2 ops are
broadcastable or not. This follows the same logic as the other ops with
broadcast in their names in the shape dialect.
Concretely, shape.is_broadcastable returning true implies that
shape.broadcast will not give an error, and shape.cstr_broadcastable
will not result in an assertion failure. Similarly, false implies an
error or assertion failure.
Previously they were separated into "instance" and "kind" aliases, and also required that the dialect know ahead of time all of the instances that would have a corresponding alias. This approach was very clunky and not ergonomic to interact with. The new approach is to provide the dialect with an instance of an attribute/type to provide an alias for, fully replacing the original split approach.
Differential Revision: https://reviews.llvm.org/D89354
* Removes index based insertion. All insertion now happens through the insertion point.
* Introduces thread local context managers for implicit creation relative to an insertion point.
* Introduces (but does not yet use) binding the Context to the thread local context stack. Intent is to refactor all methods to take context optionally and have them use the default if available.
* Adds C APIs for mlirOperationGetParentOperation(), mlirOperationGetBlock() and mlirBlockGetTerminator().
* Removes an assert in PyOperation creation that was incorrectly constraining. There is already a TODO to rework the keepAlive field that it was guarding and without the assert, it is no worse than the current state.
Differential Revision: https://reviews.llvm.org/D90368
Fix semantic in the distribute integration test based on offline feedback. This
exposed a bug in block distribution, we need to make sure the id is multiplied
by the stride of the vector. Fix the transformation and unit test.
Differential Revision: https://reviews.llvm.org/D89291
This is a roll-forward of rGec7780ebdab4, now that the remaining
gpu.launch_func have been converted to custom form in rGb22f111023ba.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90420
This should fix the reason for the failures after ec7780ebda. I will roll forward in a separate change.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90410
Linalg "tile-and-fuse" is currently exposed as a Linalg pass "-linalg-fusion" but only the mechanics of the transformation are currently relevant.
Instead turn it into a "-test-linalg-greedy-fusion" pass which performs canonicalizations to enable more fusions to compose.
This allows dropping the OperationFolder which is not meant to be used with the pattern rewrite infrastructure.
Differential Revision: https://reviews.llvm.org/D90394
Update op is modelling the update directive (2.14.4) from the OpenACC specs.
An if condition and a device_type list can be attached to the directive. This patch add
these two information to the current op.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90310
Often times the legality of inlining can change depending on if the callable is going to be inlined in-place, or cloned. For example, some operations are not allowed to be duplicated and can only be inlined if the original callable will cease to exist afterwards. The new `wouldBeCloned` flag allows for dialects to hook into this when determining legality.
Differential Revision: https://reviews.llvm.org/D90360
In certain situations it isn't legal to inline a call operation, but this isn't something that is possible(at least not easily) to prevent with the current hooks. This revision adds a new hook so that dialects with call operations that shouldn't be inlined can prevent it.
Differential Revision: https://reviews.llvm.org/D90359
This patch fixes a bug [[ https://bugs.llvm.org/show_bug.cgi?id=46091 | 46091 ]]
Raw data for the `dense-element attribute` is written in little endian (LE) format.
This commit converts the format to big endian (BE) in ʻAttribute Parser` on the
BE machine. Also, when outputting on a BE machine, the BE format is converted
to LE in "AsmPrinter".
Differential Revision: https://reviews.llvm.org/D80695
This revision optimizes the parsing of hex strings by using the checked variant of llvm::fromHex, and adding a specialized method to Token for extracting hex strings. This leads a large decrease in compile time when parsing large hex constants (one example: 2.6 seconds -> 370 miliseconds)
Differential Revision: https://reviews.llvm.org/D90266
This fixes a subtle issue, described in the comment starting with
"Clone the op without the regions and inline the regions from the old op",
which prevented this conversion from working on non-trivial examples.
Differential Revision: https://reviews.llvm.org/D90203
Getting the body of a Module is a common need which justifies a
dedicated accessor instead of forcing users to go through the
region->blocks->front unwrapping manually.
Differential Revision: https://reviews.llvm.org/D90287
This patch adds support for fusing linalg.indexed_generic op with
linalg.tensor_reshape op by expansion, i.e.
- linalg.indexed_generic op -> linalg.tensor_reshape op when the
latter is expanding.
- linalg.tensor_reshape op -> linalg.indexed_generic op when the
former is folding.
Differential Revision: https://reviews.llvm.org/D90082
* Still rough edges that need more sugar but the bones are there. Notes left in the test case for things that can be improved.
* Does not actually yield custom OpViews yet for traversing. Will rework that in a followup.
Differential Revision: https://reviews.llvm.org/D89932
A recent commit introduced a new syntax for specifying builder arguments in
ODS, which is better amenable to automated processing, and deprecated the old
form. Transition all dialects as well as Linalg ODS generator to use the new
syntax.
Add a deprecation notice to ODS generator.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D90038
This class represents a rewrite pattern list that has been frozen, and thus immutable. This replaces the uses of OwningRewritePatternList in pattern driver related API, such as dialect conversion. When PDL becomes more prevalent, this API will allow for optimizing a set of patterns once without the need to do this per run of a pass.
Differential Revision: https://reviews.llvm.org/D89104
There are several pieces of pattern rewriting infra in IR/ that really shouldn't be there. This revision moves those pieces to a better location such that they are easier to evolve in the future(e.g. with PDL). More concretely this revision does the following:
* Create a Transforms/GreedyPatternRewriteDriver.h and move the apply*andFold methods there.
The definitions for these methods are already in Transforms/ so it doesn't make sense for the declarations to be in IR.
* Create a new lib/Rewrite library and move PatternApplicator there.
This new library will be focused on applying rewrites, and will also include compiling rewrites with PDL.
Differential Revision: https://reviews.llvm.org/D89103
The Pattern class was originally intended to be used for solely matching operations, but that use never materialized. All of the pattern infrastructure uses RewritePattern, and the infrastructure for pure matching(Matchers.h) is implemented inline. This means that this class isn't a useful abstraction at the moment, so this revision refactors it to solely encapsulate the "metadata" of a pattern. The metadata includes the various state describing a pattern; benefit, root operation, etc. The API on PatternApplicator is updated to now operate on `Pattern`s as nothing special from `RewritePattern` is necessary.
This refactoring is also necessary for the upcoming use of PDL patterns alongside C++ rewrite patterns.
Differential Revision: https://reviews.llvm.org/D86258
The conversion between PDL and the interpreter is split into several different parts.
** The Matcher:
The matching section of all incoming pdl.pattern operations is converted into a predicate tree and merged. Each pattern is first converted into an ordered list of predicates starting from the root operation. A predicate is composed of three distinct parts:
* Position
- A position refers to a specific location on the input DAG, i.e. an
existing MLIR entity being matched. These can be attributes, operands,
operations, results, and types. Each position also defines a relation to
its parent. For example, the operand `[0] -> 1` has a parent operation
position `[0]` (the root).
* Question
- A question refers to a query on a specific positional value. For
example, an operation name question checks the name of an operation
position.
* Answer
- An answer is the expected result of a question. For example, when
matching an operation with the name "foo.op". The question would be an
operation name question, with an expected answer of "foo.op".
After the predicate lists have been created and ordered(based on occurrence of common predicates and other factors), they are formed into a tree of nodes that represent the branching flow of a pattern match. This structure allows for efficient construction and merging of the input patterns. There are currently only 4 simple nodes in the tree:
* ExitNode: Represents the termination of a match
* SuccessNode: Represents a successful match of a specific pattern
* BoolNode/SwitchNode: Branch to a specific child node based on the expected answer to a predicate question.
Once the matcher tree has been generated, this tree is walked to generate the corresponding interpreter operations.
** The Rewriter:
The rewriter portion of a pattern is generated in a very straightforward manor, similarly to lowerings in other dialects. Each PDL operation that may exist within a rewrite has a mapping into the interpreter dialect. The code for the rewriter is generated within a FuncOp, that is invoked by the interpreter on a successful pattern match. Referenced values defined in the matcher become inputs the generated rewriter function.
An example lowering is shown below:
```mlir
// The following high level PDL pattern:
pdl.pattern : benefit(1) {
%resultType = pdl.type
%inputOperand = pdl.input
%root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
pdl.rewrite %root {
pdl.replace %root with (%inputOperand)
}
}
// is lowered to the following:
module {
// The matcher function takes the root operation as an input.
func @matcher(%arg0: !pdl.operation) {
pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
^bb1:
pdl_interp.return
^bb2:
pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
^bb3:
pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
^bb4:
%0 = pdl_interp.get_operand 0 of %arg0
pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
^bb5:
%1 = pdl_interp.get_result 0 of %arg0
pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
^bb6:
// This operation corresponds to a successful pattern match.
pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
}
module @rewriters {
// The inputs to the rewriter from the matcher are passed as arguments.
func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
pdl_interp.replace %arg1 with(%arg0)
pdl_interp.return
}
}
}
```
Differential Revision: https://reviews.llvm.org/D84580
Adds support for
- Dropping unit dimension loops for indexed_generic ops.
- Folding consecutive folding (or expanding) reshapes when the result
(or src) is a scalar.
- Fixes to indexed_generic -> generic fusion when zero-dim tensors are
involved.
Differential Revision: https://reviews.llvm.org/D90118
The alignment attribute in the 'alloca' op treats the '0' value as 'unset'.
When parsing the custom form of the 'alloca' op, ignore the alignment attribute
with if its value is '0' instead of actually creating it and producing a
slightly different textually yet equivalent semantically form in the output.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90179
Based on discourse discussion, fix the doc string and remove examples with
wrong semantic. Also fix insert_map semantic by adding missing operand for
vector we are inserting into.
Differential Revision: https://reviews.llvm.org/D89563
This revision allows the fusion of the producer of input tensors in the consumer under a tiling transformation (which produces subtensors).
Many pieces are still missing (e.g. support init_tensors, better refactor LinalgStructuredOp interface support, try to merge implementations and reuse code) but this still allows getting started.
The greedy pass itself is just for testing purposes and will be extracted in a separate test pass.
Differential revision: https://reviews.llvm.org/D89491
This patch introduces a SPIR-V runner. The aim is to run a gpu
kernel on a CPU via GPU -> SPIRV -> LLVM conversions. This is a first
prototype, so more features will be added in due time.
- Overview
The runner follows similar flow as the other runners in-tree. However,
having converted the kernel to SPIR-V, we encode the bind attributes of
global variables that represent kernel arguments. Then SPIR-V module is
converted to LLVM. On the host side, we emulate passing the data to device
by creating in main module globals with the same symbolic name as in kernel
module. These global variables are later linked with ones from the nested
module. We copy data from kernel arguments to globals, call the kernel
function from nested module and then copy the data back.
- Current state
At the moment, the runner is capable of running 2 modules, nested one in
another. The kernel module must contain exactly one kernel function. Also,
the runner supports rank 1 integer memref types as arguments (to be scaled).
- Enhancement of JitRunner and ExecutionEngine
To translate nested modules to LLVM IR, JitRunner and ExecutionEngine were
altered to take an optional (default to `nullptr`) function reference that
is a custom LLVM IR module builder. This allows to customize LLVM IR module
creation from MLIR modules.
Reviewed By: ftynse, mravishankar
Differential Revision: https://reviews.llvm.org/D86108
This patch introduces a pass for running
`mlir-spirv-cpu-runner` - LowerHostCodeToLLVMPass.
This pass emulates `gpu.launch_func` call in LLVM dialect and lowers
the host module code to LLVM. It removes the `gpu.module`, creates a
sequence of global variables that are later linked to the varables
in the kernel module, as well as a series of copies to/from
them to emulate the memory transfer to/from the host or to/from the
device sides. It also converts the remaining Standard dialect into
LLVM dialect, emitting C wrappers.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86112
The current pattern for vector unrolling takes the native shape to
unroll to at pattern instantiation time, but the native shape might
defer based on the types of the operand. Introduce a
UnrollVectorOptions struct which allows for using a function that will
return the native shape based on the operation. Move other options of
unrolling like `filterConstraints` into this struct.
Differential Revision: https://reviews.llvm.org/D89744
Add folder for the case where ExtractStridedSliceOp source comes from a chain
of InsertStridedSliceOp. Also add a folder for the trivial case where the
ExtractStridedSliceOp is a no-op.
Differential Revision: https://reviews.llvm.org/D89850
This patch provides C API for MLIR affine expression.
- Implement C API for methods of AffineExpr class.
- Implement C API for methods of derived classes (AffineBinaryOpExpr, AffineDimExpr, AffineSymbolExpr, and AffineConstantExpr).
Differential Revision: https://reviews.llvm.org/D89856
Added optimization pass to convert heap-based allocs to stack-based allocas in
buffer placement. Added the corresponding test file.
Differential Revision: https://reviews.llvm.org/D89688
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions converted to coroutine suspension points. Await operation outside of a coroutine converted to the blocking wait operations.
Implement simple runtime to support concurrent execution of coroutines.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292
Forward missing attributes when creating the new transfer op otherwise the
builder would use default values.
Differential Revision: https://reviews.llvm.org/D89907
* Adds a new MlirOpPrintingFlags type and supporting accessors.
* Adds a new mlirOperationPrintWithFlags function.
* Adds a full featured python Operation.print method with all options and the ability to print directly to files/stdout in text or binary.
* Adds an Operation.get_asm which delegates to print and returns a str or bytes.
* Reworks Operation.__str__ to be based on get_asm.
Differential Revision: https://reviews.llvm.org/D89848
A "structural" type conversion is one where the underlying ops are
completely agnostic to the actual types involved and simply need to update
their types. An example of this is shape.assuming -- the shape.assuming op
and the corresponding shape.assuming_yield op need to update their types
accordingly to the TypeConverter, but otherwise don't care what type
conversions are happening.
Also, the previous conversion code would not correctly materialize
conversions for the shape.assuming_yield op. This should have caused a
verification failure, but shape.assuming's verifier wasn't calling
RegionBranchOpInterface::verifyTypes (which for reasons can't be called
automatically as part of the trait verification, and requires being
called manually). This patch also adds that verification.
Differential Revision: https://reviews.llvm.org/D89833
A "structural" type conversion is one where the underlying ops are
completely agnostic to the actual types involved and simply need to update
their types. An example of this is scf.if -- the scf.if op and the
corresponding scf.yield ops need to update their types accordingly to the
TypeConverter, but otherwise don't care what type conversions are happening.
To test the structural type conversions, it is convenient to define a
bufferize pass for a dialect, which exercises them nicely.
Differential Revision: https://reviews.llvm.org/D89757
Historically, custom builder specification in OpBuilder has been accepting the
formal parameter list for the builder method as a raw string containing C++.
While this worked well to connect the signature and the body, this became
problematic when ODS needs to manipulate the parameter list, e.g. to inject
OpBuilder or to trim default values when generating the definition. This has
also become inconsistent with other method declarations, in particular in
interface definitions.
Introduce the possibility to define OpBuilder formal parameters using a
TableGen dag similarly to other methods. Additionally, introduce a mechanism to
declare parameters with default values using an additional class. This
mechanism can be reused in other methods. The string-based builder signature
declaration is deprecated and will be removed after a transition period.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D89470
Values are ubiquitous in the IR, in particular block argument and operation
results are Values. Define Python classes for BlockArgument, OpResult and their
common ancestor Value. Define pseudo-container classes for lists of block
arguments and operation results, and use these containers to access the
corresponding values in blocks and operations.
Differential Revision: https://reviews.llvm.org/D89778
This still satisfies the constraints required by the affine dialect and
gives more flexibility in what iteration bounds can be used when
loewring to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D89782
The Value hierarchy consists of BlockArgument and OpResult, both of which
derive Value. Introduce IsA functions and functions specific to each class,
similarly to other class hierarchies. Also, introduce functions for
pointer-comparison of Block and Operation that are necessary for testing and
are generally useful.
Reviewed By: stellaraccident, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89714
* Interops with Python buffers/numpy arrays to create.
* Also cleans up 'get' factory methods on some types to be consistent.
* Adds mlirAttributeGetType() to C-API to facilitate error handling and other uses.
* Punts on a lot of features of the ElementsAttribute hierarchy for now.
* Does not yet support bool or string attributes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D89363
Now, convert-shape-to-std doesn't internally create memrefs, which was
previously a bit of a layering violation. The conversion to memrefs
should logically happen as part of bufferization.
Differential Revision: https://reviews.llvm.org/D89669
It's unfortunate that this requires adding a dependency on scf dialect
to std bufferization (and hence all of std transforms). This is a bit
perilous. We might want a lib/Transforms/Bufferize/ with a separate
bufferization library per dialect?
Differential Revision: https://reviews.llvm.org/D89667
The current BufferPlacement transformation contains several concepts for
hoisting allocations. However, more advanced hoisting techniques should not be
integrated into the BufferPlacement transformation. Hence, this CL refactors the
current BufferPlacement pass into three separate pieces: BufferDeallocation and
BufferAllocation(Loop)Hoisting. Moreover, it extends the hoisting functionality
by allowing to move allocations out of loops.
Differential Revision: https://reviews.llvm.org/D87756
Usage of nested parallel regions were not working correctly and leading
to assertion failures. Fix contains the following changes,
1) Don't set the insertion point in the body callback.
2) Save the continuation IP in a stack and set the branch to
continuationIP at the terminator.
Reviewed By: SouraVX, jdoerfert, ftynse
Differential Revision: https://reviews.llvm.org/D88720
AllReduceLowering is currently the only GPU rewrite pattern, but more are coming. This is a preparation change.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89370
Have the ODS TypeDef generator write the getChecked() definition.
Also add to TypeParamCommaFormatter a `JustParams` format and
refactor around that.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D89438
This trait simply adds a fold of f(f(x)) = f(x) when an operation is labelled as idempotent
Reviewed By: rriddle, andyly
Differential Revision: https://reviews.llvm.org/D89421
* Also fixes the const-ness of the various DenseElementsAttr construction functions.
* Both issues identified when trying to use the DenseElementsAttr functions.
Differential Revision: https://reviews.llvm.org/D89517
Added an underlying matcher for generic constant ops. This
included a rewriter of RewriterGen to make variable use more
clear.
Differential Revision: https://reviews.llvm.org/D89161
Adding unroll support for transfer read and transfer write operation. This
allows to pick the ideal size for the memory access for a given target.
Differential Revision: https://reviews.llvm.org/D89289
The opposite of tensor_to_memref is tensor_load.
- Add some basic tensor_load/tensor_to_memref folding.
- Add source/target materializations to BufferizeTypeConverter.
- Add an example std bufferization pattern/pass that shows how the
materialiations work together (more std bufferization patterns to come
in subsequent commits).
- In coming commits, I'll document how to write composable
bufferization passes/patterns and update the other in-tree
bufferization passes to match this convention. The populate* functions
will of course continue to be exposed for power users.
The naming on tensor_load/tensor_to_memref and their pretty forms are
not very intuitive. I'm open to any suggestions here. One key
observation is that the memref type must always be the one specified in
the pretty form, since the tensor type can be inferred from the memref
type but not vice-versa.
With this, I've been able to replace all my custom bufferization type
converters in npcomp with BufferizeTypeConverter!
Part of the plan discussed in:
https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17
Differential Revision: https://reviews.llvm.org/D89437
Parsing of a scalar subview did not create the required static_offsets attribute.
This also adds support for folding scalar subviews away.
Differential Revision: https://reviews.llvm.org/D89467
Each hardware that supports SPV_C_CooperativeMatrixNV has a list of
configurations that are supported natively. Add an attribute to
specify the configurations supported to the `spv.target_env`.
Reviewed By: antiagainst, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D89364
The current fusion on tensors fuses reshape ops with generic ops by
linearizing the indexing maps of the fused tensor in the generic
op. This has some limitations
- It only works for static shapes
- The resulting indexing map has a linearization that would be
potentially prevent fusion later on (for ex. tile + fuse).
Instead, try to fuse the reshape consumer (producer) with generic op
producer (consumer) by expanding the dimensionality of the generic op
when the reshape is expanding (folding). This approach conflicts with
the linearization approach. The expansion method is used instead of
the linearization method.
Further refactoring that changes the fusion on tensors to be a
collection of patterns.
Differential Revision: https://reviews.llvm.org/D89002
This CL allows user to specify the same name for the operands in the source pattern which implicitly enforces equality on operands with the same name.
E.g., Pat<(OpA $a, $b, $a) ... > would create a matching rule for checking equality for the first and the last operands. Equality of the operands is enforced at any depth, e.g., OpA ($a, $b, OpB($a, $c, OpC ($a))).
Example usage: Pat<(Reshape $arg0, (Shape $arg0)), (replaceWithValue $arg0)>
Note, this feature only covers operands but not attributes.
Current use cases are based on the operand equality and explicitly add the constraint into the pattern. Attribute equality will be worked out on the different CL.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D89254
This revision adds a programmable codegen strategy from linalg based on staged rewrite patterns. Testing is exercised on a simple linalg.matmul op.
Differential Revision: https://reviews.llvm.org/D89374
This reverts commit 7271c1bcb9.
This broke the gcc-5 build:
/usr/include/c++/5/ext/new_allocator.h:120:4: error: no matching function for call to 'std::pair<const std::__cxx11::basic_string<char>, mlir::tblgen::SymbolInfoMap::SymbolInfo>::pair(llvm::StringRef&, mlir::tblgen::SymbolInfoMap::SymbolInfo)'
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^
In file included from /usr/include/c++/5/utility:70:0,
from llvm/include/llvm/Support/type_traits.h:18,
from llvm/include/llvm/Support/Casting.h:18,
from mlir/include/mlir/Support/LLVM.h:24,
from mlir/include/mlir/TableGen/Pattern.h:17,
from mlir/lib/TableGen/Pattern.cpp:14:
/usr/include/c++/5/bits/stl_pair.h:206:9: note: candidate: template<class ... _Args1, long unsigned int ..._Indexes1, class ... _Args2, long unsigned int ..._Indexes2> std::pair<_T1, _T2>::pair(std::tuple<_Args1 ...>&, std::tuple<_Args2 ...>&, std::_Index_tuple<_Indexes1 ...>, std::_Index_tuple<_Indexes2 ...>)
pair(tuple<_Args1...>&, tuple<_Args2...>&,
^
Adds a TypeDef class to OpBase and backing generation code. Allows one
to define the Type, its parameters, and printer/parser methods in ODS.
Can generate the Type C++ class, accessors, storage class, per-parameter
custom allocators (for the storage constructor), and documentation.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D86904
This CL allows user to specify the same name for the operands in the source pattern which implicitly enforces equality on operands with the same name.
E.g., Pat<(OpA $a, $b, $a) ... > would create a matching rule for checking equality for the first and the last operands. Equality of the operands is enforced at any depth, e.g., OpA ($a, $b, OpB($a, $c, OpC ($a))).
Example usage: Pat<(Reshape $arg0, (Shape $arg0)), (replaceWithValue $arg0)>
Note, this feature only covers operands but not attributes.
Current use cases are based on the operand equality and explicitly add the constraint into the pattern. Attribute equality will be worked out on the different CL.
Differential Revision: https://reviews.llvm.org/D89254
This is the same diff as https://reviews.llvm.org/D88809/ except side effect
free check is removed for involution and a FIXME is added until the dependency
is resolved for shared builds. The old diff has more details on possible fixes.
Reviewed By: rriddle, andyly
Differential Revision: https://reviews.llvm.org/D89333
Update linalg-to-loops lowering for pooling operations to perform
padding of the input when specified by the corresponding attribute.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D88911
* Extends Context/Operation interning to cover Module as well.
* Implements Module.context, Attribute.context, Type.context, and Location.context back-references (facilitated testing and also on the TODO list).
* Adds method to create an empty Module.
* Discovered missing in npcomp.
Differential Revision: https://reviews.llvm.org/D89294
TensorConstantOp bufferization currently uses the vector dialect to store constant data into memory.
Due to natural vector size and alignment properties, this is problematic with n>1-D vectors whose most minor dimension is not naturally aligned.
Instead, this revision linearizes the constant and introduces a linalg.reshape to go back to the desired shape.
Still this is still to be considered a workaround and a better longer term solution will probably involve `llvm.global`.
Differential Revision: https://reviews.llvm.org/D89311
This combines two separate ops (D88972: `gpu.create_token`, D89043: `gpu.host_wait`) into one.
I do after all like the idea of combining the two ops, because it matches exactly the pattern we are
going to have in the other gpu ops that will implement the AsyncOpInterface (launch_func, copies, alloc):
If the op is async, we return a !gpu.async.token. Otherwise, we synchronize with the host and don't return a token.
The use cases for `gpu.wait async` and `gpu.wait` are further apart than those of e.g. `gpu.h2d async` and `gpu.h2d`,
but I like the consistent meaning of the `async` keyword in GPU ops.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89160
This PR adds support for identified and recursive structs.
This includes: parsing, printing, serializing, and
deserializing such structs.
The following C struct:
```C
struct A {
A* next;
};
```
which is translated to the following MLIR code as:
```mlir
!spv.struct<A, (!spv.ptr<!spv.struct<A>, Generic>)>
```
would be represented in the SPIR-V module as:
```spirv
OpName %A "A"
OpTypeForwardPointer %APtr Generic
%A = OpTypeStruct %APtr
%APtr = OpTypePointer Generic %A
```
In particular the following changes are included:
- SPIR-V structs can now be either identified or literal
(i.e. non-identified).
- All structs now have their members surrounded by a ()-pair.
- For recursive references,
(1) an OpTypeForwardPointer instruction is emitted before
the OpTypeStruct instruction defining the recursive struct
(2) an OpTypePointer instruction is emitted after the
OpTypeStruct instruction which actually defines the recursive
pointer to struct type.
Reviewed By: antiagainst, rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D87206
* Links against libMLIR.so if the project is built for DYLIBs.
* Puts things in the right place in build and install time python/ trees so that RPaths line up.
* Adds install actions to install both the extension and sources.
* Copies py source files to the build directory to match (consistent layout between build/install time and one place to point a PYTHONPATH for tests and interactive use).
* Finally, "import mlir" from an installed LLVM just works.
Differential Revision: https://reviews.llvm.org/D89167
This revision introduces support for buffer allocation for any named linalg op.
To avoid template instantiating many ops, a new ConversionPattern is created to capture the LinalgOp interface.
Some APIs are updated to remain consistent with MLIR style:
`OwningRewritePatternList * -> OwningRewritePatternList &`
`BufferAssignmentTypeConverter * -> BufferAssignmentTypeConverter &`
Differential revision: https://reviews.llvm.org/D89226
The buffer placement preparation tests in
test/Transforms/buffer-placement-preparation* are using Linalg as a test
dialect which leads to confusion and "copy-pasta", i.e. Linalg is being
extended now and when TensorsToBuffers.cpp is changed, TestBufferPlacement is
sometimes kept in-sync, which should not be the case.
This has led to the unnoticed bug, because the tests were in a different directory and the patterns were slightly off.
Differential Revision: https://reviews.llvm.org/D89209
This patch introduces the acc.enter_data operation that represents an OpenACC Enter Data directive.
Operands and attributes are dervied from clauses in the spec 2.6.6.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D88941
This is required or broadcasting with operands of different ranks will lead to
failures as the select op requires both possible outputs and its output type to
be the same.
Differential Revision: https://reviews.llvm.org/D89134
The patch adds a canonicalization pattern that removes the unused results of scf.if operation. As a result, cse may remove unused computations in the then and else regions of the scf.if operation.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D89029
This patch introduces the acc.exit_data operation that represents an OpenACC Exit Data directive.
Operands and attributes are derived from clauses in the spec 2.6.6.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D88969
They are currently marked as unsupported when windows is part of the triple, but they actually fail when they are run on Windows, so they are unsupported on system-windows
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D89169
Async execute operation can take async arguments as dependencies.
Change `async.execute` custom parser/printer format to use `%value as %unwrapped: !async.value<!type>` sytax.
Reviewed By: mehdi_amini, herhut
Differential Revision: https://reviews.llvm.org/D88601
The updated version of kernel outlining did not handle cases correctly
where an operand of a candidate for sinking itself was defined by an operation
that is a sinking candidate. In such cases, it could happen that sunk
operations were inserted in the wrong order, breaking ssa properties.
Differential Revision: https://reviews.llvm.org/D89112
This reverts commit 1ceaffd95a.
The build is broken with -DBUILD_SHARED_LIBS=ON ; seems like a possible
layering issue to investigate:
tools/mlir/lib/IR/CMakeFiles/obj.MLIRIR.dir/Operation.cpp.o: In function `mlir::MemoryEffectOpInterface::hasNoEffect(mlir::Operation*)':
Operation.cpp:(.text._ZN4mlir23MemoryEffectOpInterface11hasNoEffectEPNS_9OperationE[_ZN4mlir23MemoryEffectOpInterface11hasNoEffectEPNS_9OperationE]+0x9c): undefined reference to `mlir::MemoryEffectOpInterface::getEffects(llvm::SmallVectorImpl<mlir::SideEffects::EffectInstance<mlir::MemoryEffects::Effect> >&)'
This change allows folds to be done on a newly introduced involution trait rather than having to manually rewrite this optimization for every instance of an involution
Reviewed By: rriddle, andyly, stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D88809
When distributing a vector larger than the given multiplicity, we can
distribute it by block where each id gets a chunk of consecutive element
along the dimension distributed. This adds a test for this case and adds extra
checks to make sure we don't distribute for cases not multiple of multiplicity.
Differential Revision: https://reviews.llvm.org/D89061
This revision also inserts an end-to-end test that lowers tensors to buffers all the way to executable code on CPU.
Differential revision: https://reviews.llvm.org/D88998
The simplest case is when the indexing maps are DimIds in every component. This covers cwise ops.
Also:
* Expose populateConvertLinalgOnTensorsToBuffersPatterns in Transforms.h
* Expose emitLoopRanges in Transforms.h
Differential Revision: https://reviews.llvm.org/D88781
Added missing strides check to verification method of rank reducing subview
which enforces strides specification for the resulting type.
Differential Revision: https://reviews.llvm.org/D88879
* New functions: mlirOperationSetAttributeByName, mlirOperationRemoveAttributeByName
* Also adds some *IsNull checks and standardizes the rest to use "static inline" form, which makes them all non-opaque and not part of the ABI (which is desirable).
* Changes needed to resolve TODOs in npcomp PyTorch capture.
Differential Revision: https://reviews.llvm.org/D88946
Add basic support for registering diagnostic handlers with the context
(actually, the diagnostic engine contained in the context) and processing
diagnostic messages from the C API.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D88736
Add conversion pass for Vector dialect to SPIR-V dialect and add some simple
conversion pattern for vector.broadcast, vector.insert, vector.extract.
Differential Revision: https://reviews.llvm.org/D88761
Combine ExtractOp with scalar result with BroadcastOp source. This is useful to
be able to incrementally convert degenerated vector of one element into scalar.
Differential Revision: https://reviews.llvm.org/D88751
This revision adds init_tensors support to buffer allocation for Linalg on tensors.
Currently makes the assumption that the init_tensors fold onto the first output tensors.
This assumption is not currently enforced or cast in stone and requires experimenting with tiling linalg on tensors for ops **without reductions**.
Still this allows progress towards the end-to-end goal.
A pattern to convert `spv.CompositeInsert` and `spv.CompositeExtract`.
In LLVM, there are 2 ops that correspond to each instruction depending
on the container type. If the container type is a vector type, then
the result of conversion is `llvm.insertelement` or `llvm.extractelement`.
If the container type is an aggregate type (i.e. struct, array), the
result of conversion is `llvm.insertvalue` or `llvm.extractvalue`.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D88205
Adds support for SPIR-V composite speciailization constants to spv._reference_of.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D88732