This will allow us to define select(pred, in, out) for TC ops, which is useful
for pooling ops.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97312
The cuda-runner registers two pass pipelines for nested passes,
so that we don't have to use verbose textual pass pipeline specification.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D97091
This prevents a bug in the pass instrumentation implementation where the main thread would end up with a different pass manager in different runs of the pass.
llvm::parallelTransformReduce does not schedule work on the caller thread, which becomes very costly for
the inliner where a majority of SCCs are small, often ~1 element. The switch to llvm::parallelForEach solves this,
and also aligns the implementation with the PassManager (which realistically should share the same implementation).
This change dropped compile time on an internal benchmark by ~1(25%) second.
Differential Revision: https://reviews.llvm.org/D96086
A majority of operations have a very small number of interfaces, which means that the cost of using a hash map is generally larger for interface lookups than just a binary search. In the future when there are a number of operations with large amounts of interfaces, we can switch to a hybrid approach that optimizes lookups based on the number of interfaces. For now, however, a binary search is the best approach.
This dropped compile time on a largish TF MLIR module by 20%(half a second).
Differential Revision: https://reviews.llvm.org/D96085
When computing dense address, a vectorized index must be accounted
for properly. This bug was formerly undetected because we get 0 * prev + i
in most cases, which folds away the scalar part. Now it works for all cases.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97317
Affine parallel ops may contain and yield results from MemRefsNormalizable ops in the loop body. Thus, both affine.parallel and affine.yield should have the MemRefsNormalizable trait.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96821
This transformation was only used for quick experimentation and is not general enough.
Retire it.
Differential Revision: https://reviews.llvm.org/D97266
DebugCounters allow for selectively enabling the execution of a debug action based upon a "counter". This counter is comprised of two components that are used in the control of execution of an action, a "skip" value and a "count" value. The "skip" value is used to skip a certain number of initial executions of a debug action. The "count" value is used to prevent a debug action from executing after it has executed for a set number of times (not including any executions that have been skipped). For example, a counter for a debug action with `skip=47` and `count=2`, would skip the first 47 executions, then execute twice, and finally prevent any further executions.
This is effectively the same as the DebugCounter infrastructure in LLVM, but using the DebugAction infrastructure in MLIR. We can't simply reuse the DebugCounter support already present in LLVM due to its heavy reliance on global constructors (which are not allowed in MLIR). The DebugAction infrastructure already nicely supports the debug counter use case, and promotes the separation of policy and mechanism design philosophy.
Differential Revision: https://reviews.llvm.org/D96395
This revision adds the infrastructure for `Debug Actions`. This is a DEBUG only
API that allows for external entities to control various aspects of compiler
execution. This is conceptually similar to something like DebugCounters in LLVM, but at a lower level. This framework doesn't make any assumptions about how the higher level driver is controlling the execution, it merely provides a framework for connecting the two together. This means that on top of DebugCounter functionality, we could also provide more interesting drivers such as interactive execution. A high level overview of the workflow surrounding debug actions is
shown below:
* Compiler developer defines an `action` that is taken by the a pass,
transformation, utility that they are developing.
* Depending on the needs, the developer dispatches various queries, pertaining
to this action, to an `action manager` that will provide an answer as to
what behavior the action should do.
* An external entity registers an `action handler` with the action manager,
and provides the logic to resolve queries on actions.
The exact definition of an `external entity` is left opaque, to allow for more
interesting handlers.
This framework was proposed here: https://llvm.discourse.group/t/rfc-debug-actions-in-mlir-debug-counters-for-the-modern-world
Differential Revision: https://reviews.llvm.org/D84986
This commit is the first baby step towards detensoring in
linalg-on-tensors.
Detensoring is the process through which a tensor value is convereted to one
or potentially more primitive value(s). During this process, operations with
such detensored operands are also converted to an equivalen form that works
on primitives.
The detensoring process is driven by linalg-on-tensor ops. In particular, a
linalg-on-tensor op is checked to see whether *all* its operands can be
detensored. If so, those operands are converted to thier primitive
counterparts and the linalg op is replaced by an equivalent op that takes
those new primitive values as operands.
This works towards handling github/google/iree#1159.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96271
This does not change the behavior directly: the tests only run when
`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON` is configured. However running
`ninja check-mlir` will not run all the tests within a single
lit invocation. The previous behavior would wait for all the integration
tests to complete before starting to run the first regular test. The
test results were also reported separately. This change is unifying all
of this and allow concurrent execution of the integration tests with
regular non-regression and unit-tests.
Differential Revision: https://reviews.llvm.org/D97241
This makes the implementation of each bytecode operation much easier to reason about, and lets the compiler decide which implementations are beneficial to inline into the main switch.
Differential Revision: https://reviews.llvm.org/D95716
We should be ordering predicates with higher primary/secondary sums first, but we are currently ordering them last. This allows for predicates more frequently encountered to be checked first.
Differential Revision: https://reviews.llvm.org/D95715
`verifyConstructionInvariants` is intended to allow for verifying the invariants of an attribute/type on construction, and `getChecked` is intended to enable more graceful error handling aside from an assert. There are a few problems with the current implementation of these methods:
* `verifyConstructionInvariants` requires an mlir::Location for emitting errors, which is prohibitively costly in the situations that would most likely use them, e.g. the parser.
This creates an unfortunate code duplication between the verifier code and the parser code, given that the parser operates on llvm::SMLoc and it is an undesirable overhead to pre-emptively convert from that to an mlir::Location.
* `getChecked` effectively requires duplicating the definition of the `get` method, creating a quite clunky workflow due to the subtle different in its signature.
This revision aims to talk the above problems by refactoring the implementation to use a callback for error emission. Using a callback allows for deferring the costly part of error emission until it is actually necessary.
Due to the necessary signature change in each instance of these methods, this revision also takes this opportunity to cleanup the definition of these methods by:
* restructuring the signature of `getChecked` such that it can be generated from the same code block as the `get` method.
* renaming `verifyConstructionInvariants` to `verify` to match the naming scheme of the rest of the compiler.
Differential Revision: https://reviews.llvm.org/D97100
Simplifies the way lattices are optimized with less, but more
powerful rules. This also fixes an inaccuracy where too many
lattices resulted (expecting a non-existing universal index).
Also puts no-side-effects on all proper getters and unifies
bufferization flags order in integration tests (for future,
more complex use cases).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97134
The current implementation of tilePerfectlyNested utility doesn't handle
the non-unit step size. We have added support to perform tiling
correctly even if the step size of the loop to be tiled is non-unit.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49188.
Differential Revision: https://reviews.llvm.org/D97037
This patch adds Linalg named ops for various types of integer matmuls.
Due to limitations in the tc spec/linalg-ods-gen ops cannot be type
polymorphic, so this instead creates new ops (improvements to the
methods for defining Linalg named ops are underway with a prototype at
https://github.com/stellaraccident/mlir-linalgpy).
To avoid the necessity of directly referencing these many new ops, this
adds additional methods to ContractionOpInterface to allow classifying
types of operations based on their indexing maps.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D97006
operands[2] can be nullptr here. I'm not able to build a lit test for
this because of the commutative reordering of operands. It's possible to
trigger this with a createOrFold<BroadcastOp> though.
Differential Revision: https://reviews.llvm.org/D97206
This commit fixes a bug in affine fusion pipeline where an
incorrect fusion is performed despite a dealloc op is present
between a producer and a consumer. This is done by creating a
node for dealloc op in the MDG.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D97032
Simple jupyter kernel using mlir-opt and reproducer to run passes.
Useful for local experimentation & generating examples. The export to
markdown from here is not immediately useful nor did I define a
CodeMirror synax to make the HTML output prettier. It only supports one
level of history (e.g., `_`) as I was mostly using with expanding a
pipeline one pass at a time and so was all I needed.
I placed this in utils directory next to editor & debugger utils.
Differential Revision: https://reviews.llvm.org/D95742
* It was decided that this was the end of the line for the existing custom tc parser/generator, and this is the first step to replacing it with a declarative format that maps well to mathy source languages.
* One such source language is implemented here: https://github.com/stellaraccident/mlir-linalgpy/blob/main/samples/mm.py
* In fact, this is the exact source of the declarative `polymorphic_matmul` in this change.
* I am working separately to clean this python implementation up and add it to MLIR (probably as `mlir.tools.linalg_opgen` or equiv). The scope of the python side is greater than just generating named ops: the ops are callable and directly emit `linalg.generic` ops fully dynamically, and this is intended to be a feature for frontends like npcomp to define custom linear algebra ops at runtime.
* There is more work required to handle full type polymorphism, especially with respect to integer formulations, since they require more specificity wrt types.
* Followups to this change will bring the new generator to feature parity with the current one and delete the current. Roughly, this involves adding support for interface declarations and attribute symbol bindings.
Differential Revision: https://reviews.llvm.org/D97135
Rationale:
Touching function level information can only be done within a module pass.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97102
A folder of `tensor_load + tensor_to_memref` exists but it only applies when
source and destination memref types are the same.
This revision adds a canonicalize `tensor_load + tensor_to_memref` to `memref_cast`
when type mismatches prevent folding to kick in.
Differential Revision: https://reviews.llvm.org/D97038
Rationale:
Providing the wrong number of sparse/dense annotations was silently
ignored or caused unrelated crashes. This minor change verifies that
the provided number matches the rank.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97034
Extracts the relevant dimensions from the map under test to build up the
maps to test against in a permutation-invariant way.
This also includes a fix to the indexing maps used by
isColumnMajorMatmul. The maps as currently written do not describe a
column-major matmul. The linalg named op column_major_matmul has the
correct maps (and notably fails the current test).
If `C = matmul(A, B)` we want an operation that given A in column major
format and B in column major format produces C in column major format.
Given that for a matrix, faux column major is just transpose.
`column_major_matmul(transpose(A), transpose(B)) = transpose(C)`. If
`A` is `NxK` and `B` is `KxM`, then `C` is `NxM`, so `transpose(A)` is
`KxN`, `transpose(B)` is `MxK` and `transpose(C)` is `MxN`, not `NxM`
as these maps currently have.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96984
Static subtensor / subtensor_insert of the same size as the source / destination tensor and root @[0..0] with strides [1..1] are folded away.
Differential revision: https://reviews.llvm.org/D96991
It's not necessarily the case on all architectures that all memory is
addressable in addrspace 0, so casting the pointer to addrspace 0 is
liable to cause problems.
Reviewed By: aartbik, ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96380
This patch adds lowering to Linalg for the following TOSA ops: negate, rsqrt, mul, select, clamp and reluN and includes support for signless integer and floating point types
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D96924
`subtensor_insert` was used instead of `linalg.subtensor_yield` to make this PR
smaller. Verification will be added in a follow-up PR.
Differential Revision: https://reviews.llvm.org/D96943
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
The functions translating enums to LLVM IR are generated in a single
file included in many places, not all of which use all translations.
Generate functions with "unused" attribute to silence compiler warnings.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96880
Rationale:
Narrower types for overhead storage yield a smaller memory footprint for
sparse tensors and thus needs to be supported. Also, more value types
need to be supported to deal with all kinds of kernels. Since the
"one-size-fits-all" sparse storage scheme implementation is used
instead of actual codegen, the library needs to be able to support
all combinations of desired types. With some crafty templating and
overloading, the actual code for this is kept reasonably sized though.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D96819
Add a pattern to converting some value to a boolean. spirv.S/UConvert does not
work on i1 types. Thus, the pattern is lowered to cmpi + select.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D96851
Resolving the dim of outputs of a tensor_reshape op in terms of its
input shape allows the op to be eliminated when its used only in its
dims. The init_tensor -> tensor_reshape canonicalization can be
simplified to use the dims of the output of the tensor_reshape which
gets canonicalized away later making the tensor_reshape dead.
Differential Revision: https://reviews.llvm.org/D96635
Separating the AffineMapAccessInterface from AffineRead/WriteOp interface so that dialects which extend Affine capabilities (e.g. PlaidML PXA = parallel extensions for Affine) can utilize relevant passes (e.g. MemRef normalization).
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96284
When the destination of the subview has a lower rank than its source we need to
fix the result type of the new subview op.
Differential Revision: https://reviews.llvm.org/D96804
A series of preceding patches changed the mechanism for translating MLIR to
LLVM IR to use dialect interface with delayed registration. It is no longer
necessary for specific dialects to derive from ModuleTranslation. Remove all
virtual methods from ModuleTranslation and factor out the entry point to be a
free function.
Also perform some cleanups in ModuleTranslation internals.
Depends On D96774
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96775
Verification of the LLVM IR produced when translating various MLIR dialects was
only active when calling the translation programmatically. This has led to
several cases of invalid LLVM IR being generated that could not be caught with
textual mlir-translate tests. Add verifiers for these cases and fix the tests
in preparation for enforcing the validation of LLVM IR.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96774
Some of the lowering of vector.contract didn't support integer case. Since
reduction of integer cannot accumulate we always break up the reduction op, it
should be merged by a separate canonicalization if possible.
Differential Revision: https://reviews.llvm.org/D96461
These patterns unrolls transfer read/write ops if the vector consumers/
producers are extract/insert slices op. Transfer ops can map to hardware
load/store functionalities, where the vector size matters for bandwidth
considerations. So these patterns should be collected separately, instead
of being generic canonicalization patterns.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96782
Previously this might happen if there was no elseRegion and the method
was asked for all successor regions.
Differential Revision: https://reviews.llvm.org/D96764
This revision adds support for hoisting "subtensor + vector.transfer_read" / "subtensor_insert + vector.transfer_write pairs" across scf.for.
The unit of hoisting becomes a HoistableRead / HoistableWrite struct which contains a pair of "vector.transfer_read + optional subtensor" / "vector.transfer_write + optional subtensor_insert".
scf::ForOp canonicalization patterns are applied greedily on the successful application of the transformation to cleanup the IR more eagerly and potentially expose more transformation opportunities.
Differential revision: https://reviews.llvm.org/D96731
SliceAnalysis originally was developed in the context of affine.for within mlfunc.
It predates the notion of region.
This revision updates it to not hardcode specific ops like scf::ForOp.
When rooted at an op, the behavior of the slice computation changes as it recurses into the regions of the op. This does not support gathering all values transitively depending on a loop induction variable anymore.
Additional variants rooted at a Value are added to also support the existing behavior.
Differential revision: https://reviews.llvm.org/D96702
Allow clients to create a new ShapedType of the same "container" type
but with different element or shape. First use case is when refining
shape during shape inference without needing to consider which
ShapedType is being refined.
Differential Revision: https://reviews.llvm.org/D96682
This corresponds with the previous work to make shape.broadcast nary.
Additionally, simplify the ConvertShapeConstraints pass. It now doesn't
lower an implicit shape.is_broadcastable. This is still the same in
combination with shape-to-standard when the 2 passes are used in either
order.
Differential Revision: https://reviews.llvm.org/D96401
Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove individual translations
that were created for each of these dialects and just use one common
MLIR-to-LLVM-IR translation that potentially supports all dialects instead,
based on what is registered and including any combination of translatable
dialects. This removal was one of the main goals of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with the `amendOperation` function that allows the interface
implementation to post-process any translated operation with dialect attributes
from the dialect for which the interface is implemented regardless of the
operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
Dialects themselves do not support repeated addition of interfaces with the
same TypeID. However, in case of delayed registration, the registry may contain
such an interface, or have the same interface registered several times due to,
e.g., dependencies. Make sure we delayed registration does not attempt to add
an interface with the same TypeID more than once.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96606
The patch extends the runner utils by verification methods that compare two memrefs. The methods compare the content of the two memrefs and print success if the data is identical up to a small numerical error. The methods are meant to simplify the development of integration tests that compare the results against a reference implementation (cf. the updates to the linalg matmul integration tests).
Originally landed in 5fa893c (https://reviews.llvm.org/D96326) and reverted in dd719fd due to a Windows build failure.
Changes:
- Remove the max function that requires the "algorithm" header on Windows
- Eliminate the truncation warning in the float specialization of verifyElem by using a float constant
Reviewed By: Kayjukh
Differential Revision: https://reviews.llvm.org/D96593
Currently, vector.contract joins the intermediate result and the accumulator
argument (of ranks K) using summation. We desire more joining operations ---
such as max --- to help vector.contract express reductions. This change extends
Vector_ContractionOp to take an optional attribute (called "kind", of enum type
CombiningKind) specifying the joining operation to be add/mul/min/max for int/fp
, and and/or/xor for int only. By default this attribute has value "add".
To implement this we also need to extend vector.outerproduct, since
vector.contract gets transformed to vector.outerproduct (and that to
vector.fma). The extension for vector.outerproduct is also an optional kind
attribute that uses the same enum type and possible values. The default is
"add". In case of max/min we transform vector.outerproduct to a combination of
compare and select.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D93280
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
This reverts commit 3f22547fd1 and reland 973e133b76 with a workaround for
a gcc bug that does not accept lambda default parameters:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59949
Differential Revision: https://reviews.llvm.org/D96598