This patch remove `spaceKind` from PresburgerSpace, making PresburgerSpace only
a space supporting relations.
Sets are still implemented in the same way, i.e. with a zero domain but instead
the asserts to check if the space is still set are added to users of
PresburgerSpace which treat it as a Set space.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D121357
The enableObjectCache option was added in
https://reviews.llvm.org/rG06e8101034e, defaulting to false. However,
the init code added there got its logic reversed
(cache(enableObjectCache ? nullptr : new SimpleObjectCache()), which was
fixed in https://reviews.llvm.org/rGd1186fcb04 by setting the default to
true, thereby preserving the existing behavior even if it was
unintentional.
Default now the object cache to false as it was originally intended.
While at it, mention in enableObjectCache's documentation how the
cache can be dumped.
Reviewed-by: mehdi_amini
Differential Revision: https://reviews.llvm.org/D121291
This patch moves the testcases from
`mlir/test/Target/LLVMIR/openmp-llvm-bad-schedule-modifier.mlir` to
`mlir/test/Dialect/OpenMP/invalid.mlir` as they test the verifier
(not the translation to LLVM IR).
Reviewed By: NimishMishra
Differential Revision: https://reviews.llvm.org/D120877
On Windows (at least), cmake ignores Python3_EXECUTABLE unless the
'Interpreter' component is being found. If the user is specifying a
different version than the latest installed (say, 3.8 vs 3.9) with the
Python3_EXECUTABLE, cmake was using a combination of the newest version
and the desired version. Mitigated by adding 'Interpreter' in the first
invocation like the second one.
This patch adds lowering from omp.atomic.update to LLVM IR. Whenever a
special LLVM IR instruction is available for the operation, `atomicrmw`
instruction is emitted, otherwise a compare-exchange loop based update
is emitted.
Depends on D119522
Reviewed By: ftynse, peixin
Differential Revision: https://reviews.llvm.org/D119657
When `addCoalescedPolyhedron` was called with `j == n - 1`,
the `polyhedrons`-vector was not properly updated (the
`IntegerPolyhedron` at position `n - 2` was "lost"). This patch adds
special handling to that case and a regression testcase.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D121356
This patch moves PresburgerSpace::removeIdRange(idStart, idLimit) to
PresburgerSpace::removeIdRange(kind, idStart, idLimit), i.e. identifiers
can only be removed at once for a single kind.
This makes users of PresburgerSpace to not assume any inside ordering of
identifier kinds.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D121079
NFC. Clean up memref utils library. This library had a single function
that was completely misplaced. MemRefUtils is expected to be (also per
its comment) a library providing analysis/transforms utilities on memref
dialect ops or memref types. However, in reality it had a helper that
was depended upon by the MemRef dialect, i.e., it was a helper for the
dialect ops library and couldn't contain anything that itself depends on
the MemRef dialect. Move the single method to the memref dialect that
will now allow actual utilities depending on the memref dialect to be
placed in it.
Put findDealloc in the `memref` namespace. This is a pure move.
Differential Revision: https://reviews.llvm.org/D121273
ValueShapeRange::getShape() returns ShapeAdaptor rather than ShapedType
and ShapeAdaptor allows implicit conversion to bool. It ends up that
ShapedTypeComponents can be constructed with ShapeAdaptor incorrectly.
The reason is that the type trait
std::is_constructible<ShapeStorageT, Arg>::value
is fulfilled because ShapeAdaptor can be converted to bool and it can be
used to construct ShapeStorageT. In the end, we won't give any warning
or error message when doing things like
inferredReturnShapes.emplace_back(valueShapeRange.getShape(0));
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D120845
Currently when we fold an empty loop, we assume that any loop
with iterArgs returns its iterArgs in order, which is not always
the case. It may return values defined outside of the loop or
return its iterArgs out of order. This patch adds support to
those cases.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D120776
This revision adds support for the linalg.index to the sparse compiler
pipeline. In essence, this adds the ability to refer to indices in
the tensor index expression, as illustrated below:
Y[i, j, k, l, m] = T[i, j, k, l, m] * i * j
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D121251
BuiltinOps.h
These includes are going to be removed from BuiltinOps.h in a followup
when FuncOp is moved out of the Builtin dialect. This commit
pre-emptively adds those includes to simplify the patch moving FuncOp.
It's fine to use any integer (vector) values regardless of the
signedness. The opcode decides how to interpret the bits.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D121238
This feels like a layering violation, but it fixes the build.
Fixes#54242
tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SerializeToHsaco.cpp.o:SerializeToHsaco.cpp:function (anonymous namespace)::SerializeToHsacoPass::optimizeLlvm(llvm::Module&, llvm::TargetMachine&):
error: undefined reference to 'mlir::makeOptimizingTransformer(unsigned int, unsigned int, llvm::TargetMachine*)'
This pass doesn't rely on any specific characteristics of FuncOp, and
can just be a generic operation pass.
Differential Revision: https://reviews.llvm.org/D121193
It is currently a module pass, but shouldn't be. All of the patterns
are local conversions, and don't require anything about
functions/modules.
Differential Revision: https://reviews.llvm.org/D121192
These passes generally don't rely on any special aspects of FuncOp, and moving allows
for these passes to be used in many more situations. The passes that obviously weren't
relying on invariants guaranteed by a "function" were updated to be generic pass, the
rest were updated to be FunctionOpinterface InterfacePasses.
The test updates are NFC switching from implicit nesting (-pass -pass2) form to
the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do).
Differential Revision: https://reviews.llvm.org/D121190
FuncOp isn't really important to hardcode here, it is only used to act
as a root operation for the transformation.
Differential Revision: https://reviews.llvm.org/D121195
A lot of test passes are currently anchored on FuncOp, but this
dependency
is generally just historical. A majority of these test passes can run on
any operation, or can operate on a specific interface
(FunctionOpInterface/SymbolOpInterface).
This allows for greatly reducing the API dependency on FuncOp, which
is slated to be moved out of the Builtin dialect.
Differential Revision: https://reviews.llvm.org/D121191
Commit rG1a2bb03edab9d7aa31beb587d0c863acc6715d27 introduced a pattern
to convert dynamic dimensions in operands of `GenericOp`s to static
values based on indexing maps and shapes of other operands. The logic
is directly usable to any `LinalgOp`. Move that pattern as an
`OpInterfaceRewritePattern`.
Differential Revision: https://reviews.llvm.org/D120968
This is a pass that can be used by downstream consumers directly
to avoid the boilerplate to wrap around the `populate*Patterns`.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D121222
A `tensor.cast` consumer can be folded with its producer. This is
beneficial only if the result of the tensor cast is more static than
the source. This patch adds a utility function to check that this is
the case, and adds a couple of canonicalizations patterns that fold an
operation with `tensor.cast` conusmers.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D120950
It's valid to create a TypedArrayAttr or MixedContainerType with
nullptr, e.g.,
std::vector<mlir::Attribute> attrs = {mlir::StringAttr()};
builder.createArrayAttr(attrs);
The predicate didn't check if it's a nullptr and it ended up a crash in
the attribute static verifier. We always check if an attribute is null
so it's better to align the check for these two container type attr.
Reviewed By: rdzhabarov
Differential Revision: https://reviews.llvm.org/D121178
With the recent improvements to OpDSL it is cheap to reintroduce a linalg.copy operation.
This operation is needed in at least 2 cases:
1. for copies that may want to change the elemental type (e.g. cast, truncate, quantize, etc)
2. to specify new tensors that should bufferize to a copy operation. The linalg.generic form
always folds away which is not always the right call.
Differential Revision: https://reviews.llvm.org/D121230
Allow pointwise operations to take rank zero input tensors similarly to scalar inputs. Use an empty indexing map to broadcast rank zero tensors to the iteration domain of the operation.
Depends On D120734
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120807
The revision removes the SoftPlus2DOp operation that previously served as a test operation. It has been replaced by the elemwise_unary operation, which is now used to test unary log and exp functions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120794
Simplify tests that use `linalg.fill_rng_2d` to focus on testing the `const` and `index` functions. Additionally, cleanup emit_misc.py to use simpler test functions and fix an error message in config.py.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120734
Extend OpDSL with a `defines` method that can set the `hasCanonicalizer` flag for an OpDSL operation. If the flag is set via `defines(Canonicalizer)` the operation needs to implement the `getCanonicalizationPatterns` method. The revision specifies the flag for linalg.fill_tensor and adds an empty `FillTensorOp::getCanonicalizationPatterns` implementation.
This revision is a preparation step to replace linalg.fill by its OpDSL counterpart linalg.fill_tensor. The two are only functionally equivalent if both specify the same canonicalization patterns. The revision is thus a prerequisite for the linalg.fill replacement.
Depends On D120725
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120726
Enhance `LinalgTileAndFuseTensorOpsPattern` with an additional rewrite signature that returns the result of the rewrite.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D121212
Add a FillOpInterface similar to the contraction and convolution op interfaces. The FillOpInterface is a preparation step to replace linalg.fill by its OpDSL version linalg.fill_tensor. The interface implements the `value()`, `output()`, and `result()` methods that by default are not available on linalg.fill_tensor.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120725
Currently, the transfer mask is materialized by generating the vector
comparison: [offset + 0, .., offset + length - 1] < [dim, .., dim]
A better alternative is to materialize the transfer mask by using the
operation: `vector.create_mask (dim - offset)`, which will generate
simpler code and compose better with scalable vectors.
Differential Revision: https://reviews.llvm.org/D120487
Rewrite isInnermostAffineForOp utility to make it more direct/efficient.
Drop unnecessary check. NFC.
Differential Revision: https://reviews.llvm.org/D121170
This is to align with the PyTACO API better.
Modify an existing unit test to test the new routines.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D121083
This patch cleans up the interface to PresburgerSet. At a high level it does
the following changes:
- Move member functions around to have constructors at top and print/dump
at end.
- Move a private function to be a static function instead.
- Change member functions of type "getAllIntegerPolyhedron" to "getAllPolys"
instead.
- Improve documentation for PresburgerSet.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D121027
In quantized comutation, there are casting ops around computation ops.
Reorder the ops to make reduce-to-contract actually work.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120760
The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself adds patterns for Arithmetic/CFToLLVM, but
those should be/will be split out in a followup. This commit focuses solely
on being an NFC rename.
Aside from the directory change, the pattern and pass creation API have been renamed:
* populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
* populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
* createLowerToLLVMPass -> createConvertFuncToLLVMPass
Differential Revision: https://reviews.llvm.org/D120778
These unit tests resides in an internal repository. Porting the tests to the
public repository.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D121021
The default lowering of vector transpose operations generates a large sequence of
scalar extract/insert operations, one pair for each scalar element in the input tensor.
In other words, the vector transpose is scalarized. However, there are transpose
patterns where one or more adjacent high-order dimensions are not transposed (for
example, in the transpose pattern [1, 0, 2, 3], dimensions 2 and 3 are not transposed).
This patch improves the lowering of those cases by not scalarizing them and extracting/
inserting a full n-D vector, where 'n' is the number of adjacent high-order dimensions
not being transposed. By doing so, we prevent the scalarization of the code and generate a
more performant vector version.
Paradoxically, this patch shouldn't improve the performance of transpose operations if
we are using LLVM. The LLVM pipeline is able to optimize away some of the extract/insert
operations and the SLP vectorizer is converting the scalar operations back to its vector
form. However, scalarizing a vector version of the code in MLIR and relying on the SLP
vectorizer to reconstruct the vector code again is highly undesirable for several reasons.
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120601
This patch fixes the crash when printing some ops (like affine.for and
scf.for) when they are dumped in invalid state, e.g. during pattern
application. Now the AsmState constructor verifies the operation
first and switches to generic operation printing when the verification
fails. Also operations are now printed in generic form when emitting
diagnostics and the severity level is Error.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D117834
Translation.h is currently awkwardly shoved into the top-level mlir, even though it is
specific to the mlir-translate tool. This commit moves it to a new Tools/mlir-translate
directory, which is intended for libraries used to implement tools. It also splits the
translate registry from the main entry point, to more closely mirror what mlir-opt
does.
Differential Revision: https://reviews.llvm.org/D121026
MlirOptMain is currently awkwardly shoved into mlir/Support. This commit
moves it to the Tools/ directory, which is intended for libraries used to
implement tools.
Differential Revision: https://reviews.llvm.org/D121025
There is no reason for this file to be at the top-level, and
its current placement predates the Parser/ folder's existence.
Differential Revision: https://reviews.llvm.org/D121024
Mark `parseSourceFile()` deprecated. The functions will be removed two weeks after landing this change.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D121075
The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease)
```
omp.parallel {
%a = ...
omp.parallel {
use(%a)
}
}
```
As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this
allocation to the inner parallel. For example, we would want to get something like the following:
```
omp.parallel {
%a = ...
%tmp = alloc
store %tmp[] = %a
kmpc_fork(outlined, %tmp)
}
```
However, in practice, this is not what currently occurs in the context of nested parallel regions. Specifically to the OpenMPIRBuilder,
the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each
region.
```
entry:
...
parallel1:
%a = ...
...
parallel2:
use(%a)
...
endparallel2:
...
endparallel1:
...
```
When the allocation is inserted, it presently inserted into the parent of the entire function (e.g. entry) rather than the parent
allocation scope to the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1.
This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example.
This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121061
This commit adds a new hook Pass `bool canScheduleOn(RegisteredOperationName)` that
indicates if the given pass can be scheduled on operations of the given type. This makes it
easier to define constraints on generic passes without a) adding conditional checks to
the beginning of the `runOnOperation`, or b) defining a new pass type that forwards
from `runOnOperation` (after checking the invariants) to a new hook. This new hook is
used to implement an `InterfacePass` pass class, that represents a generic pass that
runs on operations of the given interface type.
The PassManager will also verify that passes added to a pass manager can actually be
scheduled on that pass manager, meaning that we will properly error when an Interface
is scheduled on an operation that doesn't actually implement that interface.
Differential Revision: https://reviews.llvm.org/D120791
RegionBranchOpInterface and BranchOpInterface are allowed to make implicit type conversions along control-flow edges. In effect, this adds an interface method, `areTypesCompatible`, to both interfaces, which should return whether the types of corresponding successor operands and block arguments are compatible. Users of the interfaces, here on forth, must be aware that types may mismatch, although current users (in MLIR core), are not affected by this change. By default, type equality is used.
`async.execute` already has unequal types along control-flow edges (`!async.value<f32>` vs. `f32`), but it opted out of calling `RegionBranchOpInterface::verifyTypes` in its verifier. That method has now been removed and `RegionBranchOpInterface` will verify types along control edges by default in its verifier.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120790
Such initializer functions can be enqueued in `BufferizationOptions`. They can be used to set up dialect-specific bufferization state.
Differential Revision: https://reviews.llvm.org/D120985
This clarifies that these methods only work in append mode, not for general insertions. This is a prospective change towards https://github.com/llvm/llvm-project/issues/51652 which also performs random-access insertions, so we want to avoid confusion.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120929
This patch extends the existing if combining canonicalization to also handle the case where a value returned by the first if is used within the body of the second if.
This patch also extends if combining to support if's whose conditions are logical negations of each other.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120924
This patch makes coalesce skip the comparison of all pairs of IntegerPolyhedrons with LocalIds rather than crash. The heuristics to handle these cases will be upstreamed later on.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120995
We can simplify an extractvalue of an insertvalue to extract out of the base of the insertvalue, if the insert and extract are at distinct and non-prefix'd indices
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120915
This patch introduces the cut case. If one polytope has only cutting and
redundant inequalities for the other and the facet of the cutting
inequalities are contained within the other polytope, then the polytopes are
be combined into a polytope consisting only of their respective
redundant constraints.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120614
Extend isLoopMemoryParallel check to include locally allocated memrefs.
This strengthens and also speeds up the dependence check used by the
utility by excluding locally allocated memrefs where appropriate.
Additional memref dialect ops can be supported exhaustively via proper
interfaces.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D120617
An OpenMP wsloop is simply a regular for loop with the bounds determined by the thread number, and the same justification to allow this for scf.for works for omp.wsloop.
An OpenMP parallel is a parallel for, per thread. Similarly the same justification for scf.parallel having recursive side effects applies here.
In both cases the general justification is that the ops themselves don't have side effects (besides inaccessible runtime-specific memory) and thus the side effects are simply that of the contained ops.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D120853
This commit adds support for processing tablegen include files, and importing
various information from ODS. This includes operations, attribute+type constraints,
attribute/operation/type interfaces, etc. This will allow for much more robust tooling,
and also allows for referencing ODS constructs directly within PDLL (imported interfaces
can be used as constraints, operation result names can be used for member access, etc).
Differential Revision: https://reviews.llvm.org/D119900
The SerializeToHsaco pass does not depend on ROCm being available on
the build system - it only requires ROCm to be present at runtime.
However, the CMake file that built it tested for
MLIR_ENABLE_ROCM_RUNNER , which implies that ROCm is currently
available and is used to control building ROCm integration tests.
Referencing MLIR_ENABLE_ROCM_RUNNER instead of
MLIR_ENABLE_ROCM_CONVERSIONS in the SerializeToHsaco build therefore
causes problems for clients who wish to build projects that depend on
this pass on a system without an AMD GPU present.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D120663
Ensure that `Handler` within the class is interpreted as the as the current template instantiation (instead the class template itself).
Fixes#53447
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120852
This ensures that we generate memref types with matching layout maps. (Especially when using partial bufferization passes.)
Differential Revision: https://reviews.llvm.org/D120893
This commit deletes the old dialect conversion-based bufferization patterns, which are now obsolete.
Differential Revision: https://reviews.llvm.org/D120883
The loopInfos gets invalidated after collapsing nested loops. Use the
saved afterIP since the returned afterIP by applyDynamicWorkshareLoop
may be not valid.
Reviewed By: shraiysh
Differential Revision: https://reviews.llvm.org/D120294
StandardToSPIRV currently contains an assortment of patterns converting from
different dialects to SPIRV. This commit splits up StandardToSPIRV into separate
conversions for each of the dialects involved (some of which already exist).
Differential Revision: https://reviews.llvm.org/D120767
Add support for extensible dialects, which are dialects that can be
extended at runtime with new operations and types.
These operations and types cannot at the moment implement traits
or interfaces.
Differential Revision: https://reviews.llvm.org/D104554
LLVM defines several default datalayouts for integer and floating point types that are not being considered when importing into MLIR. This patch remedies this.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120832
https://reviews.llvm.org/D120423 replaced the use of stacksave/restore with memref.alloca_scope, but kept the save/restore at the same location. This PR places the allocation scope within the wsloop, thus keeping the same allocation scope as the original scf.parallel (e.g. no longer over stack allocating).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120772
This patch moves all functionality from IntegerPolyhedron to IntegerRelation.
IntegerPolyhedron is now implemented as a relation with no domain. All existing
functionality is extended to work on relations.
This patch does not affect external users like FlatAffineConstraints as they
can still continue to use IntegerPolyhedron abstraction.
This patch is part of a series of patches to support relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120652
Add support for translating data layout specifications for integer and float
types between MLIR and LLVM IR. This is a first step towards removing the
string-based LLVM dialect data layout attribute on modules. The latter is still
available and will remain so until the first-class MLIR modeling can fully
replace it.
Depends On D120739
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120740
Add support for integer and float types into the data layout subsystem with
default logic similar to LLVM IR. Given the flexibility of the sybsystem, the
logic can be easily overwritten by operations if necessary. This provides the
connection necessary, e.g., for the GPU target where alignment requirements for
integers and floats differ from those provided by default (although still
compatible with the LLVM IR model). Previously, it was impossible to use
non-default alignment requirements for integer and float types, which could
lead to incorrect address and size calculations when targeting GPUs.
Depends On D120737
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120739
This patch adds assemblyFormat for `omp.critical.declare`, `omp.atomic.read`,
`omp.atomic.write`, `omp.atomic.update` and `omp.atomic.capture`.
Also removing those clauses from `parseClauses` that aren't needed
anymore, thanks to the new assemblyFormats.
Reviewed By: NimishMishra, rriddle
Differential Revision: https://reviews.llvm.org/D120248
sparsity values.
Previously, we can't properly handle input tensors with a dimension
ordering that is different from the natural ordering or with a mixed of
compressed and dense dimensions. This change fixes the problems by
passing the dimension ordering and sparsity values to the runtime
routine.
Modify an existing test to test the situation.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120777
This adds an option to configure the CMake python search priming
behaviour that was introduced in D118148. In some environments the
priming would cause the "real" search to fail. The default behaviour is
unchanged, i.e. the search will be primed.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D120765
The remanants of Standard was renamed to Func, but the test directory
remained named as Standard. In adidition to fixing the name, this commit
also moves the tests for operations not in the Func dialect to the proper
parent dialect test directory.
The Func has a large number of legacy dependencies carried over from the old
Standard dialect, which was pervasive and contained a large number of varied
operations. With the split of the standard dialect and its demise, a lot of lingering
dead dependencies have survived to the Func dialect. This commit removes a
large majority of then, greatly reducing the dependence surface area of the
Func dialect.
The last remaining operations in the standard dialect all revolve around
FuncOp/function related constructs. This patch simply handles the initial
renaming (which by itself is already huge), but there are a large number
of cleanups unlocked/necessary afterwards:
* Removing a bunch of unnecessary dependencies on Func
* Cleaning up the From/ToStandard conversion passes
* Preparing for the move of FuncOp to the Func dialect
See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061
Differential Revision: https://reviews.llvm.org/D120624
Previously, convertToMLIRSparseTensor assumes identity storage ordering and all
compressed dimensions. This change extends the function with two parameters for
users to specify the storage ordering and the sparsity of each dimension.
Modify PyTACO to reflect this change.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120643
As discussed in https://reviews.llvm.org/D119743 scf.parallel would continuously stack allocate since the alloca op was placd in the wsloop rather than the omp.parallel. This PR is the second stage of the fix for that problem. Specifically, we now introduce an alloca scope around the inlined body of the scf.parallel and enable a canonicalization to hoist the allocations to the surrounding allocation scope (e.g. omp.parallel).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120423
The llvm.mlir.global operation accepts a region as initializer. This region
corresponds to an LLVM IR constant expression and therefore should not accept
operations with side effects. Add a corresponding verifier.
Reviewed By: wsmoses, bondhugula
Differential Revision: https://reviews.llvm.org/D120632
The revision renames the following OpDSL functions:
```
TypeFn.cast -> TypeFn.cast_signed
BinaryFn.min -> BinaryFn.min_signed
BinaryFn.max -> BinaryFn.max_signed
```
The corresponding enum values on the C++ side are renamed accordingly:
```
#linalg.type_fn<cast> -> #linalg.type_fn<cast_signed>
#linalg.binary_fn<min> -> #linalg.binary_fn<min_signed>
#linalg.binary_fn<max> -> #linalg.binary_fn<max_signed>
```
Depends On D120110
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120562
The revision extends OpDSL with unary and binary function attributes. A function attribute, makes the operations used in the body of a structured operation configurable. For example, a pooling operation may take an aggregation function attribute that specifies if the op shall implement a min or a max pooling. The goal of this revision is to define less and more flexible operations.
We may thus for example define an element wise op:
```
linalg.elem(lhs, rhs, outs=[out], op=BinaryFn.mul)
```
If the op argument is not set the default operation is used.
Depends On D120109
Reviewed By: nicolasvasilache, aartbik
Differential Revision: https://reviews.llvm.org/D120110
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
This patch includes the following related changes:
* Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
* Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
* Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
* Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
Differential Revision: https://reviews.llvm.org/D114413
Add a pattern matcher for ExtractSliceOp when its source is a constant.
The matching heuristics can be governed by the control function since
generating a new constant is not always beneficial.
Differential Revision: https://reviews.llvm.org/D119605
If we have a chain of `tensor.insert_slice` ops inserting some
`tensor.pad` op into a `linalg.fill` and ranges do not overlap,
we can also elide the `tensor.pad` later.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120446
Fold tensor.insert_slice(tensor.pad(<input>), linalg.fill) into
tensor.insert_slice(<input>, linalg.fill) if the padding value and
the filling value are the same.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120410
Improve the LinalgOp verification to ensure the iterator types is known. Previously, unknown iterator types have been ignored without warning, which can lead to confusing bugs.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120649
This patch removes the builders for `omp.wsloop` operation that aren't
specifically needed anywhere. We can add them later if the need arises.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D120533
This patch moves IntegerPolyhedron::reset to FlatAffineConstraints::reset. This
function is not required in IntegerPolyhedron and creates ambiguity while
shifting implementations to IntegerRelation.
This patch is part of a series of patches to introduce relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120628
PDL currently doesn't support result values from constraints, meaning we need
to error out until this is actually supported to avoid crashes.
Differential Revision: https://reviews.llvm.org/D119782
This commits adds a C++ generator to PDLL that generates wrapper PDL patterns
directly usable in C++ code, and also generates the definitions of native constraints/rewrites
that have code bodies specified in PDLL. This generator is effectively the PDLL equivalent of
the current DRR generator, and will allow easy replacement of DRR patterns with PDLL patterns.
A followup will start to utilize this for end-to-end integration testing and show case how to
use this as a drop-in replacement for DRR tablegen usage.
Differential Revision: https://reviews.llvm.org/D119781
If the operand list or result list of an operation expression is not specified, we interpret
this as meaning that the operands/results are "unconstraint" (i.e. "could be anything").
We currently don't properly handle differentiating this case from the case of
"no operands/results". This commit adds the insertion of implicit value/type range
variables when these lists are unspecified. This allows for adding proper support
for when zero operands or results are expected.
Differential Revision: https://reviews.llvm.org/D119780
This commits starts to plumb PDLL down into MLIR and adds an initial
PDL generator. After this commit, we will have conceptually support
end-to-end execution of PDLL. Followups will add CPP generation to
match the current DRR setup, and begin to add various end-to-end
tests to test PDLL execution.
Differential Revision: https://reviews.llvm.org/D119779
This patch removes a redundant check in hasConsistentState which is always true
after introduction of PresburgerSpace.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120615
This patch moves identifier kind specific insert/append functions like
`insertDimId`, `appendSymbolId`, etc. from IntegerPolyhedron to
FlatAffineConstraints.
This change allows for a smoother transition to IntegerRelation.
This change is part of a series of patches to introduce Relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120576
This patch factors out various checks for dimension compatibility to
PresburgerSpace::isEqual and PresburgerLocalSpace::isEqual (for local
identifiers).
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120565
In the main-loop of the current coalesce implementation `i` was incremented
twice for some cases. This patch fixes this bug and adds a regression
testcase.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120613
The PyTACO DSL doesn't support reduction to scalars. This change
enhances the MLIR-PyTACO implementation to support reduction to scalars.
Extend an existing test to show the syntax of reduction to scalars and
two methods to retrieve the scalar values.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120572
The AVX2 lowering for transpose operations is only applicable to f32 vector types.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120427
The existing AVX2 lowering patterns for the transpose op only triggers if the
input vector is 2-D. This patch extends the patterns to trigger for n-D vectors
which are effectively 2-D vectors (e.g., vector<1x4x1x8x1). The main constraint
for the generalized AVX2 patterns to be applicable to these vectors is that the
dimensions that are greater than one must be transposed. Otherwise, the existing
patterns are not applicable.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119505
This change gives explicit order of verifier execution and adds
`hasRegionVerifier` and `verifyWithRegions` to increase the granularity
of verifier classification. The orders are as below,
1. InternalOpTrait will be verified first, they can be run independently.
2. `verifyInvariants` which is constructed by ODS, it verifies the type,
attributes, .etc.
3. Other Traits/Interfaces that have marked their verifier as
`verifyTrait` or `verifyWithRegions=0`.
4. Custom verifier which is defined in the op and has marked
`hasVerifier=1`
If an operation has regions, then it may have the second phase,
5. Traits/Interfaces that have marked their verifier as
`verifyRegionTrait` or
`verifyWithRegions=1`. This implies the verifier needs to access the
operations in its regions.
6. Custom verifier which is defined in the op and has marked
`hasRegionVerifier=1`
Note that the second phase will be run after the operations in the
region are verified. Based on the verification order, you will be able to
avoid verifying duplicate things.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116789
Fix MLIR-PyTACO and some tests to use np.array_equal to compare integer
values.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120526
Split arithmetic function into unary and binary functions. The revision prepares the introduction of unary and binary function attributes that work similar to type function attributes.
Depends On D120108
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120109
This change allows the use of scalar tensors with index 0 in tensor index
expressions. In this case, the scalar value is broadcast to match the
dimensions of other tensors in the same expression.
Using scalar tensors as a destination in tensor index expressions is not
supported in the PyTACO DSL.
Add a PyTACO test to show the use of scalar tensors.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120524
Prepare the OpDSL function handling to introduce more function classes. A follow up commit will split ArithFn into UnaryFn and BinaryFn. This revision prepares the split by adding a function kind enum to handle different function types using a single class on the various levels of the stack (for example, there is now one TensorFn and one ScalarFn).
Depends On D119718
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120108
This patch refactors the looping strategy of coalesce for future patches. The new strategy works in-place and uses IneqType to organize inequalities into vectors of the same type. Future coalesce cases will pattern match on this organization. E.g. the contained case needs all inequalities and equalities to be redundant, so this case becomes checking whether the respective vectors are empty. For other cases, the patterns consider the types of all inequalities of both sets making it wasteful to only consider whether a can be coalesced with b in one step, as inequalities would need to be typed again for the opposite case. Therefore, the new strategy tries to coalesce a with b and b with a in a single step.
Reviewed By: Groverkss, arjunp
Differential Revision: https://reviews.llvm.org/D120392
This patch replaces various functions over inequalities/equalities in
IntegerPolyhedron with Matrix functions already implementing them or refactors
them to a Matrix function.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120482
This patch moves the Presburger library to a new `presburger` namespace.
This allows to shorten some names, helps to avoid polluting the mlir namespace,
and also provides some structure.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120505
This patch removes redundant code from fourierMotzkinEliminate implementation
using existing functions in IntegerPolyhedron.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120502
Previously, OpDSL operation used hardcoded type conversion operations (cast or cast_unsigned). Supporting signed and unsigned casts thus meant implementing two different operations. Type function attributes allow us to define a single operation that has a cast type function attribute which at operation instantiation time may be set to cast or cast_unsigned. We may for example, defina a matmul operation with a cast argument:
```
@linalg_structured_op
def matmul(A=TensorDef(T1, S.M, S.K), B=TensorDef(T2, S.K, S.N), C=TensorDef(U, S.M, S.N, output=True),
cast=TypeFnAttrDef(default=TypeFn.cast)):
C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])
```
When instantiating the operation the attribute may be set to the desired cast function:
```
linalg.matmul(lhs, rhs, outs=[out], cast=TypeFn.cast_unsigned)
```
The revsion introduces a enum in the Linalg dialect that maps one-by-one to the type functions defined by OpDSL.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D119718
This patch removes the `const` from `usingBigM` to enable the implicit copy assignment operator for Simplex.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D120542
This transformation is useful to break dependency between consecutive loop
iterations by increasing the size of a temporary buffer. This is usually
combined with heavy software pipelining.
Differential Revision: https://reviews.llvm.org/D119406
This adds a variable op, emitted as C/C++ locale variable, which can be
used if the `emitc.constant` op is not sufficient.
As an example, the canonicalization pass would transform
```mlir
%0 = "emitc.constant"() {value = 0 : i32} : () -> i32
%1 = "emitc.constant"() {value = 0 : i32} : () -> i32
%2 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
%3 = emitc.apply "&"(%1) : (i32) -> !emitc.ptr<i32>
emitc.call "write"(%2, %3) : (!emitc.ptr<i32>, !emitc.ptr<i32>) -> ()
```
into
```mlir
%0 = "emitc.constant"() {value = 0 : i32} : () -> i32
%1 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
%2 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
emitc.call "write"(%1, %2) : (!emitc.ptr<i32>, !emitc.ptr<i32>) -> ()
```
resulting in pointer aliasing, as %1 and %2 point to the same address.
In such a case, the `emitc.variable` operation can be used instead.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D120098
This patch removes binary operator enum which was introduced with `omp.atomic.update`. Now the update operation handles update in a region so this is no longer required.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D120458
Documentation exists about the details of the API but is missing a
description of the overall structure per dialect.
Reviewed By: shabalin
Differential Revision: https://reviews.llvm.org/D117002
The current implementation of ShuffleVectorOp assumes all vectors are
scalable. LLVM IR allows shufflevector operations on scalable vectors,
and the current translation between LLVM Dialect and LLVM IR does the
rigth thing when the shuffle mask is all zeroes. This is required to
do a splat operation on a scalable vector, but it doesn't make sense
for scalable vectors outside of that operation, i.e.: with non-all zero
masks.
Differential Revision: https://reviews.llvm.org/D118371
In D115022, we introduced an optimization where OpResults of a `linalg.generic` may bufferize in-place with an "in" OpOperand if the corresponding "out" OpOperand is not used in the computation.
This optimization can lead to unexpected behavior if the newly chosen OpOperand is in the same alias set as another OpOperand (that is used in the computation). In that case, the newly chosen OpOperand must bufferize out-of-place. This can be confusing to users, as always choosing the "out" OpOperand (regardless of whether it is used) would be expected when having the notion of "destination-passing style" in mind.
With this change, we go back to always bufferizing in-place with "out" OpOperands by default, but letting users override the behavior with a bufferization option.
Differential Revision: https://reviews.llvm.org/D120182
Previously only accessing values for `index` and signless int types
would work; signed and unsigned ints would hit an assert in
`IntegerAttr::getInt`. This exposes `IntegerAttr::get{S,U}Int` to the C
API and calls the appropriate function from the python bindings.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120194
Previously, we only support float64. We now support float32 and float64. When
constructing a tensor without providing a data type, the default is float32.
Fix the tests to data type consistency. All PyTACO application tests now use
float32 to match the default data type of TACO. Other tests may use float32 or
float64.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120356
Now that sparse tensor types are first-class citizens and the sparse compiler
is taking shape, it is time to make sure other compiler optimizations compose
well with sparse tensors. Mostly, this should be completely transparent (i.e.,
dense and sparse take the same path). However, in some cases, optimizations
only make sense in the context of sparse tensors. This is a first example of
such an optimization, where fusing a sampled elt-wise multiplication only makes
sense when the resulting kernel has a potential lower asymptotic complexity due
to the sparsity.
As an extreme example, running SDDMM with 1024x1024 matrices and a sparse
sampling matrix with only two elements runs in 463.55ms in the unfused
case but just 0.032ms in the fused case, with a speedup of 14485x that
is only possible in the exciting world of sparse computations!
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D120429
By specifying a sectionMemoryMapper, users can control how
memory for JIT code is allocated.
In particular, I need this in order to use a named memory
region so that profilers such as perf(1) can correctly label
execution cycles coming from JIT'ed code.
Reviewed-by: ezhulenev
Differential Revision: https://reviews.llvm.org/D120415
Given a cmpf of either uitofp or sitofp and a constant, attempt to canonicalize it to a cmpi.
This PR rewrites equivalent code within LLVM to now apply to MLIR arith.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117257
+ compare block size with the unrollable inner dimension
+ reduce nesting in the code and simplify a bit IR building
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D120075
This eliminates the requirement that pass-related strings outlive pass
instances, which will facilitate future work enabling dynamic passes
written in other languages.
Differential Revision: https://reviews.llvm.org/D120341
Its number of optional parameters has grown too large,
which makes adding new optional parameters quite a chore.
Fix this by using an options struct.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120380
Use an `MLIRContext` declared in a single place in the `parsePoly` function that almost all Presburger unit tests use for parsing sets. This function is only used in tests.
This saves us from having to declare and pass a new `MLIRContext` in every test.
Reviewed By: bondhugula, mehdi_amini
Differential Revision: https://reviews.llvm.org/D119251
This trait results in PDL ops being erroneously CSE'd. These ops are side-effect free in the rewriter but not in the matcher (where unused values aren't allowed anyways). These ops should have a more nuanced side-effect modeling, this is fixing a bug introduced by a previous change.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120222
Header class in SPIR-V HTML spec has changed. Update script to reflect that.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D120179
The related functionality is moved over to the bufferization dialect. Test cases are cleaned up a bit.
Differential Revision: https://reviews.llvm.org/D120191
This patch adds a class to represent a relation in Presburger Library.
This patch only adds the skeleton class. Functionality from IntegerPolyhedron
will be moved to IntegerRelation in later patches to make it easier to review.
This patch is a part of a series of patches adding support for relations in
Presburger Library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120156
The `.def` and `.def_property_readonly` functions in PybindAdaptors.h should
construct the functions as method of the current class rather than as method of
pybind11:none(), which is an object and not even a class.
Depends On D117658
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D117659
This commit adds canonicalization pattern in `linalg.generic` op
for static shape inference. If any of the inputs or outputs have
static shape or is casted from a tensor of static shape, then
shapes of all the inputs and outputs can be inferred by using the
affine map of the static shape input/output.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D118929
This patch adds assemblyFormat for omp.sections operation.
Some existing functions have been altered to fit the custom directive
in assemblyFormat. This has led to their callsites to get modified too,
but those will be removed in later patches, when other operations get
their assemblyFormat. All operations were not changed in one patch for
ease of review.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D120176
This patch adds typing of inequalities to the simplex. This is a cental part of the coalesce algorithm and will be heavily used in later coalesce patches. Currently, only the three most basic types are supported with more to be introduced when they are needed.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D119925
This allows to differentiate between the cases where the optimum does not
exist due to being unbounded and due to the polytope being empty.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D120127
Add `BufferizableOpInterface::verifyAnalysis`. Ops can implement this method to check for expected invariants and limitations.
The purpose of this change is to introduce a modular way of checking assertions such as `assertScfForAliasingProperties`.
Differential Revision: https://reviews.llvm.org/D120189
This patch adds assemblyFormat for omp.parallel operation.
Some existing functions have been altered to fit the custom directive
in assemblyFormat. This has led to their callsites to get modified too,
but those will be removed in later patches, when other operations get
their assemblyFormat. All operations were not changed in one patch for
ease of review.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D120157
These routines will need to be specialized a lot more based on value types,
index types, pointer types, and permutation/dimension ordering. This is a
careful first step, providing some functionality needed in PyTACO bridge.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D120154
This patch removes the following clauses from OpenMP Dialect:
- private
- firstprivate
- lastprivate
- shared
- default
- copyin
- copyprivate
The privatization clauses are being handled in the flang frontend. The
data copying clauses are not being handled anywhere for now. Once
we have a better picture of how to handle these clauses in OpenMP
Dialect, we can add these. For the time being, removing unneeded
clauses.
For detailed discussion about this refer to [[ https://discourse.llvm.org/t/rfc-privatisation-in-openmp-dialect/3526 | Privatisation in OpenMP dialect ]]
Reviewed By: kiranchandramohan, clementval
Differential Revision: https://reviews.llvm.org/D120029
This patch introducing seperating dimensions into two types: Domain and Range.
This allows building relations over PresburgerSpace.
This patch is part of a series of patches to introduce relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D119709
MLIR has the notion of allocation scopes which specify that stack allocations (e.g. memref.alloca, llvm.alloca) should be freed or equivalently aren't available at the end of the corresponding region.
Currently neither OpenMP parallel nor SCF parallel regions have the notion of such a scope.
This clearly makes sense for an OpenMP parallel as this is implemented in with a new function which outlines the region, and clearly any allocations in that newly outlined function have a lifetime that ends at the return of the function, by definition.
While SCF.parallel doesn't have a guaranteed runtime which it is implemented with, this similarly makes sense for SCF.parallel since otherwise an allocation within an SCF.parallel will needlessly continue to allocate stack memory that isn't cleaned up until the function (or other allocation scope op) which contains the SCF.parallel returns. This means that it is impossible to represent thread or iteration-local memory without causing a stack blow-up. In the case that this stack-blow-up behavior is intended, this can be equivalently represented with an allocation outside of the SCF.parallel with a size equal to the number of iterations.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119743
insert is soft deprecated, so remove all references so it's less likely
to be used and can be easily removed in the future.
Differential Revision: https://reviews.llvm.org/D120021
This is a bit awkward since ExtractOp allows both `f32` and
`vector<1xf32>` results for a scalar extraction. Allow both, but make
inference return the scalar to make this as NFC as possible.
This change changes the handling of trailing dimensions with unknown
extent. Users of the changessociationIndicesForReshape helper should
see benefits when transforming reshape like operations into
expand/collapse pairs if the higher-rank type has trailing unknown
dimensions.
The motivating example is a reshape from tensor<16x1x?xi32> to
tensor<16xi32> that can be modeled as collapsing the three dimensions.
Differential Revision: https://reviews.llvm.org/D119730
Previously, NaNs would be dropped in favor of bounded values which was
strictly incorrect. Now the min/max operation propagate this
information. Not all uses of min/max need this, but the given change
will help protect future additions, and this prevents the need for an
additional cmpf and select operation to handle NaNs.
Differential Revision: https://reviews.llvm.org/D120020
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119883
NamedAttrList.append(StringAttr, StringAttr) fails to compile because it is
matched to the IteratorT append. Fixes the method to only match if the type is
an iterator.
It is time to compose Linalg related optimizations with SparseTensor
related optimizations. This is a careful first start by adding some
general Linalg optimizations "upstream" of the sparse compiler in the
full sparse compiler pipeline. Some minor changes were needed to make
those optimizations aware of sparsity.
Note that after this, we will add a sparse specific fusion rule,
just to demonstrate the power of the new composition.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119971
In SPIR-V, resources are represented as global variables that
are bound to certain descriptor. SPIR-V requires those global
variables to be declared as aliased if multiple ones are bound
to the same slot. Such aliased decorations can cause issues
for transcompilers like SPIRV-Cross when converting to source
shading languages like MSL.
So this commit adds a pass to perform analysis of aliased
resources and see if we can unify them into one.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119872
This would create a double free when a memref is passed twice to the
same op. This wasn't a problem at the time the pass was written but is
common since the introduction of scf.while.
There's a latent non-determinism that's triggered by the test, but this
change is messy enough as-is so I'll leave that for later.
Differential Revision: https://reviews.llvm.org/D120044
The MLIR parser allows regions to have an unnamed entry block.
Make this explicit in the language grammar.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119950
During dialect conversion, target materialization is triggered to create
cast-like operations when a type mismatch occurs between the value that
replaces a rewritten operation and the type that another operations expects as
operands processed by the type conversion. First, a dummy cast is inserted to
make sure the pattern application can proceed. The decision to trigger the
user-provided materialization hook is taken later based on the result of the
dummy cast having uses. However, it only has uses if other patterns constructed
new operations using the casted value as operand. If existing (legal)
operations use the replaced value, they may have not been updated to use the
casted value yet. The conversion infra would then delete the dummy cast first,
and then would replace the uses with now-invalid (null in the bast case) value.
When deciding whether to trigger cast materialization, check for liveness the
uses not only of the casted value, but also of all the values that it replaces.
This was discovered in the finalizing bufferize pass that cleans up
mutually-cancelling casts without touching other operations. It is not
impossible that there are other scenarios where the dialect converison infra
could produce invalid operand uses because of dummy casts erased too eagerly.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D119937
Previously `gpu-kernel-outlining` pass was also doing index computation sinking into gpu.launch before actual outlining.
Split ops sinking from `gpu-kernel-outlining` pass into separate pass, so users can use theirs own sinking pass before outlining.
To achieve old behavior users will need to call both passes: `-gpu-launch-sink-index-computations -gpu-kernel-outlining`.
Differential Revision: https://reviews.llvm.org/D119932
Currently some of the nested IR building inconsistently uses `nb` and `b`, it's very easy to call wrong builder outside of the current scope, so for simplicity all builders are always called `b`, and in nested IR building regions they just shadow the "parent" builder.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120003
A very small refactoring, but a big impact on tests that expect an exact order.
This revision fixes the tests, but also makes them less brittle for similar
minor changes in the future!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119992
This commit adds a pattern to wrap a tensor.pad op with
an scf.if op to separate the cases where we don't need padding
(all pad sizes are actually zeros) and where we indeed need
padding.
This pattern is meant to handle padding inside tiled loops.
Under such cases the padding sizes typically depend on the
loop induction variables. Splitting them would allow treating
perfect tiles and edge tiles separately.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D117018
The pad-slice swap pattern generates `scf.if` and `tensor.generate`
to guard against zero-sized slices if it cannot prove the slice is
always non-zero. This is safe but quite conservative. It can be
unnecessary for cases where we know by problem definition such cases
does not exist, even if with dynamic shaped ops or unknown tile/slice
sizes, e.g., convolution padding size = 1 with kernel dim size = 3.
So this commit introduces a control to the pattern to specify
whether to generate the if constructs to handle such cases better,
given that once the if constructs is materialized, it's very hard
to analyze and simplify.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D117017
Fusion of `linalg.generic` with
`tensor.expand_shape/tensor.collapse_shape` currently handles fusion
with reshape by expanding the dimensionality of the `linalg.generic`
operation. This helps fuse elementwise operations better since they
are fused at the highest dimensionality while keeping all indexing
maps involved projected permutations. The intent of these is to push
the reshape to the boundaries of functions.
The presence of named ops (or other ops across which the reshape
cannot be propagated) stops the propagation to the edges of the
function. At this stage, the converse patterns that fold the reshapes
with generic ops by collapsing the dimensions of the generic op can
push the reshape towards edges. In particular it helps the case where
reshapes exist in between named ops and generic ops.
`linalg.named_op` -> `tensor.expand_shape` -> `linalg.generic`
Pushing the reshape down will help fusion of `linalg.named_op` ->
`linalg.generic` using tile + fuse transformations.
This pattern is intended to replace the following patterns
1) FoldReshapeByLinearization : These patterns create indexing maps
that are not projected permutations that affect future
transformations. They are only useful for folding unit-dimensions.
2) PushReshapeByExpansion : This pattern has the same functionality
but has some restrictions
a) It tries to avoid creating new reshapes that limits its
applicability. The pattern added here can achieve the same
functionality through use of the `controlFn` that allows clients
of the pattern freedom to make this decision.
b) It does not work for ops with indexing semantics.
These patterns will be deprecated in a future patch.
Differential Revision: https://reviews.llvm.org/D119365
This change exposes printer flags in AsmState and AsmStateImpl. All functions
receiving AsmState as a parameter now use the flags from the AsmState instead of
taking an additional OpPrintingFlags parameter.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119870
This allow user to register a callback that can annotate operations
during software pipelining. This allows user potential annotate op to
know what part of the pipeline they correspond to.
Differential Revision: https://reviews.llvm.org/D119866
Without results, there is no getType injected and so generating one in prefixed form doesn't result in any failures during C++ compilation.
Differential Revision: https://reviews.llvm.org/D119871
This test shows that when access patterns do not match (e.g. transposing
a row-wise sparse matrix into another row-wise sparse matrix), a conversion
operation in between can enable codegen (i.e. avoid cycle in iteration graph).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119864
Optional parameters with `defaultValue` set will be populated with that value if they aren't encountered during parsing. Moreover, parameters equal to their default values are elided when printing.
Depends on D118210
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118544
This change is needed when lowering alloca()-using code on targets
such as ROCDL that represent private scratch space as a separate
address space.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119775
Casting between scalable vectors and fixed-length vectors doesn't make
sense. If one of the operands is scalable, the other has to be scalable
to be able to guarantee they have the same shape at runtime.
Differential Revision: https://reviews.llvm.org/D119568
This patch changes the syntax of omp.atomic.update to allow the other
dialects to modify the variable with appropriate operations in the
region.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119522
Add verifier for gpu.alloc op to verify if the dimension operand counts
and symbol operand counts are same as their memref counterparts.
Differential Revision: https://reviews.llvm.org/D117427
Support ALLOW filters and DENY filters. This is needed for compatibility with existing code that specifies more complex op filters.
Differential Revision: https://reviews.llvm.org/D119820
Also, it seems Khronos has changed html spec format so small adjustment to script was needed.
Base op parsing is also probably broken.
Differential Revision: https://reviews.llvm.org/D119678
The only method to create a true dense tensor (i.e un-annotated) in MLIR-PyTACO
is through the from_array method. However, the annotated all dense tensors are
also implemented as true dense tensor currently. The PR fixes the
implementation to support annotated all dense sparse tensors.
Extend the tensor init method to support the construction of a tensor without
any sparsity annotation.
Change the tensor to_file method to only support writing unpacked sparse
tensors to file through the MLIR sparse tensor dialect.
Add unit tests for true dense tensors and all dense sparse tensors.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D119500
This patch changes the argument from template-IRBuilder to IRBuilderBase
thus allowing us to write less code while getting the location from a
builder.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119717
* While annoying, this is the only way to get C++ exception handling out of the happy path for normal iteration.
* Implements sq_length and sq_item for the sequence protocol (used for iteration, including list() construction).
* Implements mp_subscript for general use (i.e. foo[1] and foo[1:1]).
* For constructing a `list(op.results)`, this reduces the time from ~4-5us to ~1.5us on my machine (give or take measurement overhead) and eliminates C++ exceptions, which is a worthy goal in itself.
* Compared to a baseline of similar construction of a three-integer list, which takes 450ns (might just be measuring function call overhead).
* See issue discussed on the pybind side: https://github.com/pybind/pybind11/issues/2842
Differential Revision: https://reviews.llvm.org/D119691
Rationale:
empty line between main include for this file
moved include that actually defines code into right section
Note that this revision started as breaking up ops/attrs even more
(for bug https://github.com/llvm/llvm-project/issues/52748), but due
the the connection in Dialect.initalize(), this cannot be split further).
All heavy lifting refactoring was already done by River in previous cleanup.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119617
Adds a pointer type to EmitC. The emission of pointers is so far only
possible by using the `emitc.opaque` type
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D119337
Adapt the region builder signature to hand in the attributes of the created ops. The revision is a preparation step the support named ops that need access to the operation attributes during op creation.
Depends On D119692
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119693
Group and reorder the classed defined by comprehension.py and add type annotations.
Depends On D119126
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119692
Index attributes had no default value, which means the attribute values had to be set on the operation. This revision adds a default parameter to `IndexAttrDef`. After the change, every index attribute has to define a default value. For example, we may define the following strides attribute:
```
```
When using the operation the default stride is used if the strides attribute is not set. The mechanism is implemented using `DefaultValuedAttr`.
Additionally, the revision uses the naming index attribute instead of attribute more consistently, which is a preparation for follow up revisions that will introduce function attributes.
Depends On D119125
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D119126
This pass doesn't have any limitations specific to FuncOp and it will be useful to be able to run it on other ops (e.g. gpu.func).
Differential Revision: https://reviews.llvm.org/D119662
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.
The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
When lowering to memrefCopy call, the size for i1 type was calculated as 0.
Instead of using getTypeSizeInBits() and dividing by 8, we should just use getTypeSize().
Differential Revision: https://reviews.llvm.org/D119540
The lowering creates llvm.insertvalue with the rank value, so it needs to use
index type instead of 64 bit integer type. Otherwise, we get an error:
llvm.insertvalue' op Type mismatch: cannot insert 'i64' into '!llvm.struct<(i32, ptr<i8>)>'
Differential Revision: https://reviews.llvm.org/D119534
This patch simply adds an optional garbage collector attribute to LLVMFuncOp which maps 1:1 to the "gc" property of functions in LLVM.
Differential Revision: https://reviews.llvm.org/D119492
This allows operations to control the block ids used by the printer in nested regions.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D115849
Previously, OpDSL did not support rank polymorphism, which required a separate implementation of linalg.fill. This revision extends OpDSL to support rank polymorphism for a limited class of operations that access only scalars and tensors of rank zero. At operation instantiation time, it scales these scalar computations to multi-dimensional pointwise computations by replacing the empty indexing maps with identity index maps. The revision does not change the DSL itself, instead it adapts the Python emitter and the YAML generator to generate different indexing maps and and iterators depending on the rank of the first output.
Additionally, the revision introduces a `linalg.fill_tensor` operation that in a future revision shall replace the current handwritten `linalg.fill` operation. `linalg.fill_tensor` is thus only temporarily available and will be renamed to `linalg.fill`.
Reviewed By: nicolasvasilache, stellaraccident
Differential Revision: https://reviews.llvm.org/D119003
Add new operations to the gpu dialect to represent device side
asynchronous copies. This also add the lowering of those operations to
nvvm dialect.
Those ops are meant to be low level and map directly to llvm dialects
like nvvm or rocdl.
We can further add higher level of abstraction by building on top of
those operations.
This has been discuss here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
Differential Revision: https://reviews.llvm.org/D119191
These functions allow for defining pattern fragments usable within the `match` and `rewrite` sections of a pattern. The main structure of Constraints and Rewrites functions are the same, and are similar to functions in other languages; they contain a signature (i.e. name, argument list, result list) and a body:
```pdll
// Constraint that takes a value as an input, and produces a value:
Constraint Cst(arg: Value) -> Value { ... }
// Constraint that returns multiple values:
Constraint Cst() -> (result1: Value, result2: ValueRange);
```
When returning multiple results, each result can be optionally be named (the result of a Constraint/Rewrite in the case of multiple results is a tuple).
These body of a Constraint/Rewrite functions can be specified in several ways:
* Externally
In this case we are importing an external function (registered by the user outside of PDLL):
```pdll
Constraint Foo(op: Op);
Rewrite Bar();
```
* In PDLL (using PDLL constructs)
In this case, the body is defined using PDLL constructs:
```pdll
Rewrite BuildFooOp() {
// The result type of the Rewrite is inferred from the return.
return op<my_dialect.foo>;
}
// Constraints/Rewrites can also implement a lambda/expression
// body for simple one line bodies.
Rewrite BuildFooOp() => op<my_dialect.foo>;
```
* In PDLL (using a native/C++ code block)
In this case the body is specified using a C++(or potentially other language at some point) code block. When building PDLL in AOT mode this will generate a native constraint/rewrite and register it with the PDL bytecode.
```pdll
Rewrite BuildFooOp() -> Op<my_dialect.foo> [{
return rewriter.create<my_dialect::FooOp>(...);
}];
```
Differential Revision: https://reviews.llvm.org/D115836
This allows for defining simple patterns in a single line. The lambda
body of a Pattern expects a single operation rewrite statement:
```
Pattern => replace op<my_dialect.foo>(operands: ValueRange) with operands;
```
Differential Revision: https://reviews.llvm.org/D115835
Having clarified that executing the SerializeToHsaco pass can
depend on a ROCm installation, switch from calling lld as a library to
using the copy of lld guaranteed to be included in a ROCm install.
This removes the workaround introduced in D119277
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D119463
If the result operand has a unit leading dim it is removed from all operands.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119206
This patch factors out space information from IntegerPolyhedron, PresburgerSet
and PWMAFunction to PresburgerSpace and its extension with local variables,
PresburgerLocalSpace.
Generally any new data structure additions in Presburger library will require
space information. This patch removes the need to duplicate the space
information.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D119280
Fix fold-memref-subview-ops for affine.load/store. We need to expand out
the affine apply on its operands.
Differential Revision: https://reviews.llvm.org/D119402
Move expandAffineMap and expandAffineApplyExpr out to AffineUtils. This
is a useful method. The child revision uses it. NFC.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119401
removed obsoleted TODO
removed strange Fp precision for coordinates
lined up meta data testing code for readability
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119377
TypeIDAllocator enables the allocation of new TypeIDs at runtime,
that are unique during the lifetime of the allocator.
NonMovableTypeIDOwner is a class used to define a new TypeID for each instance
of a class, using the instance address. This class cannot be copied or moved.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104534
While testing LLVM 14.0.0 rc1 on Solaris, I ran into a compile failure:
from /var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/lib/ExecutionEngine/SparseTensorUtils.cpp:22:
/usr/include/sys/types.h:103:16: error: conflicting declaration ‘typedef short int index_t’
103 | typedef short index_t;
| ^~~~~~~
In file included from
/var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/lib/ExecutionEngine/SparseTensorUtils.cpp:17:
/var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/include/mlir/ExecutionEngine/SparseTensorUtils.h:26:7:
note: previous declaration as ‘using index_t = uint64_t’
26 | using index_t = uint64_t;
| ^~~~~~~
The same issue had already occured in the past and fixed in D72619
<https://reviews.llvm.org/D72619>. More detailed explanation can also be
found there.
Tested on `amd64-pc-solaris2.11` and `sparcv9-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D119323
This makes getAliasingOpResult symmetric to getAliasingOpOperand. The previous implementation was confusing for users and implemented in such a way only because there are currently no bufferizable ops that have multiple aliasing OpResults.
Differential Revision: https://reviews.llvm.org/D119259
They used to be classes with a virtual `run` function. This was inconvenient because post analysis steps are stored in BufferizationOptions. Because of this design choice, BufferizationOptions were not copyable.
Differential Revision: https://reviews.llvm.org/D119258
Supports whitespace elements: ` ` and `\\n` as well as the "empty" whitespace `` that removes an otherwise printed space.
Depends on D118208
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118210
Reuse the higher precision F32 approximation for the F16 one (by expanding and
truncating). This is partly RFC as I'm not sure what the expectations are here
(e.g., these are only for F32 and should not be expanded, that reusing
higher-precision ones for lower precision is undesirable due to increased
compute cost and only approximations per exact type is preferred, or this is
appropriate [at least as fallback] but we need to see how to make it more
generic across all the patterns here).
Differential Revision: https://reviews.llvm.org/D118968
Implements optional attribute or type parameters, including support for such parameters in the assembly format `struct` directive. Also implements optional groups.
Depends on D117971
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118208
For 0-D as well as 1-D vectors, both these patterns should
return a failure as there is no need to collapse the shape
of the source. Currently, only 1-D vectors were handled. This
patch handles the 0-D case as well.
Reviewed By: Benoit, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119202
There are a few different test passes that check elementwise fusion in
Linalg. Consolidate them to a single pass controlled by different pass
options (in keeping with how `TestLinalgTransforms` exists).
Add a Python method, output_sparse_tensor, to use sparse_tensor.out to write
a sparse tensor value to a file.
Modify the method that evaluates a tensor expression to return a pointer of the
MLIR sparse tensor for the result to delay the extraction of the coordinates and
non-zero values.
Implement the Tensor to_file method to evaluate the tensor assignment and write
the result to a file.
Add unit tests. Modify test golden files to reflect the change that TNS outputs
now have a comment line and two meta data lines.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D118956
The conversion to the new ControlFlow dialect didn't change the
GPUToROCDL pass - this commit fixes this issue.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119188
Add support for computing an overapproximation of the number of integer points
in a polyhedron. The returned result is actually the number of integer points
one gets by computing the "rational shadow" obtained by projecting out the
local IDs, finding the minimal axis-parallel hyperrectangular approximation
of the shadow, and returning the number of integer points in that. This does
not currently support symbols.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D119228