Switch the CUDA runtime wrappers for GPU memory alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and the wrappers they
lower to (gpu-to-llvm) are async -- however, these were being incorrectly
mapped to cuMemAlloc/cuMemFree instead of
cuMemAllocAsync/cuMemFreeAsync.
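For reference, a minimal sketch of the async forms of these ops whose semantics
the wrappers now honor (SSA names and shapes are illustrative, not taken from
this patch):
```mlir
%t0 = gpu.wait async
%memref, %t1 = gpu.alloc async [%t0] (%width) : memref<64x?xf32>
// ... use %memref on the device ...
%t2 = gpu.dealloc async [%t1] %memref : memref<64x?xf32>
gpu.wait [%t2]
```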
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D123482
This supports the threadprivate directive in the OpenMP dialect, following
the OpenMP 5.1 standard [2.21.2]. It also adds lowering to LLVM IR using the
OpenMP IRBuilder.
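As an illustrative sketch at the LLVM dialect level (the exact printed form of
the new op may differ slightly; names are made up):
```mlir
llvm.mlir.global internal @x(0 : i32) : i32

llvm.func @use_threadprivate() {
  %0 = llvm.mlir.addressof @x : !llvm.ptr<i32>
  // Returns the address of the current thread's copy of @x.
  %1 = omp.threadprivate %0 : !llvm.ptr<i32> -> !llvm.ptr<i32>
  %2 = llvm.load %1 : !llvm.ptr<i32>
  llvm.return
}
```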
Reviewed By: kiranchandramohan, shraiysh, arnamoy10
Differential Revision: https://reviews.llvm.org/D123350
Adding annotations on an as-needed basis, currently only for memrefCopy. In general, all C API functions that take pointers to memory allocated/initialized inside the JIT-compiled code must be annotated to be able to run with msan.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123557
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reland: Fixed custom calls to `-lower-matrix-intrinsics` in integration tests by replacing them with `-O0 -enable-matrix`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
The method to add elementwise ops fusion patterns pulls in many other
patterns by default. The patterns to pull in along with the
elementwise op fusion should be up to the caller. Split the method to
pull in just the elementwise ops fusion pattern. Other cleanup changes
include:
- Move the pattern for constant folding of generic ops (currently only
  constant folds transpose) into a separate file, because it is not
  related to fusion.
- Drop the uber LinalgElementwiseFusionOptions. With
  populateElementwiseOpsFusionPatterns being split, it has no
  utility now.
- Drop defaults for the control function.
- Fusion of splat constants with generic ops doesn't need a control
  function. It is always good to do.
Differential Revision: https://reviews.llvm.org/D123236
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
Normalize some of the division and inequality expressions used,
which can improve performance. Also deduplicate some of the
normalization functionality throughout the Presburger library.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123314
When making the subtract implementation non-recursive, tail calls were
implemented by incrementing the level but not pushing a frame, and returning
was implemented as returning to the level corresponding to the number of frames in the stack.
This is incorrect, as there could be a case where we tail-recurse at `level`,
and then recurse at `level + 1`, pushing a frame. However, because the previous
frame was missing, this new frame would be interpreted as corresponding to
`level` and not `level + 1`. Fix this by removing the special handling of tail
calls and just doing them as normal recursion, as this is the simplest correct
implementation and handling them specifically would be a premature optimization.
The impact of this bug is only on performance as this can only lead to
unnecessary subtractions of the same disjuncts multiple times. As subtraction
is idempotent, and rationally empty disjuncts are always discarded, this
does not affect the output, so this patch does not include a regression test.
(This also does not affect termination.)
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123327
Rewrite tensor::ExtractSliceOp(vector::TransferWriteOp) to vector::TransferWriteOp(tensor::ExtractSliceOp) if the full slice is overwritten and inserted into another tensor. After this rewrite, the operations bufferize in-place since all of them work on the same %iter_arg slice.
For example:
```mlir
%0 = vector.transfer_write %vec, %init_tensor[%c0, %c0]
: vector<8x16xf32>, tensor<8x16xf32>
%1 = tensor.extract_slice %0[0, 0] [%sz0, %sz1] [1, 1]
: tensor<8x16xf32> to tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<?x?xf32> into tensor<27x37xf32>
```
folds to
```mlir
%0 = tensor.extract_slice %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<27x37xf32> to tensor<?x?xf32>
%1 = vector.transfer_write %vec, %0[%c0, %c0]
: vector<8x16xf32>, tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<?x?xf32> into tensor<27x37xf32>
```
Reviewed By: nicolasvasilache, hanchung
Differential Revision: https://reviews.llvm.org/D123190
Insert a buffer copy unless the dims are guaranteed to be collapsible. In the verifier, accept collapses unless they are guaranteed to be non-collapsible.
Differential Revision: https://reviews.llvm.org/D123316
Insert a cast if the two tensors with identical layout (that are passed to `arith.select`) have different layout maps after bufferization.
Differential Revision: https://reviews.llvm.org/D123321
This patch revamps the BranchOpInterface a bit and allows a proper implementation of what was previously `getMutableSuccessorOperands` for operations which internally produce arguments to some of the block arguments. A motivating example for this would be an invoke op with an error handling path:
```
invoke %function(%0)
label ^success ^error(%1 : i32)
^error(%e: !error, %arg0 : i32):
...
```
The advantage of this is that any user of `BranchOpInterface` can still reason about the remaining block argument operands (such as `%1` in the example above), as well as make use of the modifying capabilities to add more operands, erase an operand, etc.
The way this patch implements that functionality is via a new class called `SuccessorOperands`, which is now returned by `getSuccessorOperands`. It basically contains an `unsigned` denoting how many operation-produced operands exist, as well as a `MutableOperandRange`, which are the usual forwarded operands we are used to. The produced operands are assumed to be the first few block arguments, followed by the forwarded operands. The role of `SuccessorOperands` is to provide various utility functions to modify and query the successor arguments of a `BranchOpInterface`.
Differential Revision: https://reviews.llvm.org/D123062
Reland Note: Adds a fix to properly mark a commutative operation as folded if we change the order
of its operands. This was uncovered by the fact that we no longer re-process constants.
This avoids accidentally reversing the order of constants during successive
application, e.g. when running the canonicalizer. This helps reduce the number
of iterations, and also avoids unnecessary changes to input IR.
Fixes #51892
Differential Revision: https://reviews.llvm.org/D122692
In addition, this fixes a small bug where padding incorrectly inferred the output shape for dynamic inputs in convolution.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D121872
This case is handled in neither the folding nor the canonicalization
patterns. The folding pattern cannot generate new broadcast ops,
so it should be handled by the canonicalization pattern.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123307
Subtraction was previously implemented recursively. This refactors it to be
non-recursive to avoid issues with potential stack overflows.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123248
This patch enhances the CSE pass to deal with simple cases of duplicated
operations with MemoryEffects.
It allows the CSE pass to safely remove duplicate operations with the
MemoryEffects::Read effect that have no other side-effecting operations in
between. Other MemoryEffects::Read operations are allowed.
The use case is pretty simple so far so we can build on top of it to add
more features.
This patch is also meant to avoid a dedicated CSE pass in FIR and was
brought together after discussion on https://reviews.llvm.org/D112711.
It does not currently cover the full range of use cases described in
https://reviews.llvm.org/D112711 but the idea is to gradually enhance
the MLIR CSE pass to handle common use cases that can be used by
other dialects.
This patch takes advantage of the new CSE capabilities in FIR.
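For example, a minimal sketch of the kind of case this enables (function and
value names are made up):
```mlir
func.func @dedup_reads(%m: memref<10xf32>, %i: index) -> f32 {
  %0 = memref.load %m[%i] : memref<10xf32>
  // Only another read in between, no side-effecting operation:
  %1 = memref.load %m[%i] : memref<10xf32>
  // After CSE, %1 is replaced by %0.
  %2 = arith.addf %0, %1 : f32
  return %2 : f32
}
```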
Reviewed By: mehdi_amini, rriddle, schweitz
Differential Revision: https://reviews.llvm.org/D122801
This commit refactors the expected form of native constraint and rewrite
functions, and greatly reduces the complexity required when defining a
native function. Namely, this commit adds automatic processing of the
necessary PDLValue glue code, and allows users to define
constraint/rewrite functions using the C++ types that they actually want to
use.
As an example, let's look at a simple rewrite defined today:
```
static void rewriteFn(PatternRewriter &rewriter, PDLResultList &results,
ArrayRef<PDLValue> args) {
ValueRange operandValues = args[0].cast<ValueRange>();
TypeRange typeValues = args[1].cast<TypeRange>();
...
// Create an operation at some point and pass it back to PDL.
Operation *op = rewriter.create<SomeOp>(...);
results.push_back(op);
}
```
After this commit, that same rewrite could be defined as:
```
static Operation *rewriteFn(PatternRewriter &rewriter, ValueRange operandValues,
TypeRange typeValues) {
...
// Create an operation at some point and pass it back to PDL.
return rewriter.create<SomeOp>(...);
}
```
Differential Revision: https://reviews.llvm.org/D122086
Rationale:
Allocating the temporary buffers for access pattern expansion on the stack
(using alloca) is a bit too aggressive, since it easily runs out of stack space
for large enveloping tensor dimensions. This revision switches these buffers to
dynamic allocation with explicit alloc/dealloc pairs.
Reviewed By: bixia, wrengr
Differential Revision: https://reviews.llvm.org/D123253
Adds `mlirBlockDetach` to the CAPI to remove a block from its parent
region. Use it in the Python bindings to implement
`Block.append_to(region)`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123165
Refactor the operation of subtraction by
- removing the usage of SimplexRollbackScopeExit since this
can't be used in the iterative version
- reducing the number of stack variables to make the
iterative version easier to follow
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123156
Support returning arbitrary tensors from functions, even those that are
not equivalent. To that end, additional information is gathered during
the analysis phase. In particular, which function args are aliasing with
which return values.
Also fix bugs in the current implementation when returning equivalent
tensors. Various unit tests are added to ensure that we have better test
coverage.
Note: Returning non-equivalent tensors is only allowed when
allowReturnAllocs is enabled. This functionality is useful for unit
testing and compatibility with other bufferizations such as the sparse
compiler. This is also towards using ModuleBufferization as a
replacement for --func-bufferize.
Differential Revision: https://reviews.llvm.org/D119120
* Bufferize FuncOp bodies and boundaries in the same loop. This is in preparation of moving FuncOp bufferization into an external model implementation.
* As a side effect, stop bufferization earlier if there was an error. (Do not continue bufferization; this results in fewer error messages.)
* Run equivalence analysis of CallOps before the main analysis. This is needed so that equivalence info is propagated properly.
Differential Revision: https://reviews.llvm.org/D123208
* Store bbArg indices instead of BlockArguments, so that args can be changed during bufferization.
* Use type aliases for better readability.
Differential Revision: https://reviews.llvm.org/D123191
https://reviews.llvm.org/D122641 introduced fixes to the ExpandShapeOp verifier
but also introduced an artificial layout limitation that prevents the consideration of transposed layouts.
This revision fixes the omissions and reimplements the logic using saturated arithmetic which is more
idiomatic and avoids leaking internal implementation details.
Test cases are added for transposed layouts.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D122845
Returning `std::array<uint8_t, N>` instead of a `StringRef` provides better ergonomics for users of the hashing functions:
* When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with a fixed array of bytes directly, which is more natural
* Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef`
As part of this patch also:
* Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes.
* Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API.
Differential Revision: https://reviews.llvm.org/D123100
This avoids a rather big bug where we were reserving
dense space for the ptx/idx in the first sparse dimension.
For example, using CSR for a 140874 x 140874 matrix with
3977139 nonzero would reserve the full 19845483876 space.
This revision fixes this for now, but we need to revisit
the reservation heuristic to make this better.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D123166
With the introduction of IntegerPolyhedron and IntegerRelation in the Presburger
directory, FlatAffineConstraints becomes redundant. For users
requiring Presburger arithmetic without IR information, the Presburger library
can be used directly. For users requiring IR information,
FlatAffineValueConstraints can be used.
This patch merges FAC and FACV to remove the redundancy of FAC.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D122476
Right now `populateVectorInsertExtractStridedSliceTransforms` contains
two categories of patterns, one for decomposing high-D insert/extract
strided slices, the other for lowering them to shuffle ops.
They are at different levels---the former is in the middle, while
the latter is a step of final lowering. Split them to give users
more control of which pattern to pick.
This means breaking down the previous `VectorExtractStridedSliceOpRewritePattern`,
which was doing two things together.
Also renamed those patterns to be clearer.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123137
Add support for computing the symbolic integer lexmin of a polyhedron.
This finds, for every assignment to the symbols, the lexicographically
minimum value attained by the dimensions. For example, the symbolic lexmin
of the set
`(x, y)[a, b, c] : (a <= x, b <= x, x <= c)`
can be written as
```
x = a if b <= a, a <= c
x = b if a < b, b <= c
```
This also finds the set of assignments to the symbols that make the lexmin unbounded.
This was previously landed in da92f92621 and
reverted in b238c252e8 due to a build failure
in the code. Re-landing now with a fixed build.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122985
This removes tens of warnings from build logs that we can't do
anything about.
Reviewed By: pcf000
Differential Revision: https://reviews.llvm.org/D122927
Originally, in the returnOp conversion, the result type was being converted to a
bare pointer if the type was a memref. This is incorrect, as conversion to a bare
pointer can only be done if the memref has static shape, strides, and offset.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123121
Add the description textual field to the Attr ODS class to mirror an
identical field in the Type ODS class. Add support for generating
documentation for attribute constraints defined using this field. This
ensures mlir-tblgen produces at least some documentation for dialects
that only define attribute constraints, such as DLTI.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123024
This change introduces two new methods: `finalizeSegment` and `appendIndex`; and removes three old methods: `endDim`, `appendCurrentPointer`, `appendIndex`. The two new methods better encapsulate their algorithms, thus allowing us to remove repetitious code in several other places.
Depends On D122435
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D122625
Add support for computing the symbolic integer lexmin of a polyhedron.
This finds, for every assignment to the symbols, the lexicographically
minimum value attained by the dimensions. For example, the symbolic lexmin
of the set
`(x, y)[a, b, c] : (a <= x, b <= x, x <= c)`
can be written as
```
x = a if b <= a, a <= c
x = b if a < b, b <= c
```
This also finds the set of assignments to the symbols that make the lexmin unbounded.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122985
This commit restructures how TypeID is implemented to ideally avoid
the current problems related to shared libraries. This is done by changing
the "implicit" fallback path to use the name of the type, instead of using
a static template variable (which breaks shared libraries). The major downside to this
is that it adds some additional initialization costs for the implicit path. Given the
use of type names for uniqueness in the fallback, we also no longer allow types
defined in anonymous namespaces to have an implicit TypeID. To simplify defining
an ID for these classes, a new `MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID` macro
was added to allow for explicitly defining a TypeID directly on an internal class.
To help identify when types are using the fallback, `-debug-only=typeid` can be
used to log which types are using implicit ids.
This change generally only requires changes to the test passes, which are all defined
in anonymous namespaces, and thus can't use the fallback any longer.
Differential Revision: https://reviews.llvm.org/D122775
Apply scale should be optionally disabled when lowering via TosaToStandard.
In most cases it should persist until lowering to a specific backend.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D122948
Adds the ability to create external passes using the C-API. This allows passes
to be written in C or languages that use the C-bindings.
Differential Revision: https://reviews.llvm.org/D121866
This patch changes the implementation of SetCoalescer to use PresburgerSpace
instead of reimplementing parts of PresburgerSpace.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D122984
Add integer-exact checks for inequalities being separate and redundant in LexSimplex.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122921
This significantly simplifies the boilerplate necessary for passes
to define nested pass pipelines.
Differential Revision: https://reviews.llvm.org/D122880
ListOption currently uses llvm::cl::list under the hood, but the usages
of ListOption are generally a tad different from llvm::cl::list. This
commit codifies this by making ListOption implicitly comma separated,
and removes the explicit flag set for all of the current list options.
The new parsing for comma separation of ListOption also adds in support
for skipping over delimited sub-ranges (i.e. {}, [], (), "", ''). This
more easily supports nested options that use those as part of the
format, and this constraint (balanced delimiters) is already codified
in the syntax of pass pipelines.
See https://discourse.llvm.org/t/list-of-lists-pass-option/5950 for
related discussion
Differential Revision: https://reviews.llvm.org/D122879
Prior to this change there were a number of places where the allocation and deallocation of SparseTensorCOO objects were not cleanly paired, leading to inconsistencies regarding whether each function released its tensor/coo arguments or not, as well as making it easy to run afoul of memory leaks, use-after-free, or double-free errors. This change cleans up the codegen vs runtime boundary to resolve those issues. Now, the only time the runtime library frees an object is either (a) because it's a function explicitly designed to do so, or (b) because the allocated object is entirely local to the function and would be a memory leak if not released. Thus, now the codegen takes complete responsibility for releasing any objects it caused to be allocated.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D122435
This patch renames "integerRelations" and "relation", which refer to the
disjuncts in PresburgerRelation, to "disjunct(s)". This is done to be
consistent with the rest of the interface.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D122892
Previously, when an input set had a duplicate division, the duplicates might
be removed by a call to mergeLocalIds due to being detected as being duplicate
for the first time. The subtraction implementation cannot handle existing
locals being removed, so this would lead to unexpected behaviour. Resolve this
by removing all the duplicates up front.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122826
LexSimplex cannot be made to support symbols for symbolic lexmin; this requires
a second class. In preparation for upstreaming support for symbolic lexmin,
keep the parts of LexSimplex that are specific to non-symbolic lexmin in
LexSimplex, and move the parts that are required by both into a common class,
LexSimplexBase, for both to inherit from.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122828
This patch modifies the constructors of IntegerPolyhedron, IntegerRelation,
PresburgerRelation, PresburgerSet, and PWMAFunction to take a PresburgerSpace
instead of dimension counts. This allows information present in PresburgerSpace
to be carried better and allows for a more general interface.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D122842
This patch supports ordered clause specified without parameter in
worksharing-loop directive in the OpenMPIRBuilder and lowering MLIR to
LLVM IR.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114940
This reverts commit 59bbc7a085.
This exposes an issue breaking the contract of
`applyPatternsAndFoldGreedily` where we "converge" without applying
remaining patterns.
This avoids accidentally reversing the order of constants during successive
application, e.g. when running the canonicalizer. This helps reduce the number
of iterations, and also avoids unnecessary changes to input IR.
Fixes #51892
Differential Revision: https://reviews.llvm.org/D122692
Bubble up extract_slice above a Linalg operation.
A sequence of operations
```
%0 = linalg.<op> ... arg0, arg1, ...
%1 = tensor.extract_slice %0 ...
```
can be replaced with
```
%0 = tensor.extract_slice %arg0
%1 = tensor.extract_slice %arg1
%2 = linalg.<op> ... %0, %1, ...
```
This reduces the amount of computation performed by the linalg operation.
The implementation uses the tiling utility functions. One difference
from the tiling process is that we don't need to insert the checking
code for out-of-bound accesses. The use of the slice itself
represents that the code writer is sure about the boundary condition.
To avoid adding the boundary condition check code, `omitPartialTileCheck`
is introduced for the tiling utility functions.
Differential Revision: https://reviews.llvm.org/D122437
This patch fixes a bug in LinearTransform::applyTo where it did not carry the
IdKind information, and instead treated every id as IdKind::Domain.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D122823
Previously, when updating the outputs matrix, the local offset was not being considered.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122812
Infer a tighter MemRef type instead of always falling back to the most dynamic MemRef type. This is inefficient and caused op verification errors.
Differential Revision: https://reviews.llvm.org/D122649
* Complete rewrite of the verifier.
* CollapseShapeOp verifier will be updated in a subsequent commit.
* Update and expand op documentation.
* Add a new builder that infers the result type based on the source type, result shape and reassociation indices. In essence, only the result layout map is inferred.
Differential Revision: https://reviews.llvm.org/D122641
For example, we could do the following eliminations:
```
fold vector.shuffle V1, V2, [0, 1, 2, 3] : <4xi32>, <2xi32> -> V1
fold vector.shuffle V1, V2, [4, 5] : <4xi32>, <2xi32> -> V2
```
Differential Revision: https://reviews.llvm.org/D122706
This reverts commit 5630143af3. The
implementation in this commit was incorrect. Also, handling this representation
of equalities in the upcoming support for symbolic lexicographic minimization
makes that patch much more complex. It will be easier to review that without
this representation and then reintroduce the fixed column representation
later, hence the revert rather than a bug fix.
linalg.generic can also take scalars instead of tensors, which
tensor.cast doesn't support. We don't have an easy way to cast between
scalars and tensors so just keep the linalg.generic in those cases.
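For illustration, a sketch of a linalg.generic with a scalar operand, for which
there is no tensor.cast-based counterpart (names and shapes are made up):
```mlir
%res = linalg.generic {
    indexing_maps = [affine_map<(d0) -> ()>, affine_map<(d0) -> (d0)>],
    iterator_types = ["parallel"]}
    ins(%scalar : f32) outs(%init : tensor<8xf32>) {
  ^bb0(%s: f32, %out: f32):
    linalg.yield %s : f32
} -> tensor<8xf32>
```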
Differential Revision: https://reviews.llvm.org/D122575
We are using "enable-index-optimizations" and "indexOptimizations" as
names for an optimization that consists of using i32 for indices within
a vector, for instance when building a vector comparison for mask
generation. The name is confusing and suggests a scope beyond these
vector indices. This change makes the function of the option explicit
in its name.
Differential Revision: https://reviews.llvm.org/D122415
A Block is optionally allocated and leaks in case of a failed parse. Inline the
function and ensure the Block gets freed unless parsing is successful.
Differential Revision: https://reviews.llvm.org/D122112
The change of https://reviews.llvm.org/D121513#3411651
has caused a build error when building with clang:
/mnt/vss/_work/1/llvm-project/mlir/lib/Dialect/Tosa/IR/TosaOps.cpp:599:26: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
ReduceFolder(ReduceAllOp);
Reviewed By: hpmorgan, Mogball
Differential Revision: https://reviews.llvm.org/D122599
Adds another pointer to the union in RegionRange to allow RegionRange to work on ArrayRef<Region *> (i.e. vectors of Region *).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D122514
(This was a TODO from the initial patch).
The control-flow sink utility accepts a callback that is used to sink an operation into a region.
`moveIntoRegion` is called on the same operation and region for which `shouldMoveIntoRegion` returned true.
The callback must preserve the dominance of the operation within the region. In the default control-flow
sink implementation, this is moving the operation to the start of the entry block.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D122445
This has been on _Both for a couple of weeks. This flips usages in core, with
the intention to flip the flag to _Prefixed in a follow-up. A couple of helper
methods had to be added in AffineOps and Linalg to facilitate a pure flag flip
in the follow-up, as some of these classes are used in templates and are
therefore sensitive to Vector dialect changes.
Differential Revision: https://reviews.llvm.org/D122151
- Adds default implementations of `isDefinedOutsideOfLoop` and `moveOutOfLoop` since 99% of all implementations of these functions were identical
- `moveOutOfLoop` takes one operation and doesn't return anything anymore. 100% of all implementations of this function would always return `success` and uses would either respond with a pass failure or an `llvm_unreachable`.
This revision supports padding only a subset of the iteration dimensions via an additional padding-dimensions parameter. This control allows us to pad an operation in multiple steps. For example, one may want to pad only the output dimensions of a producer matmul fused into a consumer loop nest, before tiling and padding its reduction dimension.
Depends On D122309
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D122560
Pass the padding options using arrays instead of lambdas. In particular pass the padding value as string and use the argument parser to create the padding value. Arrays are a more natural choice that matches the current use cases and avoids converting arrays to lambdas.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D122309
This patch adds the ReductionClauseInterface and also adds reduction
support for `omp.parallel` operation.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D122402
This patch adds MLIR NVVM support for the various NVPTX `mma.sync`
operations. There are a number of possible data type, shape,
and other attribute combinations supported by the operation, so a
custom assembly format is added and attributes are inferred where
possible.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D122410
Use "enable-vla-vectorization=vla" to generate a vector length agnostic
loops during vectorization. This option works for vectorization strategy 2.
Differential Revision: https://reviews.llvm.org/D118379
The way vector.create_mask is currently lowered is
vector-length-dependent, and therefore incompatible with scalable vector
types. This patch adds an alternative lowering path for create_mask
operations that return a scalable vector mask.
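For example (a sketch contrasting the fixed-length and scalable result types;
the %size operand is illustrative):
```mlir
// Fixed-length mask: lowered via the existing length-dependent path.
%m0 = vector.create_mask %size : vector<16xi1>
// Scalable mask: handled by the new lowering path.
%m1 = vector.create_mask %size : vector<[16]xi1>
```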
Differential Revision: https://reviews.llvm.org/D118248
This patch is a cleanup patch that merges PresburgerLocalSpace and
PresburgerSpace. Asserting that there are no locals is shifted to the
users of PresburgerSpace themselves.
The reasoning for this patch is that PresburgerLocalSpace did not contribute
much and only introduced additional complexity as locals could still be present
in PresburgerSpace, just not writable. This could introduce problems if a
PresburgerSpace with locals was copied to PresburgerLocalSpace which expected
no locals in a PresburgerSpace.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D122353
This transformation allows breaking up a reduction dimension into a
parallel dimension and a reduction dimension. This is followed by a separate
reduction op. This allows generating a tree reduction, which is beneficial
on targets that can take advantage of the parallelism.
Differential Revision: https://reviews.llvm.org/D122045
This patch adds the nowait parameter to `createSingle` in
OpenMPIRBuilder and handling for IR generation from OpenMP Dialect.
Also added tests for the same.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D122371
Previously, only LinalgOps whose operands are defined by an ExtractSliceOp could be padded. The revision supports walking a use-def chain of LinalgOps to find an ExtractSliceOp.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D122116
This revision introduces a heuristic to stop fusion for shape-only tensors. A shape-only tensor only defines the shape of the consumer computation while its data is not used. Pure producer-consumer fusion thus shall not fuse the producer of a shape-only tensor, in particular since the shape-only tensor will have other uses that actually consume the data.
The revision enables fusion for consumers that have two uses of the same tensor: one as an input operand and one as a shape-only output operand. In these cases, we want to fuse only the input operand and avoid output fusion via iteration argument.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D120981
Create the AffineMinOp used to compute the padding width in canonical form and update the tests.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D122311
This patch adds translation from omp.single to LLVM IR.
Depends on D122288
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D122297
Add a baseline implementation of support for local ids for `PWMAFunction::valueAt`. This can be made more efficient later if needed by handling locals with known div representations separately.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122144
In LexSimplex, instead of adding equalities as a pair of inequalities,
add them as a single row, move them into the basis, and keep them there.
There will always be a valid basis involving all non-redundant equalities. Such
equalities will then be ignored in some other operations, such as when looking
for pivot columns. This speeds them up a little bit.
More importantly, this is an important precursor patch to adding support for
symbolic integer lexmin, as this heuristic can sometimes make a big difference there.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122165
This simplifies many places where we just want to do something in a "transient context"
and return some value.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122172
It is often necessary to "rollback" IntegerRelations to an earlier state. Although providing full rollback support is non-trivial, we really only need to support the case where the only changes made are to append ids or append constraints, and then rollback these additions. This patch adds support to rollback in such situations by recording the number of ids and constraints of each kind and providing support to truncate the IntegerRelation to those counts by removing appended ids and constraints. This already simplifies subtraction a little bit and will also be useful in the implementation of symbolic integer lexmin.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122178
This provides a way to create an operation without manipulating
OperationState directly. This is useful for creating unregistered ops.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D120787
Emitting an error at EOF will emit the diagnostic past the end of the file. When emitting an error during parsing at EOF, emit it at the previous character.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D122295
This fixes a bufferization issue with ops that are not supported by the buffer deallocation pass when `allow-return-allocs=0`.
Differential Revision: https://reviews.llvm.org/D122304
This patch adds omp.single according to Section 2.8.2 of OpenMP 5.0.
Also added tests for the same.
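A sketch of what the new op looks like (assuming the region is terminated with
omp.terminator, as for other OpenMP dialect region ops):
```mlir
omp.parallel {
  omp.single {
    // Executed by a single thread of the team; the others wait at the
    // implicit barrier at the end of the single region.
    omp.terminator
  }
  omp.terminator
}
```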
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D122288
Co-authored-by: Kiran Kumar T P <kirankumar.tp@amd.com>
This is a convenience function for adding new divisions to the Simplex given the numerator and denominator.
This will be needed for symbolic integer lexmin support.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D122159