NFC. Move out and expose affine scalar replacement utility through
affine utils. Renaming misleading forwardStoreToLoad ->
affineScalarReplace. Update a stale doc comment.
Differential Revision: https://reviews.llvm.org/D115495
Switch the attribute creation operations to use attr-dict-with-
keyword to avoid conflicts (in the case of pdl.attribute) and
confusion(in the case of pdl_interp.create_attribute) with
having a DictionaryAttr as a value and specifying the
attributes of the operation itself (as a dictionary).
Differential Revision: https://reviews.llvm.org/D114815
The results of a rewrite are optional, but we currently require
them to be present in the assembly format. This commit
makes the results component in the format optional.
Differential Revision: https://reviews.llvm.org/D114814
Custom ops that have no parser or printer should fall back to the dialect's parser and/or printer hooks. This avoids the need to define parsers and printers that simply dispatch to the dialect hook.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D115481
This patch adds the documentation for the operations `omp.atomic.read`,
`omp.atomic.write` and `omp.atomic.update`.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D115445
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.
And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.
In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110448
This patch factors out math functionality that is a subset of Presburger arithmetic and moves it from FlatAffineConstraints to Presburger/IntegerPolyhedron. This patch only moves some parts of the functionality planned to be moved, with subsequent patches moving more functionality. There are three main reasons for this:
1. This split makes the Presburger Library easier and more flexible to use
across MLIR, by not depending on IR.
2. This split allows the Presburger library to be developed independently from
Affine Analysis, with Affine Analysis using this library.
3. With more functionality being upstreamed to the Presburger Library, the
mlir/Analysis directory will be cluttered with Presburger library components
since they depend on math functionality from FlatAffineConstraints. Moving this
functionality to the Presburger directory allows keeping the new functionality
in the Presburger directory.
This patch is part of an ongoing effort to make the Presburger Library easier to use. The motivation for this effort is the feedback received at the LLVM conference from Mehdi and others.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D114674
This patch provides functionality for simplifying `PresburgerSet`s by checking if any `FlatAffineConstraints` in the set is contained in another, and removing such redundant FACs.
This is part of a series of patches to provide functionality for [integer set coalescing](http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf) in MLIR.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D110617
This patch supports the atomic construct (update) following section 2.17.7 of OpenMP 5.0 standard. Also added tests and verifier for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D112982
Count leading/trailing zeros are an existing LLVM intrinsic. Added LLVM
support for the intrinsics with lowerings from the math dialect to LLVM
dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115206
This adds a new option `dialectFilter` to BufferizationOptions. Only ops from dialects that are allow-listed in the filter are bufferized. Other ops are left unbufferized. Note: This option requires `allowUnknownOps = true`.
To make use of `dialectFilter`, BufferizationOptions or BufferizationState must be passed to various helper functions.
The purpose of this change is to provide a better infrastructure for partial bufferization, which will be fully activated in a subsequent change.
Differential Revision: https://reviews.llvm.org/D114691
The new form of printing attribute in the declarative assembly is eliding the `#dialect.mnemonic` prefix to only keep the `<....>` part.
Differential Revision: https://reviews.llvm.org/D113873
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reductions loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115091
These functions are generic utility functions that operates on
affine ops within SCF regions. Moving them to their own files
for a better code structure, instead of mixing with loop
specialization logic.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115245
This change mainly changes the API. There is no mentioning of FuncOps in ComprehensiveBufferize anymore.
Also, bufferize methods of the op interface are called for ops without tensor operands/results if they have a region.
Differential Revision: https://reviews.llvm.org/D115212
For a 1x1 weight and stride of 1, the input/weight can be reshaped and
multiplied elementwise then reshaped back
Reviewed By: rsuderman, KoolJBlack
Differential Revision: https://reviews.llvm.org/D115207
Make fields private and clean up the interface. In particular, BufferizableOpInterface::bufferize no longer has access to `aliasInfo`. This was potentially dangerous because some of the ops registered in BufferizationAliasInfo may have been deleted.
Differential Revision: https://reviews.llvm.org/D114931
Fixed the tosa.conv2d to tosa.fully_connected canonicalization for incorrect
output channels. Included uptes to tests to include checks for the result
shapes during canonicalization.
This allows conv2d to transform to the simpler fully_connected operation.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115170
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behvaior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing than since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D115037
This is a cleanup of ModuleBufferization. Instead of storing information about writable function arguments in BufferizationAliasInfo, we can use isWritable and make the decision there, based on dialect-specifc bufferization state.
Differential Revision: https://reviews.llvm.org/D114930
Remove all function calls related to buffer equivalence from bufferize implementations.
Add a new PostAnalysisStep for scf.for that ensures that yielded values are equivalent to the corresponding BBArgs. (This was previously checked in `bufferize`.) This will be relaxed in a subsequent commit.
Note: This commit changes two test cases. These were broken by design
and should not have passed. With the new scf.for PostAnalysisStep, this
bug was fixed.
Differential Revision: https://reviews.llvm.org/D114927
Adding the default implementation of `getLoopIteratorTypes` and
`getLoopBounds` allows ExternalModels to override these methods.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115101
Collect equivalent BBArgs right after the equivalence analysis of the FuncOp and before bufferizing. This is in preparation of decoupling bufferization from aliasInfo.
Also gather equivalence info for CallOps, which was missing in the
previous commit.
Differential Revision: https://reviews.llvm.org/D114847
To support creating both a mask with just a single `true` and `false` values,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:
- `vector.constant_mask [0] : vector<i1>` which gets lowered to
`arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
`arith.constant dense<true> : vector<i1>`
(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115023
Let the user registers their own handler to processing the matching
failure information.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D110896
Instead of checking buffer equivalence during bufferization, gather buffer equivalence information right after the analysis. This is in preparation of decoupling bufferization from BufferizationAliasInfo.
This change also fixes equivalence analysis for scf.if op results, which was not fully implemented. scf.if op results are equivalent to their corresponding yield values if both yield values are equivalent.
Differential Revision: https://reviews.llvm.org/D114774
Also set insertion point right before calling `bufferize`. No need to put an InsertionGuard anymore.
Differential Revision: https://reviews.llvm.org/D114928
This reverts commit 13bdb7ab4a. The commit introduced/uncovered an unintended bug in models containing Conv2D.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115079
BufferizationState had map/lookup overloads for non-tensor values. This was necessary for IREE. There is now a better way to do this, so these overloads can be removed.
Differential Revision: https://reviews.llvm.org/D114929
Allow ops that are not bufferizable in the input IR. (Deactivated by default.)
bufferization::ToMemrefOp and bufferization::ToTensorOp are generated at the bufferization boundaries.
Differential Revision: https://reviews.llvm.org/D114669
Also store a reference to BufferizationOptions in BufferizationState. This is in preparation of adding support for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114661
The implementation only allows to bit-cast between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114854
This change provides `BufferizableOpInterface` implementations for ops from the Bufferization dialects. These ops are needed at the bufferization boundaries for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114618
Affine maps and integer sets previously relied on a single lock for creating unique instances. In a multi-threaded setting, this lock becomes a contention point. This commit updates AffineMap and IntegerSet to use StorageUniquer instead. StorageUniquer internally uses sharded locks and thread-local caches to reduce contention. It is already used for affine expressions, types and attributes. On my local machine, this gives me a 5X speedup for an application that manipulates a lot of affine maps and integer sets.
This commit also removes the integer set uniquer threshold. The threshold was used to avoid adding integer sets with a lot of constraints to the hash_map containing unique instances, but the constraints and the integer set were still allocated in the same allocator and never freed, thus not saving any space expect for the hash-map entry.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D114942
This patch implements detecting duplicate local identifiers by extracting their
division representation while merging local identifiers.
For example, given the FACs A, B:
```
A: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 2)
B: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 5)
```
The intersection of A and B without this patch would lead to the following FAC:
```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4], d2 = [x / 4], d3 = [x / 4]: d0 <= s0, d1 <= s0, d2 <= s0, d3 <= s0, x + y >= 2, x + y >= 5)
```
after this patch, merging of local ids will detect that `d0 = d2` and `d1 = d3`,
and the intersection of these two FACs will be (after removing duplicate constraints):
```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4] : d0 <= s0, d1 <= s0, x + y >= 2, x + y >= 5)
```
This reduces the number of constraints by 2 (constraints) + 4 (2 constraints for each extra division) for this case.
This is used to reduce the output size representation of operations like
PresburgerSet::subtract, PresburgerSet::intersect which require merging local
variables.
Reviewed By: arjunp, bondhugula
Differential Revision: https://reviews.llvm.org/D112867
Revert changes that were meant to be sent as a single commit with
summary for the differential review, but were accidently sent directly.
This reverts commit 3bc5353fc6.
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.
Reviewed By: ThomasRaoux, springerm
Differential Revision: https://reviews.llvm.org/D114803
However, since CallOps have no aliasing OpResults, their OpOperands always bufferize out-of-place.
This change removes `bufferizesToMemoryWrite` from `CallOpInterface`. This method was called, but its return value did not matter.
Differential Revision: https://reviews.llvm.org/D114616
Proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since this
may include other conditions that cause cancellation).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114810
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.
As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.
Also, extract isRankReducedType from TensorOps for better reuse
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.
Differential Revision: https://reviews.llvm.org/D114715
The canonical type of the result of the `memref.subview` needs to make
sure that the previously dropped unit-dimensions are the ones dropped
for the canonicalized type as well. This means the generic
`inferRankReducedResultType` cannot be used. Instead the current
dropped dimensions need to be querried and the same need to be dropped.
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D114751
For a 1x1 weight and stride of 1, the input/weight can be reshaped and passed into a fully connected op then reshaped back
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D114757
Add the decompose patterns that lower higher dimensional convolutions to lower dimensional ones to CodegenStrategy and use CodegenStrategy to test the decompose patterns. Additionally, remove the assertion that checks the anchor op name is set in the CodegenStrategyTest pass. Removing the assertion allows us to simplify the pipelines used in the interchange and decompose tests.
Depends On D114797
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114798
The revision updates the convolution decomposition patterns to take a linalg transformation filter. The transformation filter in a later revision allows use the patterns from CodegenStrategy.
Depends On D114690
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114797
Add support for an empty anchor op string in vectorization. An empty anchor op string is useful after fusion when there are multiple different operations to vectorize.
Depends On D114689
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114690
This patch introduces a new conversion to convert bufferization.clone operations
into a memref.alloc and a memref.copy operation. This transformation is needed to
transform all remaining clones which "survive" all previous transformations, before
a given program is lowered further (to LLVM e.g.). Otherwise, these operations
cannot be handled anymore and lead to compile errors.
See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665
Differential Revision: https://reviews.llvm.org/D114233
* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.
Differential Revision: https://reviews.llvm.org/D114687
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114399
We check whether the maximum index of dimensional identifier present
in the result expressions is less than dimCount (number of dimensional
identifiers) argument passed in the AffineMap::get() and the maximum index
of symbolic identifier present in the result expressions is less than
symbolCount (number of symbolic identifiers) argument passed in AffineMap::get().
Reviewed By: nicolasvasilache, bondhugula
Differential Revision: https://reviews.llvm.org/D114238
This changes the op to produce `AnyVectorOfAnyRank` following mostly the code for 1-D vectors.
Depends On D114598
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114550
This changes the op to produce `AnyVectorOfAnyRank` and implements this by just
inserting the element (skipping the shuffle that we do for the 1-D case).
Depends On D114549
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114598
There is special logic for InsertSliceOp to check if a memcpy is needed. This change extracts that piece of code and makes it a PostAnalysisStep.
The purpose of this change is to untangle `bufferize` from BufferizationAliasInfo. (Not fully there yet.)
Differential Revision: https://reviews.llvm.org/D114513
This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).
This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify mulitple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from operation to its operands) when possible and upwards (from values to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined form the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern an inner node is no better than from the optimal root.
The following three main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates optimal branching has been determined.
Testing: unit tests (an integration test to follow once the stack of commits has landed)
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108550
This is commit 2 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).
This commit implements the features needed for the execution of the new operations pdl_interp.get_accepting_ops, pdl_interp.choose_op:
1. The implementation of the generation and execution of the two ops.
2. The addition of Stack of bytecode positions within the ByteCodeExecutor. This is needed because in pdl_interp.choose_op, we iterate over the values returned by pdl_interp.get_accepting_ops until we reach finalize. When we reach finalize, we need to return back to the position marked in the stack.
3. The functionality to extend the lifetime of values that cross the nondeterministic choice. The existing bytecode generator allocates the values to memory positions by representing the liveness of values as a collection of disjoint intervals over the matcher positions. This is akin to register allocation, and substantially reduces the footprint of the bytecode executor. However, because with iterative operation pdl_interp.choose_op, execution "returns" back, so any values whose original liveness cross the nondeterminstic choice must have their lifetime executed until finalize.
Testing: pdl-bytecode.mlir test
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D108547
This is commit 1 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).
These operations are:
* pdl.get_accepting_ops: Returns a list of operations accepting the given value or a range of values at the specified position. Thus if there are two operations `%op1 = "foo"(%val)` and `%op2 = "bar"(%val)` accepting a value at position 0, `%ops = pdl_interp.get_accepting_ops of %val : !pdl.value at 0` will return both of them. This allows us to traverse upwards from a value to operations accepting the value.
* pdl.choose_op: Iteratively chooses one operation from a range of operations. Therefore, writing `%op = pdl_interp.choose_op from %ops` in the example above will select either `%op1`or `%op2`.
Testing: Added the corresponding test cases to mlir/test/Dialect/PDLInterp/ops.mlir.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108543
Use composition instead of inheritance for storing dialect-specific bufferization state. This is in preparation of adding "tensor dialect"-specific bufferization state.
Differential Revision: https://reviews.llvm.org/D114508
Bufferization of function boundaries is extracted from ComprehensiveBufferize into a separate file. This will become its own build target in the future.
Differential Revision: https://reviews.llvm.org/D114226
These methods for results and arguments would create an ArrayRef full
of nullptrs when there were no argument attributes. This is problematic
because this result could not be passed to the FuncOp::build creator
without causing a segfault. Now the list will have empty attributes.
Differential Revision: https://reviews.llvm.org/D114358
Add a helper function to ControlFlowInterfaces for checking if two ops
are in mutually exclusive regions according to RegionBranchOpInterface.
Utilize this new helper in Linalg ComprehensiveBufferize. This makes the
analysis independent of the SCF dialect and generalizes it to other ops
that implement RegionBranchOpInterface.
Differential Revision: https://reviews.llvm.org/D114220
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.
Differential Revision: https://reviews.llvm.org/D114138
For synthesizing an op's implementation of the generated interface
from {Min|Max}Version, we need to define an `initializer` and
`mergeAction`. The `initializer` specifies the initial version,
and `mergeAction` specifies how version specifications from
different parts of the op should be merged to generate a final
version requirements.
Previously we use the specified version enum as the type for both
the initializer and thus the final return type. This means we need
to perform `static_cast` over some hopefully not used number (`~0u`)
as the initializer. This is quite opaque and sort of not guaranteed
to work. Also, there are ops that have an enum attribute where some
values declare version requirements (e.g., enumerant `B` requires
v1.1+) but some not (e.g., enumerant `A` requires nothing). Then a
concrete op instance with `A` will still declare it implements the
version interface (because interface implementation is static for
an op) but actually theirs no requirements for version.
So this commit changes to use an more explicit `llvm::Optional`
to wrap around the returned version enum. This should make it
more clear.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D108312
Add the makeComposedPadHighOp method which creates a new PadTensorOp if necessary. If the source to pad is actually the result of a sequence of padded LinalgOps, the method checks if padding is needed or if we can use the padded result of the padded LinalgOp sequence directly.
Example:
```
%0 = tensor.extract_slice %arg0 [%iv0, %iv1] [%sz0, %sz1]
%1 = linalg.pad_tensor %0 low[0, 0] high[...] { linalg.yield %cst }
%2 = linalg.matmul ins(...) outs(%1)
%3 = tensor.extract_slice %2 [0, 0] [%sz0, %sz1]
```
when padding %3 return %2 instead of introducing
```
%4 = linalg.pad_tensor %3 low[0, 0] high[...] { linalg.yield %cst }
```
Depends On D114161
Reviewed By: nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D114175
The alloc dealloc pair generation callback is really central to the
bufferization algorithm, it modifies the state in a way that affects
correctness. This is not really a configurable option. Moving it to
BufferizationState removes what was probably the reason it was added
as a callback.
Differential Revision: https://reviews.llvm.org/D114417
Remove duplicate `Pass` suffix from view-op-graph pass class name. The
extra suffix would lead to methods like registerViewOpGraphPassPass
being generated.
Differential Revision: https://reviews.llvm.org/D114459
Padding now can explicitly specify the padding value when non-zero is wanted.
This also includes bypassing pads when the pad does nothing.
Differential Revision: https://reviews.llvm.org/D113611
Transpose convolution decomposition is now performed in a separate pass. This
allows padding / constant propagation to be performed at the TOSA level. It
also adds support for striding when there is no dilation.
Differential Revision: https://reviews.llvm.org/D114409
This revision makes concrete use of 0-d vectors to extend the semantics of
InsertElementOp.
Reviewed By: dcaballe, pifon2a
Differential Revision: https://reviews.llvm.org/D114388
This revision starts making concrete use of 0-d vectors to extend the semantics of
ExtractElementOp.
In the process a new VectorOfAnyRank Tablegen OpBase.td is added to allow progressive transition to supporting 0-d vectors by gradually opting in.
Differential Revision: https://reviews.llvm.org/D114387
The interface method `bufferize` controls how (and it what order) nested ops are traversed. This simplifies bufferization of scf::ForOps and scf::IfOps, which used to need special rules in scf::YieldOp.
Differential Revision: https://reviews.llvm.org/D114057
The only users of returned futures from ThreadPool is llvm-reduce after
D113857.
There should be no cases where multiple threads wait on the same future,
so there should be no need to return std::shared_future<>. Instead return
plain std::future<>.
If users need to share a future between multiple threads, they can share
the futures themselves.
Reviewed By: Meinersbur, mehdi_amini
Differential Revision: https://reviews.llvm.org/D114363
Refactored two new parser APIs parseGenericOperationAfterOperands and
parseCustomOperationName out of parseGenericOperation and parseCustomOperation.
Motivation: Sometimes an op can be printed in a special way if certain criteria
is met. While parsing, we need to handle all the versions.
`parseGenericOperationAfterOperands` is handy in situation where we already
parsed the operands and decide to fall back to default parsing.
`parseCustomOperationName` is useful when we need to know details (dialect,
operation name etc.) about a parsed token meant to be an mlir operation.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D113719
This patch adds functionality to parse FlatAffineConstraints from a
StringRef with the intention to be used for unit tests. This should
make the construction of FlatAffineConstraints easier for testing
purposes.
The patch contains an example usage of the functionality in a unit test that
uses FlatAffineConstraints.
Reviewed By: bondhugula, grosser
Differential Revision: https://reviews.llvm.org/D113275
This reverts commit 3028bca6a9.
For some reason using FallbackModel works with CMake and does not work
with bazel. Using `ExternalModel` works. I will check what's going on
and resubmit tomorrow.
Remove the interface from op defs in MemRefOps.td and make it an external model.
This is the first PR of many that will move bufferization-related ops, interfaces, passes to Dialect/Bufferize.
RFC: https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712
It is still debated if the comprehensive bufferization has to be moved there as well, so for now I am just moving the "gradual" bufferization.
Differential Revision: https://reviews.llvm.org/D114147
This reverts commit a9e236bed8.
This broke the Windows build:
mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
MLIR supports recursive types but they could not be handled by the conversion
infrastructure directly as it would result in infinite recursion in
`convertType` for elemental types. Support this case by keeping the "call
stack" of nested type conversions in the TypeConverter class and by passing it
as an optional argument to the individual conversion callback. The callback can
then check if a specific type is present on the stack more than once to detect
and handle the recursive case.
This approach is preferred to the alternative approach of having a separate
callback dedicated to handling only the recursive case as the latter was
observed to introduce ~3% time overhead on a 50MB IR file even if it did not
contain recursive types.
This approach is also preferred to keeping a local stack in type converters
that need to handle recursive types as that would compose poorly in case of
out-of-tree or cross-project extensions.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D113579
The purpose of the change is to make clear whether the user is
retrieving the original function or the wrapper function, in line with
the invoke commands. This new functionality is useful for users that
already have defined their own packed interface, so they do not want the
extra layer of indirection, or for users wanting to the look at the
resulting primary function rather than the wrapper function.
All locations, except the python bindings now have a `lookupPacked`
method that matches the original `lookup` functionality. `lookup`
still exists, but with new semantics.
- `lookup` returns the function with a given name. If `bool f(int,int)`
is compiled, `lookup` will return a reference to `bool(*f)(int,int)`.
- `lookupPacked` returns the packed wrapper of the function with the
given name. If `bool f(int,int)` is compiled, `lookupPacked` will return
`void(*mlir_f)(void**)`.
Differential Revision: https://reviews.llvm.org/D114352
Remove the tile and fuse test pass that has been replaced by codegen strategy.
Depends On D114067
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114068
Add a pattern to apply the new tile and fuse on tensors method. Integrate the pattern into the CodegenStrategy and use the CodegenStrategy to implement the tests.
Depends On D114012
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114067
Tile and fuse failed if the outermost tile loop is a reduction dimension. Add the necessary check to handle outermost reductions and introduce a test case to verify the change.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114012
Step towards removing the hard coded behavior for this trait and to instead use common interface.
Differential Revision: https://reviews.llvm.org/D114208
- Adds hooks that allow SerializeTo* passes to arbitrarily transform
the produced LLVM Module before it is passed to the code generation
passes.
- Uses these hooks within the SerializeToHsaco pass in order to run
LLVM optimizations and to set the optimization level on the
TargetMachine.
- Adds an optLevel parameter to SerializeToHsaco
Future work may include moving much of what's been added to
SerializeToHsaco to SerializeToBlob, but that would require
confirmation from the NVVM backend maintainers that it would be
appropriate to do so.
Depends on D114107
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D114113
As discussed in D109579, this patch exposes `runRegionDCE` and
`eraseUnreachableBlocks` so they can be used as separate utilities in
other passes.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D114160
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.
LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.
Reviewed By: nicolasvasilache, jpienaar
Differential Revision: https://reviews.llvm.org/D114139
Add a new BufferizableOpInterface method `isNotConflicting` that can be used to implement custom analysis rules.
Differential Revision: https://reviews.llvm.org/D113961
NamedAttribute is currently represented as an std::pair, but this
creates an extremely clunky .first/.second API. This commit
converts it to a class, with better accessors (getName/getValue)
and also opens the door for more convenient API in the future.
Differential Revision: https://reviews.llvm.org/D113956
The current implementation is quite clunky; OperationName stores either an Identifier
or an AbstractOperation that corresponds to an operation. This has several problems:
* OperationNames created before and after an operation are registered are different
* Accessing the identifier name/dialect/etc. from an OperationName are overly branchy
- they need to dyn_cast a PointerUnion to check the state
This commit refactors this such that we create a single information struct for every
operation name, even operations that aren't registered yet. When an OperationName is
created for an unregistered operation, we only populate the name field. When the
operation is registered, we populate the remaining fields. With this we now have two
new classes: OperationName and RegisteredOperationName. These both point to the
same underlying operation information struct, but only RegisteredOperationName can
assume that the operation is actually registered. This leads to a much cleaner API, and
we can also move some AbstractOperation functionality directly to OperationName.
Differential Revision: https://reviews.llvm.org/D114049
There seems to be a consensus that we should allow 0D vectors:
https://llvm.discourse.group/t/should-we-have-0-d-vectors/3097
This commit is only the first step: it changes the verifier and the parser to
allow vectors like `vector<f32>` (but does not allow explicit 0 dimensions,
i.e., `vector<0xf32>` is not allowed).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114086
This struct was added and was intended to be used, but it was missed in the original patch.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D114041
This allows for using SFINAE partial specialization for DenseMapInfo.
In MLIR, this is particularly useful as it will allow for defining partial
specializations that support all Attribute, Op, and Type classes without
needing to specialize DenseMapInfo for each individual class.
Differential Revision: https://reviews.llvm.org/D113641
LLVM switchop currently only permits i32. Both LLVM IR and MLIR Standard switch permit other integer types leading to an illegal state when lowering an i8 switch from MLIR standard
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D113955