Commit Graph

7284 Commits

Author SHA1 Message Date
River Riddle 02670c3f38 [PDLL] Add support for `op` Operation expressions
An operation expression in PDLL represents an MLIR operation. In
the match section of a pattern, this expression models one of
the input operations to the pattern. In the rewrite section of
a pattern, this expression models one of the operations to
create. The general structure of the operation expression is very
similar to that of the "generic form" of textual MLIR assembly:

```
let root = op<my_dialect.foo>(operands: ValueRange) {attr = attr: Attr} -> (resultTypes: TypeRange);
```

For now we only model the components that are within PDL, as PDL
gains support for blocks and regions so will this expression.

Differential Revision: https://reviews.llvm.org/D115296
2021-12-16 02:08:12 +00:00
River Riddle d7e7fdf3aa [PDLL] Add support for literal Attribute and Type expressions
This allows for using literal attributes and types within PDLL,
which simplifies building both constraints and rewriters. For
example, checking if an attribute is true is as simple as
`attr<"true">`.

Differential Revision: https://reviews.llvm.org/D115295
2021-12-16 02:08:12 +00:00
River Riddle 322691ab63 [PDLL] Add support for parsing pattern metadata
This allows for overriding the metadata of a pattern and
providing information such as the benefit, bounded recursion,
and more in the future.

Differential Revision: https://reviews.llvm.org/D115294
2021-12-16 02:08:12 +00:00
River Riddle 11d26bd143 [mlir][PDLL] Add an initial frontend for PDLL
This is a new pattern rewrite frontend designed from the ground
up to support MLIR constructs, and to target PDL. This frontend
language was proposed in https://llvm.discourse.group/t/rfc-pdll-a-new-declarative-rewrite-frontend-for-mlir/4798

This commit starts sketching out the base structure of the
frontend, and is intended to be a minimal starting point for
building up the language. It essentially contains support for
defining a pattern, variables, and erasing an operation. The
features mentioned in the proposal RFC (including IDE support)
will be added incrementally in followup commits.

I intend to upstream the documentation for the language in a
followup when a bit more of the pieces have been landed.

Differential Revision: https://reviews.llvm.org/D115093
2021-12-16 02:08:12 +00:00
Arjun P 15c8b8ad85 [MLIR] Simplex: Assert on the restoreRow return value instead of ignoring it
Previously, the LogicalResult return value of restoreRow was being ignored in
places where it was expected to always be success. Instead, check the result
and go to an `llvm_unreachable` if it turns out to be failure.
2021-12-16 03:27:50 +05:30
Hanhan Wang 501674dc3b [mlir][Vector] Further fix to avoid infinite loop in InnerOuterDimReductionConversion
If all the dims are reduction dims, it is already in inner-most/outer-most
reduction form.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D115820
2021-12-15 13:54:15 -08:00
Mogball c7103810bd [mlir][scf] Add getNumRegionInvocations to IfOp
Implements the RegionBranchOpInterface method getNumRegionInvocations to `scf::IfOp` so that, when the condition is constant, the number of region executions can be analyzed by `NumberOfExecutions`.

Reviewed By: jpienaar, ftynse

Differential Revision: https://reviews.llvm.org/D115087
2021-12-15 14:56:20 +00:00
Matthias Springer 417014170b [mlir][linalg][bufferize] Replace remaining bvm usage with new API
* Call `replaceOp` instead of `mapBuffer`.
* Remove bvm and all helper functions around bvm.
* Simplify FuncOp bufferization and rely on existing functionality to generate ToMemrefOps for function BlockArguments.

Differential Revision: https://reviews.llvm.org/D115515
2021-12-15 23:21:39 +09:00
gysit b7f2c108eb [mlir][linalg] Replace LinalgOps.h and LinalgTypes.h by a single header.
After removing the range type, Linalg does not define any type. The revision thus consolidates the LinalgOps.h and LinalgTypes.h into a single Linalg.h header. Additionally, LinalgTypes.cpp is renamed to LinalgDialect.cpp to follow the convention adopted by other dialects such as the tensor dialect.

Depends On D115727

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115728
2021-12-15 12:15:03 +00:00
Tres Popp fc64a164ec [mlir] Use rewriter in linalg Detensorize
This is to allow rollbacks on failures of dialect lowering to succeed.

Differential Revision: https://reviews.llvm.org/D115789
2021-12-15 11:28:18 +01:00
Shraiysh Vaishay 3425b1bcb4 [mlir][OpenMP] omp.sections and omp.section lowering to LLVM IR
This patch adds lowering from omp.sections and omp.section (simple lowering along with the nowait clause) to LLVM IR.
Tests for the same are also added.

Reviewed By: ftynse, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D115030
2021-12-15 15:41:12 +05:30
Matthias Springer 1652871473 [mlir][linalg][bufferize] Reimplementation of TiledLoopOp bufferization
Instead of modifying the existing linalg.tiled_loop op, create a new op with memref input/outputs and delete the old op.

Differential Revision: https://reviews.llvm.org/D115493
2021-12-15 18:45:29 +09:00
Matthias Springer a5927737da [mlir][linalg][bufferize] Reimplementation of scf.if bufferization
Instead of modifying the existing scf.if op, create a new op with memref OpOperands/OpResults and delete the old op.

New allocations / other memrefs can now be yielded from the op. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.

Differential Revision: https://reviews.llvm.org/D115491
2021-12-15 18:40:54 +09:00
Javier Setoain a4830d14ed [mlir][RFC] Add scalable dimensions to VectorType
With VectorType supporting scalable dimensions, we don't need many of
the operations currently present in ArmSVE, like mask generation and
basic arithmetic instructions. Therefore, this patch also gets
rid of those.

Having built-in scalable vector support also simplifies the lowering of
scalable vector dialects down to LLVMIR.

Scalable dimensions are indicated with the scalable dimensions
between square brackets:

        vector<[4]xf32>

Is a scalable vector of 4 single precission floating point elements.

More generally, a VectorType can have a set of fixed-length dimensions
followed by a set of scalable dimensions:

        vector<2x[4x4]xf32>

Is a vector with 2 scalable 4x4 vectors of single precission floating
point elements.

The scale of the scalable dimensions can be obtained with the Vector
operation:

        %vs = vector.vscale

This change is being discussed in the discourse RFC:

https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484

Differential Revision: https://reviews.llvm.org/D111819
2021-12-15 09:31:37 +00:00
Matthias Springer 7161aa06ef [mlir][linalg][bufferize] Reimplementation of scf.for bufferization
Instead of modifying the existing scf.for op, create a new op with memref OpOperands/OpResults and delete the old op.

New allocations / other memrefs can now be yielded from the loop. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.

This change also introduces `replaceOp`, which will be utilized by all other `bufferize` implementations in future commits. Bufferization will then no longer rely on old (pre-bufferize) ops to DCE away. Instead old ops are deleted on the spot. This improves debuggability because there won't be any duplicate ops anymore (bufferized + not-yet-bufferized) when dumping IR during bufferization. It is also less fragile because unbufferized IR can no longer silently "hang around" due to an implementation bug.

Differential Revision: https://reviews.llvm.org/D114926
2021-12-15 18:29:22 +09:00
gysit 9912bed730 [mlir][linalg] Remove RangeOp and RangeType.
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115727
2021-12-15 07:19:10 +00:00
Thomas Raoux 7d97678df7 [mlir][linalg] Break up linalg vectorization pre-condition
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.

Differential Revision: https://reviews.llvm.org/D115754
2021-12-14 13:38:14 -08:00
Lei Zhang 96130b5dc7 [mlir][spirv] Support size-1 vector/tensor constant during conversion
Reviewed By: ThomasRaoux, mravishankar

Differential Revision: https://reviews.llvm.org/D115518
2021-12-14 15:58:08 -05:00
Krzysztof Drewniak c57b2a0635 [MLIR][GPU] Make max flat work group size for ROCDL kernels configurable
While the default value for the amdgpu-flat-work-group-size attribute,
"1, 256", matches the defaults from Clang, some users of the ROCDL dialect,
namely Tensorflow, use larger workgroups, such as 1024. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default value.

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D115741
2021-12-14 20:12:23 +00:00
Alexander Belyaev a82a19c137 [mlir] Add a missing pattern to bufferize tensor.rank.
Differential Revision: https://reviews.llvm.org/D115745
2021-12-14 20:04:57 +01:00
Aart Bik 03fe15cee6 [mlir][sparse] speed up sparse tensor file I/O by more than 2x
data point using the 3-dim tensor nell-2.tns

MLIR:
READ FILE INTO COO: 24424.369294 ms ---> improves to ----> 9638.501044 ms
SORT COO BEFORE PACK: 762.834831 ms
PACK COO TO TENSOR: 1243.376245 ms

TACO:
b file read: 13270.9 ms
b pack: 7137.74 ms
b size: (12092 x 9184 x 28818), 925300328 bytes

https://github.com/llvm/llvm-project/issues/52679

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D115696
2021-12-14 08:30:31 -08:00
Nikita Popov d733f2c68c [OpenMPIRBuilder] Support opaque pointers in reduction handling
Make the reduction handling in OpenMPIRBuilder compatible with
opaque pointers by explicitly storing the element type in ReductionInfo,
and also passing it to the atomic reduction callback, as at least
the ones in the test need the type there.

This doesn't make things fully compatible yet, there are other
uses of element types in this class. I also left one
getPointerElementType() call in mlir, because I'm not familiar
with that area.

Differential Revison: https://reviews.llvm.org/D115638
2021-12-14 14:07:47 +01:00
Matthias Springer 81eece7f26 [mlir][linalg][bufferize] Debug output as IR attributes
Instead of printing analysis debug information to stderr, annotate the IR. This makes it easier to understand decisions made by the analysis, especially in larger input IR.

Differential Revision: https://reviews.llvm.org/D115575
2021-12-14 21:29:43 +09:00
Alexander Belyaev 15f8f3e20a [mlir] Split std.rank into tensor.rank and memref.rank.
Move `std.rank` similarly to how `std.dim` was moved to TensorOps and MemRefOps.

Differential Revision: https://reviews.llvm.org/D115665
2021-12-14 10:15:55 +01:00
Markus Böck ef5be2bb16 [mlir] Implement `DataLayoutTypeInterface` for `LLVMArrayType`
Implementation of the interface allows querying the size and alignments of an LLVMArrayType as well as query the size and alignment of a struct containing an LLVMArrayType.
The implementation should yield the same results as llvm::DataLayout, including support for over aligned element types.
There is no customization point for adjusting an arrays alignment; it is simply taken from the element type.

Differential Revision: https://reviews.llvm.org/D115704
2021-12-14 09:35:45 +01:00
Benoit Jacob aba437ceb2 [mlir][Vector] Patterns flattening vector transfers to 1D
This is the second part of https://reviews.llvm.org/D114993 after slicing
into 2 independent commits.

This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous rowmajor.

For instance, if the target architecture has 128bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.

The new patterns here are only enabled for now by
 -test-vector-transfer-flatten-patterns.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114993
2021-12-13 22:39:41 +00:00
Benoit Jacob 0aea49a730 [mlir][Vector] Patterns flattening vector transfers to 1D
This is the first part of https://reviews.llvm.org/D114993 which has been
split into small independent commits.

This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous rowmajor.

For instance, if the target architecture has 128bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.

The new patterns here are only enabled for now by
 -test-vector-transfer-flatten-patterns.

Reviewed By: nicolasvasilache
2021-12-13 21:49:04 +00:00
Stella Laurenzo c10995a8ad Re-apply [NFC] Generalize a couple of passes so they can operate on any FunctionLike op.
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
  * Since these passes are now generic, they do not automatically nest with pass managers set up for implicit nesting.
  * The Detensorize pass must run on a FunctionLike, and this requires explicit nesting.
* Addressed missed comments from the original and per-suggestion removed the assert on FunctionLike in ElementwiseToLinalg and DropUnitDims.cpp, which also is what was causing the integration test to fail.

This reverts commit aa8815e42e.

Differential Revision: https://reviews.llvm.org/D115671
2021-12-13 13:33:00 -08:00
Nicolas Vasilache 1c4d9ae83d [mlir][ExecutionEngine] Fix native dependencies for AsmParser and Printer
This is a post-commit fix for https://reviews.llvm.org/D114338 which was landed as
https://reviews.llvm.org/rG050cc1cd6e6882eadba6e5ea7b588ca0b8aa1b12

Differential Revision: https://reviews.llvm.org/D115666
2021-12-13 21:12:12 +00:00
Mehdi Amini aa8815e42e Revert "[NFC] Generalize a couple of passes so they can operate on any FunctionLike op."
This reverts commit 34696e6542.

A test is crashing on the mlir-nvidia bot.
2021-12-13 20:41:25 +00:00
Bixia Zheng 2f49e6b0db Support sparse tensor output.
Add convertFromMLIRSparseTensor to the supporting C shared library to convert
SparseTensorStorage to COO-flavor format.

Add Python routine sparse_tensor_to_coo_tensor to convert sparse tensor storage
pointer to numpy values for COO-flavor format tensor.

Add a Python test for sparse tensor output.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D115557
2021-12-13 12:06:33 -08:00
Stella Laurenzo 34696e6542 [NFC] Generalize a couple of passes so they can operate on any FunctionLike op.
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
  * Since these passes are now generic, they do not automatically nest with pass managers set up for that.
  * If running them over nested functions, you must nest explicitly. Upstream has adopted this style but *-opt still has some uses of implicit pipelines via args. See tests for argument changes needed.

Differential Revision: https://reviews.llvm.org/D115645
2021-12-13 12:01:53 -08:00
gysit 8fc0525a15 [mlir][linalg] Stage application of pad tensor op vectoriztaion.
Adapt the LinalgStrategyVectorizationPattern pass to apply the vectorization patterns in two stages. The change ensures the generic pad tensor op vectorization pattern does not run too early. Additionally, the revision adds the transfer op canonicalization patterns to the set of applied patterns, since they are needed to enable efficient vectorization for rank-reduced convolutions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115627
2021-12-13 19:49:35 +00:00
Alexander Belyaev 206365bf8f [mlir] Update comments that mention `linalg.collapse/expand` shape. 2021-12-13 20:35:34 +01:00
Lei Zhang f56b1d813f [mlir][spirv] Use ScopedPrinter in deserialization debugging
This gives us better debugging print as it supports indent
levels and other nice features.

Reviewed By: Hardcode84

Differential Revision: https://reviews.llvm.org/D115583
2021-12-13 10:51:56 -05:00
Lei Zhang 5e55a20119 [mlir][spirv] Serialize selection with separate header block
The previous "optimization" that tries to reuse existing block for
selection header block can be problematic for deserialization
because it effectively pulls in previous ops in the selection op's
enclosing block into the selection op's header. When deserializing,
those ops will be placed in the selection op's region. If any of
the previous ops has usage after the section op, it will break. That
is, the following IR cannot round trip:

```mlir
^bb:
  %def = ...
  spv.mlir.selection { ... }
  %use = spv.SomeOp %def
```

This commit removes the "optimization" to always create new blocks
for the selection header.

Along the way, also made error reporting better in deserialization
by turning asserts into proper errors and add check of uses outside
of sinked structured control flow region blocks.

Reviewed By: Hardcode84

Differential Revision: https://reviews.llvm.org/D115582
2021-12-13 10:42:26 -05:00
gysit 6c85a49e22 [mlir][memref] Use current source type in getCanonicalSubViewResultType.
Use the current instead of the new source type to compute the rank-reduction map in getCanonicalSubViewResultType. Otherwise, the computation of the rank-reduction map fails when folding a cast into a subview since the strides of the new source type cannot be related to the strides of the current result type.

Depends On D115428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115446
2021-12-13 14:50:41 +00:00
Markus Böck 664cc9312c [mlir] Implement `DataLayoutTypeInterface` for `LLVMStructType`
Using this implementation of the interface it is possible to query the size, ABI alignment as well as the preferred alignment of a struct. It should yield the same results as LLVMs `llvm::DataLayout` on an equivalent `llvm::StructType`, including for packed structs.

Additionally it is also possible to increase the ABI and preferred alignment using a data layout entry with the type `llvm.struct<()>, which serves the same functionality as the `a:` component in LLVMs data layout string.

Differential Revision: https://reviews.llvm.org/D115600
2021-12-13 15:09:16 +01:00
gysit db7a2e9176 [mlir][linalg] Only compose PadTensorOps if no ExtractSliceOp is rank-reducing.
Do not compose pad tensor operations if the extract slice of the outer pad tensor operation is rank reducing. The inner extract slice op cannot be rank-reducing since it source type must match the desired type of the padding.

Depends On D115359

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115428
2021-12-13 13:01:30 +00:00
gysit 6859f8ed1e [mlir][linalg] Adapt the PadTensorOpVectorizationWithInsertSlicePattern matching.
Tighten the matcher of the PadTensorOpVectorizationWithInsertSlicePattern pattern. Only match if the PadOp result is used by the InsertSliceOp source. Fail if the result is used by the InsertSliceOp dest.

Depends On D115336

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115359
2021-12-13 12:55:07 +00:00
gysit f895e95138 [mlir][linalg] Make padding work for rank-reducing slice ops.
Adapt the computation of a static bounding box to take rank-reducing slice operations into account by filtering out reduced size one dimensions. The revision is needed to make padding work for decomposed convolution operations. The decomposition introduces rank reducing extract slice operations that previously let padding fail.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115336
2021-12-13 12:34:20 +00:00
Jacques Pienaar b743ff161b [mlir] Relax restriction on name location parsing
We currently restrict parsing of location to not allow nameloc being
nested inside nameloc. This restriction may be historical as there
doesn't seem to be a reason for it anymore (locations like this can be
constructed in C++ and they print fine). Relax this restriction in the
parser to allow this nesting.

Differential Revision: https://reviews.llvm.org/D115581
2021-12-12 08:06:59 -08:00
Jacques Pienaar efb7727a96 [mlir] Flag near misses in file splitting
Flags some potential cases where splitting isn't happening and so could result
in confusing results. Also update some test files where there were near misses
in splitting that seemed unintentional.

Differential Revision: https://reviews.llvm.org/D109636
2021-12-12 08:03:30 -08:00
Nicolas Vasilache 408553dd96 [mlir][Vector] Support 0-D vectors in `CreateMaskOp`
The 0-D case gets lowered in almost the same way that the 1-D case does
in VectorCreateMaskOpConversion. I also had to slightly update the
verifier for the op to always require exactly 1 operand in the 0-D case.

Depends On D115220

Reviewed by: ftynse

Differential revision: https://reviews.llvm.org/D115221
2021-12-12 13:32:29 +00:00
Arjun P d7ec4d0be3 [MLIR] PresburgerSet subtraction: fix bug where the set `b` was not restored properly on return
When subtracting `b \ c`, when there are divisions in `c`, these division
constraints get added to `b`. `b` must be restored to its original state
when returning, but these added divisions constraints were not removed in
one of the return paths. This patch fixes this and deduplicates the
restoration logic by encapuslating it in a lambda `restoreState`. The patch
also includes a regression test for the bug fix.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D115577
2021-12-12 17:14:04 +05:30
Lei Zhang 731676b10d [mlir][spirv] Fix nested control flow serialization
If we have a `spv.mlir.selection` op nested in a `spv.mlir.loop`
op, when serializing the loop's block, we might need to jump
from the selection op's merge block, which might be different
than the immediate MLIR IR predecessor block. But we still need
to get the block argument from the MLIR IR predecessor block.

Also, if the `spv.mlir.selection` is in the `spv.mlir.loop`'s
header block, we need to make sure `OpLoopMerge` is emitted
in the current block before start processing the nested selection
op. Otherwise we'll see the LoopMerge in the wrong SPIR-V
basic block.

Reviewed By: Hardcode84

Differential Revision: https://reviews.llvm.org/D115560
2021-12-11 14:47:19 -05:00
Jacques Pienaar 1ab3efac41 [mlir][python] Add fused location 2021-12-11 10:16:13 -08:00
Groverkss c6a8bec4c5 [MLIR][FlatAffineConstraints] Add support for extracting divisions with tighter bounds
This patch adds support for extracting divisions when the set contains bounds
which are tighter than the division bounds. For example:

```
     3q - i + 2 >= 0                       <-- Lower bound for 'q'
    -3q + i - 1 >= 0                       <-- Tighter upper bound for 'q'
```

Here, the actual upper bound for division for `q` would be `-3q + i >= 0`, but
since this actual upper bound is implied by a tighter upper bound, which awe can still
extract the divison.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D115096
2021-12-11 16:23:54 +05:30
Lei Zhang 3ed47bcc96 [mlir][spirv] Propagate LogicalResult in (de)serialization
`(void)` was added when LogicalResult was marked as non
discard. This commit cleans them up to properly propagate
failures.

Reviewed By: scotttodd

Differential Revision: https://reviews.llvm.org/D115541
2021-12-10 19:20:49 -05:00
Lei Zhang 222d7fc7f8 [mlir][spirv] Avoid duplicated Block decoration during serialization
It's legal per the Vulkan / SPIR-V spec; still it's better to avoid
such duplication to have cleaner blob and reduce the binary size.

Reviewed By: scotttodd

Differential Revision: https://reviews.llvm.org/D115532
2021-12-10 19:20:49 -05:00
Lei Zhang b289266cb2 [mlir][spirv] Add serialization control to emit symbol name
In SPIR-V, symbol names are encoded as `OpName` instructions.
They are not semantic impacting and can be omitted, which can
reduce the binary size.

Reviewed By: scotttodd

Differential Revision: https://reviews.llvm.org/D115531
2021-12-10 19:20:49 -05:00
Nicolas Vasilache f2e945a393 Revert "[mlir][tensor] Fix insert_slice + tensor cast overflow"
This reverts commit 5601821dae.

The prefix + canonical complete behavior is actually obsolete and should not be reintroduced.
Reverting.
2021-12-10 22:53:52 +00:00
Arjun P d6f9bb0321 [MLIR] FlatAffineConstraints::isIntegerEmpty: fix bug in computation of duals
The method that was previously used for computing dual variables was incorrect.
This was used in the integer emptiness check algorithm, where this bug could lead to much longer running times. (Due to the way it is used, this never results in an incorrect emptiness check result.)

This patch fixes the dual computation and adds some additional asserts that catch this bug, along with regression test cases that trigger the asserts when the incorrect dual computation is used.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D113803
2021-12-11 03:48:40 +05:30
Arjun P 98db55f108 [MLIR] IntegerPolyhedron: introduce getNumIdKind to replace calls to assertAtMostNumIdKind
Introduce a function `getNumIdKind` that returns the number of ids of the
specified kind. Remove the function `assertAtMostNumIdKind` and instead just
directly assert the inequality with a call to `getNumIdKind`.
2021-12-11 03:42:46 +05:30
Thomas Raoux f56933b263 [mlir][vector] NFC move vector unroll/distribute patterns to their own file
Differential Revision: https://reviews.llvm.org/D115548
2021-12-10 14:00:13 -08:00
Uday Bondhugula bc657b2eef [MLIR][NFC] Move out affine scalar replacement utility to affine utils
NFC. Move out and expose affine scalar replacement utility through
affine utils. Renaming misleading forwardStoreToLoad ->
affineScalarReplace. Update a stale doc comment.

Differential Revision: https://reviews.llvm.org/D115495
2021-12-11 03:26:42 +05:30
Nicolas Vasilache 5601821dae [mlir][tensor] Fix insert_slice + tensor cast overflow
InsertSliceOp may have subprefix semantics where missing trailing dimensions
are automatically inferred directly from the operand shape.
This revision fixes an overflow that occurs in such cases when the impl is based on the op rank.

Differential Revision: https://reviews.llvm.org/D115549
2021-12-10 21:41:26 +00:00
River Riddle 233e9476d8 [mlir:PDL] Allow non-bound pdl.attribute/pdl.type operations that create constants
This allows for passing in these attributes/types to constraints/rewrites as arguments.

Differential Revision: https://reviews.llvm.org/D114817
2021-12-10 19:38:43 +00:00
River Riddle 06c3b9c7be [mlir:PDL] Fix bugs in PDLPatternModule merging
* Constraints/Rewrites registered before a pattern was added were dropped
* Constraints/Rewrites may be registered multiple times (if different pattern sets depend on them)
* ModuleOp no longer has a terminator, so we shouldn't be removing the terminator from it

Differential Revision: https://reviews.llvm.org/D114816
2021-12-10 19:38:43 +00:00
Mogball 0845635eda [mlir][ir] Custom ops' parse/print fall back to dialect hooks
Custom ops that have no parser or printer should fall back to the dialect's parser and/or printer hooks. This avoids the need to define parsers and printers that simply dispatch to the dialect hook.

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D115481
2021-12-10 19:34:25 +00:00
Alexander Belyaev b618880e7b [mlir] Move `linalg.tensor_expand/collapse_shape` to TensorDialect.
RFC: https://llvm.discourse.group/t/rfc-reshape-ops-restructuring/3310

linalg.fill gets a canonicalizer, because `FoldFillWithTensorReshape` cannot be moved to tensorops (it uses linalg::FillOp inside). Before it was listed as a canonicalization pattern for the reshape operations, now it became a canonicalization for FillOp.

Differential Revision: https://reviews.llvm.org/D115502
2021-12-10 12:11:48 +01:00
Mehdi Amini 79a0330a52 Fix crash from use of a temporary after its scope exit
Introduced in D110448 and broke some bots (reported by ASAN).

Differential Revision: https://reviews.llvm.org/D110448
2021-12-10 05:04:23 +00:00
Rob Suderman 46c96fca0e [mlir][tosa] Fix quantized type for tosa.conv2d canonicalization
Wrong type was used for the result type in the tosa.conv_2d canonicalization.
The type should match the result element type should match the result type
not the input element type.

Differential Revision: https://reviews.llvm.org/D115463
2021-12-09 12:39:23 -08:00
MaheshRavishankar 9cfd8d7c6c [mlir][Vector] Avoid infinite loop in InnerOuterDimReductionConversion.
This patterns tries to convert an inner (outer) dim reduction to an
outer (inner) dim reduction. Doing this on a 1D or 0D vector results
in an infinite loop since the converted op is same as the original
operation. Just returning failure when source rank <= 1 fixes the
issue.

Differential Revision: https://reviews.llvm.org/D115426
2021-12-09 09:30:05 -08:00
Krzysztof Drewniak e1da62910e [MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448
2021-12-09 15:54:31 +00:00
Eugene Zhulenev 49ce40e9ab [mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations
Depends On D115263

By aligning block size to inner loop iterations parallel_compute_fn LLVM can later unroll and vectorize some of the inner loops with small number of trip counts. Up to 2x speedup in multiple benchmarks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115436
2021-12-09 06:50:50 -08:00
Eugene Zhulenev 9f151b784b [mlir] AsyncParallelFor: sink constants into the parallel compute function
With complex recursive structure of async dispatch function LLVM can't always propagate constants to the parallel_compute_fn and it often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115263
2021-12-09 06:48:23 -08:00
Matthias Springer cc45a13422 [mlir][linalg][bufferize] LinalgOps can bufferize inplace with input args
LinalgOp results usually bufferize inplace with output args. With this change, they may buffer inplace with input args if the value of the output arg is not used in the computation.

Differential Revision: https://reviews.llvm.org/D115022
2021-12-09 21:54:54 +09:00
Groverkss 6f9afad6d3 [MLIR] Move Presburger Math from FlatAffineConstraints to Presburger/IntegerPolyhedron
This patch factors out math functionality that is a subset of Presburger arithmetic and moves it from FlatAffineConstraints to Presburger/IntegerPolyhedron. This patch only moves some parts of the functionality planned to be moved, with subsequent patches moving more functionality. There are three main reasons for this:

1. This split makes the Presburger Library easier and more flexible to use
    across MLIR, by not depending on IR.
2. This split allows the Presburger library to be developed independently from
    Affine Analysis, with Affine Analysis using this library.
3. With more functionality being upstreamed to the Presburger Library, the
    mlir/Analysis directory will be cluttered with Presburger library components
    since they depend on math functionality from FlatAffineConstraints. Moving this
    functionality to the Presburger directory allows keeping the new functionality
    in the Presburger directory.

This patch is part of an ongoing effort to make the Presburger Library easier to use. The motivation for this effort is the feedback received at the LLVM conference from Mehdi and others.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D114674
2021-12-09 16:42:06 +05:30
Michel Weber 45ea542dd8 [MLIR] Introduce coalesce for PresburgerSet
This patch provides functionality for simplifying `PresburgerSet`s by checking if any `FlatAffineConstraints` in the set is contained in another, and removing such redundant FACs.

This is part of a series of patches to provide functionality for [integer set coalescing](http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf) in MLIR.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D110617
2021-12-09 15:46:31 +05:30
Shraiysh Vaishay d82c1f4e4b [MLIR][OpenMP] Added omp.atomic.update
This patch supports the atomic construct (update) following section 2.17.7 of OpenMP 5.0 standard. Also added tests and verifier for the same.

Reviewed By: kiranchandramohan, peixin

Differential Revision: https://reviews.llvm.org/D112982
2021-12-09 15:21:24 +05:30
Nicolas Vasilache d69f5e197c [mlir][memref] Fix subview offset verification.
Offset-specific verification seems to have been lost in one of the recent refactorings.
Also add proper tests that would have caught this omission.

This addresses the immediate issues discussed in:
https://llvm.discourse.group/t/memref-subview-affine-map-and-symbols/4851

Differential Revision: https://reviews.llvm.org/D115427
2021-12-09 07:44:51 +00:00
MaheshRavishankar 6d7c9c3d0e [mlir][Linalg] Bufferize the region of LinalgOps as well.
The region of `linalg.generic` might contain `tensor` operations. For
example, current lowering of `gather` uses a `tensor.extract` in the
body of the `LinalgOp`. Bufferize the ops within a `LinalgOp` region
as well to catch such cases.

Differential Revision: https://reviews.llvm.org/D115322
2021-12-08 22:36:01 -08:00
Rob Suderman 23149d522b [mlir] Added ctlz and cttz to math dialect and LLVM dialect
Count leading/trailing zeros are an existing LLVM intrinsic. Added LLVM
support for the intrinsics with lowerings from the math dialect to LLVM
dialect.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115206
2021-12-08 14:32:15 -08:00
Butygin d8fce785de [mlir][spirv] math.erf OpenCL lowering
Differential Revision: https://reviews.llvm.org/D115335
2021-12-08 21:59:46 +03:00
Matthias Springer 847710f7b7 [mlir][linalg][bufferize] Add dialect filter to BufferizationOptions
This adds a new option `dialectFilter` to BufferizationOptions. Only ops from dialects that are allow-listed in the filter are bufferized. Other ops are left unbufferized. Note: This option requires `allowUnknownOps = true`.

To make use of `dialectFilter`, BufferizationOptions or BufferizationState must be passed to various helper functions.

The purpose of this change is to provide a better infrastructure for partial bufferization, which will be fully activated in a subsequent change.

Differential Revision: https://reviews.llvm.org/D114691
2021-12-08 23:51:18 +09:00
Mehdi Amini be0a7e9f27 Adjust "end namespace" comment in MLIR to match new agree'd coding style
See D115115 and this mailing list discussion:
https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.html

Differential Revision: https://reviews.llvm.org/D115309
2021-12-08 06:05:26 +00:00
Mehdi Amini ee0908703d Change the printing/parsing behavior for Attributes used in declarative assembly format
The new form of printing attribute in the declarative assembly is eliding the `#dialect.mnemonic` prefix to only keep the `<....>` part.

Differential Revision: https://reviews.llvm.org/D113873
2021-12-08 02:02:37 +00:00
Aart Bik bb8632c1ef [mlir][sparse] fix broken build
rebase and commit crossed the getFunc change

Reviewed By: Chia-hungDuan

Differential Revision: https://reviews.llvm.org/D115270
2021-12-07 11:14:21 -08:00
Aart Bik 4f2ec7f983 [mlir][sparse] finalize sparse output in the presence of reductions
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reductions loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D115091
2021-12-07 10:54:29 -08:00
Rob Suderman e9fae0f19e [mlir][tosa] Disable tosa.depthwise_conv2d canonicalizer for quantized case
Quantized case needs to include zero-point corrections before the tosa.mul.
Disabled for the quantized use-case.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D115264
2021-12-07 10:16:12 -08:00
Lei Zhang 7709b23bef [mlir][scf] NFC: create dedicated files for affine utils
These functions are generic utility functions that operates on
affine ops within SCF regions. Moving them to their own files
for a better code structure, instead of mixing with loop
specialization logic.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115245
2021-12-07 10:55:32 -05:00
Nicolas Vasilache 61ba9f9110 [mlir][Linalg] NFC - Extend the TilingInterface to allow better composition with out-of-tree dialects.
Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D115233
2021-12-07 13:06:27 +00:00
Matthias Springer 958ae8b2d4 [mlir][linalg][bufferize] Bufferize Operation* instead of FuncOp
This change mainly changes the API. There is no mentioning of FuncOps in ComprehensiveBufferize anymore.

Also, bufferize methods of the op interface are called for ops without tensor operands/results if they have a region.

Differential Revision: https://reviews.llvm.org/D115212
2021-12-07 19:53:44 +09:00
Shraiysh Vaishay 31cf42bd9a [mlir][OpenMP] Added omp.atomic.read lowering
This patch adds lowering from omp.atomic.read to LLVM IR along with the
memory ordering clause. Tests for the same are also added.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115134
2021-12-07 11:17:30 +05:30
wren romano d8731bfc93 [mlir][sparse] Requiring emitCInterface parameter to be explicit
Depends On D115004

Cleans up code legibility by requiring the `emitCInterface` parameter to be explicit at all call-sites, and defining boolean aliases for that parameter.

Reviewed By: aartbik, rriddle

Differential Revision: https://reviews.llvm.org/D115005
2021-12-06 20:50:08 -08:00
not-jenni 5911a29aa9 [mlir][tosa] Add tosa.depthwise_conv2d as tosa.mul canonicalization
For a 1x1 weight and stride of 1, the input/weight can be reshaped and
multiplied elementwise then reshaped back

Reviewed By: rsuderman, KoolJBlack

Differential Revision: https://reviews.llvm.org/D115207
2021-12-06 17:28:52 -08:00
Matthias Springer 7ce427e3bc [mlir][linalg][bufferize][NFC] Clean up BufferizationState
Make fields private and clean up the interface. In particular, BufferizableOpInterface::bufferize no longer has access to `aliasInfo`. This was potentially dangerous because some of the ops registered in BufferizationAliasInfo may have been deleted.

Differential Revision: https://reviews.llvm.org/D114931
2021-12-07 10:05:39 +09:00
Rob Suderman 05e33d846f [mlir][tosa] Resubmit add tosa.conv2d as tosa.fully_connected canonicalization
Fixed the tosa.conv2d to tosa.fully_connected canonicalization for incorrect
output channels. Included uptes to tests to include checks for the result
shapes during canonicalization.

This allows conv2d to transform to the simpler fully_connected operation.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D115170
2021-12-06 15:33:07 -08:00
wren romano f527fdf51e [mlir][sparse] Code cleanup for SparseTensorConversion
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D115004
2021-12-06 14:13:35 -08:00
Eugene Zhulenev 68a7c001ad [mlir] Improve async parallel for tests + fix typos
Do load and store to verify that we process each element of the iteration space once.

Reviewed By: cota

Differential Revision: https://reviews.llvm.org/D115152
2021-12-06 13:27:54 -08:00
Rob Suderman c5fef77bc3 [mlir] Add CtPop to MathOps with lowering to LLVM
math.ctpop maths to the llvm.ctpop intrinsic.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D114998
2021-12-06 11:54:20 -08:00
Alex Zinenko d64b3e47ba [mlir] Avoid needlessly converting LLVM named structs with compatible elements
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behvaior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing than since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D115037
2021-12-06 13:42:11 +01:00
Matthias Springer e761c49a14 [mlir][linalg][bufferize][NFC] Utilize isWritable for FuncOps
This is a cleanup of ModuleBufferization. Instead of storing information about writable function arguments in BufferizationAliasInfo, we can use isWritable and make the decision there, based on dialect-specifc bufferization state.

Differential Revision: https://reviews.llvm.org/D114930
2021-12-06 18:36:54 +09:00
Matthias Springer e9fb4dc9e9 [mlir][linalg][bufferize] Remove buffer equivalence from bufferize
Remove all function calls related to buffer equivalence from bufferize implementations.

Add a new PostAnalysisStep for scf.for that ensures that yielded values are equivalent to the corresponding BBArgs. (This was previously checked in `bufferize`.) This will be relaxed in a subsequent commit.

Note: This commit changes two test cases. These were broken by design
and should not have passed. With the new scf.for PostAnalysisStep, this
bug was fixed.

Differential Revision: https://reviews.llvm.org/D114927
2021-12-06 17:48:31 +09:00
Matthias Springer cb4d0bf997 [mlir][linalg][bufferize][NFC] Collect equivalent FuncOp BBArgs in PostAnalysisStep
Collect equivalent BBArgs right after the equivalence analysis of the FuncOp and before bufferizing. This is in preparation of decoupling bufferization from aliasInfo.

Also gather equivalence info for CallOps, which was missing in the
previous commit.

Differential Revision: https://reviews.llvm.org/D114847
2021-12-06 17:31:39 +09:00
Michal Terepeta caf89c0db6 [mlir][Vector] Support 0-D vectors in `ConstantMaskOp`
To support creating both a mask with just a single `true` and `false` values,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:

- `vector.constant_mask [0] : vector<i1>` which gets lowered to
  `arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
  `arith.constant dense<true> : vector<i1>`

(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115023
2021-12-06 08:03:04 +00:00
gysit 69bcff46bf [mlir][linalg] Pad independent of application order (NFC).
This revision makes the padding pattern independent of the application order. It addresses the concern that we cannot rely on the execution order of the greedy rewriter (https://reviews.llvm.org/D114689). Instead, the pattern is updated to apply repeatedly till all operations are padded.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114851
2021-12-06 07:26:15 +00:00
Mehdi Amini afb0582325 Fix TOSA verifier to emit verbose errors
Also as a test for invalid ops which was missing.
2021-12-05 19:16:54 +00:00
Butygin 91072b74f8 [mlir] Add InlinerInterface to bufferization dialect
Differential Revision: https://reviews.llvm.org/D115080
2021-12-04 23:45:56 +03:00
Chia-hung Duan b8c6b15283 [mlir] Support collecting logs from notifyMatchFailure().
Let the user registers their own handler to processing the matching
failure information.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D110896
2021-12-04 04:35:24 +00:00
Mehdi Amini 4022152b35 Use LLVM_ATTRIBUTE_UNUSED to silent warning for static function used in assert only (NFC) 2021-12-04 04:23:21 +00:00
Matthias Springer 5fa0b3561a [mlir][linalg][bufferize] Implement equivalence analysis
Instead of checking buffer equivalence during bufferization, gather buffer equivalence information right after the analysis. This is in preparation of decoupling bufferization from BufferizationAliasInfo.

This change also fixes equivalence analysis for scf.if op results, which was not fully implemented. scf.if op results are equivalent to their corresponding yield values if both yield values are equivalent.

Differential Revision: https://reviews.llvm.org/D114774
2021-12-04 11:52:04 +09:00
Uday Bondhugula 2108ed0671 [MLIR] Fix affine.for unroll for multi-result upper bound maps
Fix affine.for unroll for multi-result upper bound maps: these can't be
unrolled/unroll-and-jammed in cases where the trip count isn't known to
be a multiple of the unroll factor.

Fix and clean up repeated/unnecessary checks/comments at helper callees.

Also, fix clang-tidy variable naming warnings and redundant includes.

Differential Revision: https://reviews.llvm.org/D114662
2021-12-04 07:20:26 +05:30
Matthias Springer 9e42f2aa0b [mlir][linalg][bufferize][NFC] Add inPlaceAnalysis overload
Differential Revision: https://reviews.llvm.org/D114773
2021-12-04 10:41:57 +09:00
River Riddle 7169996159 [mlir] Allow shape dimensions larger than 2^32
Internally we use int64_t to hold shapes, but for some
reason the parser was limiting shapes to unsigned. This
change updates the parser to properly handle int64_t shape
dimensions.

Differential Revision: https://reviews.llvm.org/D115086
2021-12-04 01:29:50 +00:00
Uday Bondhugula ecf458507e [MLIR] Improve error message on missing getArgument() override on pass
Improve error message while registering a pass with a missing getArgument() override.

Differential Revision: https://reviews.llvm.org/D114744
2021-12-04 06:54:52 +05:30
Matthias Springer 6db200736c [mlir][linalg][bufferize][NFC] Use same OpBuilder throughout bufferization
Also set insertion point right before calling `bufferize`. No need to put an InsertionGuard anymore.

Differential Revision: https://reviews.llvm.org/D114928
2021-12-04 09:57:26 +09:00
natashaknk e2d8b60742 Revert "[mlir][tosa] Add tosa.conv2d as fully_connected canonicalization"
This reverts commit 13bdb7ab4a. The commit introduced/uncovered an unintended bug in models containing Conv2D.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D115079
2021-12-03 14:35:48 -08:00
Matthias Springer e359a1e548 [mlir][linalg][bufferize][NFC] Map only tensors in BufferizationState
BufferizationState had map/lookup overloads for non-tensor values. This was necessary for IREE. There is now a better way to do this, so these overloads can be removed.

Differential Revision: https://reviews.llvm.org/D114929
2021-12-03 23:07:09 +09:00
Matthias Springer ed8c63115e [mlir][linalg][bufferize][NFC] Provide default implementation of getAliasingOpOperand
This simplifies op interface implementations.

Differential Revision: https://reviews.llvm.org/D115025
2021-12-03 22:36:22 +09:00
Adrian Kuegel 04d083b19e [mlir][NFC] Use const reference for loop variables. 2021-12-03 13:07:54 +01:00
Alex Zinenko 9dd1f8dfdd [mlir] support recursive type conversion of named LLVM structs
A previous commit added support for converting elemental types contained in
LLVM dialect types in case they were not compatible with the LLVM dialect. It
was missing support for named structs as they could be recursive, which was not
supported by the conversion infra. Now that it is, add support for converting
such named structs.

Depends On D113579

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D113580
2021-12-03 12:41:40 +01:00
Matthias Springer 5e1c038f7d [mlir][linalg][bufferize][NFC] Move FuncOp boundary bufferization to ModuleBufferization
Differential Revision: https://reviews.llvm.org/D114670
2021-12-03 20:29:39 +09:00
Matthias Springer ad1ba42f68 [mlir][linalg][bufferize] Allow unbufferizable ops in input
Allow ops that are not bufferizable in the input IR. (Deactivated by default.)

bufferization::ToMemrefOp and bufferization::ToTensorOp are generated at the bufferization boundaries.

Differential Revision: https://reviews.llvm.org/D114669
2021-12-03 20:20:46 +09:00
Matthias Springer 867cd948ac [mlir][linalg][bufferize][NFC] Move BufferizationOptions to op interface
Also store a reference to BufferizationOptions in BufferizationState. This is in preparation of adding support for partial bufferization.

Differential Revision: https://reviews.llvm.org/D114661
2021-12-03 19:51:34 +09:00
Michal Terepeta 1423e8bf5d [mlir][Vector] Support 0-D vectors in `BitCastOp`
The implementation only allows to bit-cast between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114854
2021-12-03 08:55:59 +00:00
Matthias Springer d30fcadf07 [mlir][linalg][bufferize] Op interface implementation for Bufferization dialect ops
This change provides `BufferizableOpInterface` implementations for ops from the Bufferization dialects. These ops are needed at the bufferization boundaries for partial bufferization.

Differential Revision: https://reviews.llvm.org/D114618
2021-12-03 16:25:44 +09:00
Ulysse Beaugnon e45705ad50 [MLIR] Use a shared uniquer for affine maps and integer sets.
Affine maps and integer sets previously relied on a single lock for creating unique instances. In a multi-threaded setting, this lock becomes a contention point. This commit updates AffineMap and IntegerSet to use StorageUniquer instead. StorageUniquer internally uses sharded locks and thread-local caches to reduce contention. It is already used for affine expressions, types and attributes. On my local machine, this gives me a 5X speedup for an application that manipulates a lot of affine maps and integer sets.

This commit also removes the integer set uniquer threshold. The threshold was used to avoid adding integer sets with a lot of constraints to the hash_map containing unique instances, but the constraints and the integer set were still allocated in the same allocator and never freed, thus not saving any space expect for the hash-map entry.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D114942
2021-12-02 23:49:32 +01:00
Groverkss d257f7c1bf [MLIR][FlatAffineConstraints] Remove duplicate divisions while merging local ids
This patch implements detecting duplicate local identifiers by extracting their
division representation while merging local identifiers.

For example, given the FACs A, B:

```
A: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 2)
B: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 5)
```

The intersection of A and B without this patch would lead to the following FAC:

```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4], d2 = [x / 4], d3 = [x / 4]: d0 <= s0, d1 <= s0, d2 <= s0, d3 <= s0, x + y >= 2, x + y >= 5)
```

after this patch, merging of local ids will detect that `d0 = d2` and `d1 = d3`,
and the intersection of these two FACs will be (after removing duplicate constraints):

```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4] : d0 <= s0, d1 <= s0, x + y >= 2, x + y >= 5)
```

This reduces the number of constraints by 2 (constraints) + 4 (2 constraints for each extra division) for this case.

This is used to reduce the output size representation of operations like
PresburgerSet::subtract, PresburgerSet::intersect which require merging local
variables.

Reviewed By: arjunp, bondhugula

Differential Revision: https://reviews.llvm.org/D112867
2021-12-03 03:44:47 +05:30
Groverkss cff427ee20 Revert changes that should have been sent as a patch
Revert changes that were meant to be sent as a single commit with
summary for the differential review, but were accidently sent directly.

This reverts commit 3bc5353fc6.
2021-12-03 03:42:37 +05:30
Groverkss c15724ab34 Address bondhugula's comments 2021-12-03 03:23:22 +05:30
Groverkss d82a676227 Addressed comments 2021-12-03 03:23:22 +05:30
Groverkss 76ad74a4a9 Address more comments. 2021-12-03 03:23:21 +05:30
Groverkss a8b79d116a Addressed more comments 2021-12-03 03:23:20 +05:30
Groverkss 1e0d7fd769 Fix asserts as suggested by Arjun 2021-12-03 03:23:20 +05:30
Groverkss 19352630c0 Fix clang-format errors 2021-12-03 03:23:19 +05:30
Groverkss b8ea299628 Update docs 2021-12-03 03:23:19 +05:30
Groverkss 8a0967481f Address arjun's comments 2021-12-03 03:23:18 +05:30
Groverkss c9cea1909f Move division representation to a common function 2021-12-03 03:23:18 +05:30
Groverkss 985789ce0b Update mergeLocalIds docs 2021-12-03 03:23:17 +05:30
Groverkss 06a119a3bd Update docs for mergeLocalIds 2021-12-03 03:23:17 +05:30
Groverkss 3bc5353fc6 Implement division merging 2021-12-03 03:23:16 +05:30
Matthias Springer 4479138de8 [mlir][linalg][bufferize] Bufferization of tensor.insert
This is a lightweight operation, useful for writing unit tests. It will be utilized for testing in subsequent commits.

Differential Revision: https://reviews.llvm.org/D114693
2021-12-02 11:58:01 +09:00
Kazu Hirata afe43e0713 [mlir] Remove extractVectorTypeFromShapedValue
This patch fixes the build by removing
extractVectorTypeFromShapedValue.  The last use was removed Dec 1,
2021 in commit extractVectorTypeFromShapedValue.
2021-12-01 13:43:17 -08:00
Mogball ca6bd9cd43 [mlir][ods] AttrOrTypeGen uses Class
AttrOrType def generator uses `Class` code gen helper,
instead of naked raw_ostream.

Depends on D113714 and D114807

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D113715
2021-12-01 16:53:23 +00:00
Nicolas Vasilache c537a94334 [mlir][Vector] Thread 0-d vectors through vector.transfer ops
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.

Reviewed By: ThomasRaoux, springerm

Differential Revision: https://reviews.llvm.org/D114803
2021-12-01 16:49:43 +00:00
Matthias Springer 2fd0ea960c [mlir][linalg][bufferize] CallOps do not bufferize to memory writes
However, since CallOps have no aliasing OpResults, their OpOperands always bufferize out-of-place.

This change removes `bufferizesToMemoryWrite` from `CallOpInterface`. This method was called, but its return value did not matter.

Differential Revision: https://reviews.llvm.org/D114616
2021-12-01 18:47:28 +09:00
Thomas Raoux 69a8a7cf2d [mlir] Make sure linearizeCollapsedDims doesn't drop input map dims
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.

This solves regression in IREE due to e4e4da86af

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D114838

This reverts commit 9a844c2a9b.
2021-11-30 22:51:56 -08:00
MaheshRavishankar 9a844c2a9b Revert "[mlir] Make sure linearizeCollapsedDims doesn't drop input map dims"
This reverts commit bc38673e4d.
2021-11-30 22:43:46 -08:00
MaheshRavishankar bc38673e4d [mlir] Make sure linearizeCollapsedDims doesn't drop input map dims
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.

This solves regression in IREE due to e4e4da86af

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D114838
2021-11-30 22:37:53 -08:00
Jacques Pienaar 62fea88bc5 [mlir] Update accessors prefixed form (NFC) 2021-11-30 19:42:37 -08:00
Stephen Neuendorffer 7386364889 Revert "[MLIR] Update Vector To LLVM conversion to be aware of assume_alignment"
This reverts commit 29a50c5864.

After LLVM lowering, the original patch incorrectly moved alignment
information across an unconstrained GEP operation.  This is only correct
for some index offsets in the GEP.  It seems that the best approach is,
in fact, to rely on LLVM to propagate information from the llvm.assume()
to users.

Thanks to Thomas Raoux for catching this.
2021-11-30 15:18:22 -08:00
Aart Bik 0e85232fa3 [mlir][sparse] refine simply dynamic sparse tensor outputs
Proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since this
may include other conditions that cause cancellation).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114810
2021-11-30 13:45:58 -08:00
Nicolas Vasilache a08b750ce9 [mlir][tensor] InsertSliceOp verification.
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.

As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.

Also, extract isRankReducedType from TensorOps for better reuse
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.

Differential Revision: https://reviews.llvm.org/D114715
2021-11-30 20:37:06 +00:00
MaheshRavishankar 311dd55c9e [mlir][MemRef] Fix SubViewOp canonicalization when a subset of unit-dims are dropped.
The canonical type of the result of the `memref.subview` needs to make
sure that the previously dropped unit-dimensions are the ones dropped
for the canonicalized type as well. This means the generic
`inferRankReducedResultType` cannot be used. Instead the current
dropped dimensions need to be querried and the same need to be dropped.

Reviewed By: nicolasvasilache, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D114751
2021-11-30 20:37:06 +00:00
not-jenni 13bdb7ab4a [mlir][tosa] Add tosa.conv2d as fully_connected canonicalization
For a 1x1 weight and stride of 1, the input/weight can be reshaped and passed into a fully connected op then reshaped back

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D114757
2021-11-30 12:01:14 -08:00
gysit 98dbcff19c [mlir][linalg] Adapt the decompose patterns to use a filter (NFC).
The revision updates the convolution decomposition patterns to take a linalg transformation filter. The transformation filter in a later revision allows use the patterns from CodegenStrategy.

Depends On D114690

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114797
2021-11-30 15:46:10 +00:00
gysit 7f7103cd06 [mlir][linalg] Use top down traversal for padding.
Pad the operation using a top down traversal. The top down traversal unlocks folding opportunities and dim op canonicalizations due to the introduced extract slice operation after the padded operation.

Depends On D114585

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114689
2021-11-30 15:30:45 +00:00
gysit 1ae7342a7d [mlir][linalg] Fix windows build issue in hoist padding.
Iterating backwardSlice and removing elements at the same time can fail on windows for specific build configurations (the code was introduced in https://reviews.llvm.org/D114420). This revision introduces a second vector to collect all operations and removes them after finishing the reverse iteration.

Reviewed By: hpmorgan

Differential Revision: https://reviews.llvm.org/D114775
2021-11-30 15:21:53 +00:00
gysit 914e72d400 [mlir][linalg] Run CSE after every CodegenStrategy transformation.
Add CSE after every transformation. Transformations such as tiling introduce redundant computation, for example, one AffineMinOp for every operand dimension pair. Follow up transformations such as Padding and Hoisting benefit from CSE since comparing slice sizes simplifies to comparing SSA values instead of analyzing affine expressions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114585
2021-11-30 15:07:51 +00:00
Alexander Belyaev f910aa9105 [mlir] Fix BufferizationToMemRef build. 2021-11-30 13:10:54 +01:00
Julian Gross ae1ea0bead [mlir] Decompose Bufferization Clone operation into Memref Alloc and Copy.
This patch introduces a new conversion to convert bufferization.clone operations
into a memref.alloc and a memref.copy operation. This transformation is needed to
transform all remaining clones which "survive" all previous transformations, before
a given program is lowered further (to LLVM e.g.). Otherwise, these operations
cannot be handled anymore and lead to compile errors.
See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665

Differential Revision: https://reviews.llvm.org/D114233
2021-11-30 10:15:56 +01:00
Alexander Belyaev f89bb3c012 [mlir] Move bufferization-related passes to `bufferization` dialect.
[RFC](https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712)

Differential Revision: https://reviews.llvm.org/D114698
2021-11-30 09:58:47 +01:00
Stella Laurenzo bdc3183742 [mlir][python] Implement more SymbolTable methods.
* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.

Differential Revision: https://reviews.llvm.org/D114687
2021-11-29 20:31:13 -08:00
Stella Laurenzo a6e7d024a9 [mlir][python] Add pyi stub files to enable auto completion.
There is no completely automated facility for generating stubs that are both accurate and comprehensive for native modules. After some experimentation, I found that MyPy's stubgen does the best at generating correct stubs with a few caveats that are relatively easy to fix:
  * Some types resolve to cross module symbols incorrectly.
  * staticmethod and classmethod signatures seem to always be completely generic and need to be manually provided.
  * It does not generate an __all__ which, from testing, causes namespace pollution to be visible to IDE code completion.

As a first step, I did the following:
  * Ran `stubgen` for `_mlir.ir`, `_mlir.passmanager`, and `_mlirExecutionEngine`.
  * Manually looked for all instances where unnamed arguments were being emitted (i.e. as 'arg0', etc) and updated the C++ side to include names (and re-ran stubgen to get a good initial state).
  * Made/noted a few structural changes to each `pyi` file to make it minimally functional.
  * Added the `pyi` files to the CMake rules so they are installed and visible.

To test, I added a `.env` file to the root of the project with `PYTHONPATH=...` set as per instructions. Then reload the developer window (in VsCode) and verify that completion works for various changes to test cases.

There are still a number of overly generic signatures, but I want to check in this low-touch baseline before iterating on more ambiguous changes. This is already a big improvement.

Differential Revision: https://reviews.llvm.org/D114679
2021-11-29 19:58:58 -08:00
Aart Bik 7d4da4e1ab [mlir][sparse] generalize sparse tensor output implementation
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114399
2021-11-29 16:15:53 -08:00
Benjamin Kramer 8d474f1d15 [mlir] Handle an edge case when folding reshapes with multiple trailing 1 dimensions
We would exit early and miss this case.

Differential Revision: https://reviews.llvm.org/D114711
2021-11-29 18:31:43 +01:00
Stephan Herhut 95f34e318c [mlir][memref] Fix bug in verification of memref.collapse_shape
The verifier computed an illegal type with negative dimension size when collapsing partially static memrefs.

Differential Revision: https://reviews.llvm.org/D114702
2021-11-29 15:47:12 +01:00
Stella Laurenzo ace1d0ad3d [mlir][python] Normalize asm-printing IR behavior.
While working on an integration, I found a lot of inconsistencies on IR printing and verification. It turns out that we were:
  * Only doing "soft fail" verification on IR printing of Operation, not of a Module.
  * Failed verification was interacting badly with binary=True IR printing (causing a TypeError trying to pass an `str` to a `bytes` based handle).
  * For systematic integrations, it is often desirable to control verification yourself so that you can explicitly handle errors.

This patch:
  * Trues up the "soft fail" semantics by having `Module.__str__` delegate to `Operation.__str__` vs having a shortcut implementation.
  * Fixes soft fail in the presence of binary=True (and adds an additional happy path test case to make sure the binary functionality works).
  * Adds an `assume_verified` boolean flag to the `print`/`get_asm` methods which disables internal verification, presupposing that the caller has taken care of it.

It turns out that we had a number of tests which were generating illegal IR but it wasn't being caught because they were doing a print on the `Module` vs operation. All except two were trivially fixed:
  * linalg/ops.py : Had two tests for direct constructing a Matmul incorrectly. Fixing them made them just like the next two tests so just deleted (no need to test the verifier only at this level).
  * linalg/opdsl/emit_structured_generic.py : Hand coded conv and pooling tests appear to be using illegal shaped inputs/outputs, causing a verification failure. I just used the `assume_verified=` flag to restore the original behavior and left a TODO. Will get someone who owns that to fix it properly in a followup (would also be nice to break this file up into multiple test modules as it is hard to tell exactly what is failing).

Notes to downstreams:
  * If, like some of our tests, you get verification failures after this patch, it is likely that your IR was always invalid and you will need to fix the root cause. To temporarily revert to prior (broken) behavior, replace calls like `print(module)` with `print(module.operation.get_asm(assume_verified=True))`.

Differential Revision: https://reviews.llvm.org/D114680
2021-11-28 18:02:01 -08:00
Stanislav Funiak a19e163526 Fixed broken build under GCC 5.4.
This diff fixes broken build caused by D108550. Under GCC 5, auto lambdas that capture this require `this->` for member calls.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D114659
2021-11-27 09:03:27 +05:30
Kazu Hirata 803cec0268 [mlir] Fix a warning
This patch fixes:

  mlir/lib/IR/MLIRContext.cpp:1020:3: error: use of the 'nodiscard'
  attribute is a C++17 extension [-Werror,-Wc++17-extensions]
2021-11-26 12:27:11 -08:00
Arnab Dutta c2280b5517 [MLIR] Avoid creation of buggy affine maps when incorrect values of number of dimensions and number of symbols are provided.
We check whether the maximum index of dimensional identifier present
in the result expressions is less than dimCount (number of dimensional
identifiers) argument passed in the AffineMap::get() and the maximum index
of symbolic identifier present in the result expressions is less than
symbolCount (number of symbolic identifiers) argument passed in AffineMap::get().

Reviewed By: nicolasvasilache, bondhugula

Differential Revision: https://reviews.llvm.org/D114238
2021-11-27 00:37:08 +05:30
Arnab Dutta e4e4da86af [MLIR] Prevent creation of buggy affine map after linearizing collapsed dimensions of source map
Initially we were passing wrong numSymbols argument while calling
AffineMap::get() for creaating affine map with linearized result
expressions. The main problems was the number of symbols of the newly
to be created map may be different from that of the source map, as
new symbolic identifiers may be introduced while creating strided layout
linearized expressions.

Reviewed By: nicolasvasilache, bondhugula

Differential Revision: https://reviews.llvm.org/D114240
2021-11-27 00:32:58 +05:30
Chris Jones 344eee6f38 [MLIR] Allow `Idempotent` trait to be applied to binary ops.
Add `Idempotent` trait to `arith.{andi,ori}`.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114574
2021-11-26 18:22:49 +00:00
Michal Terepeta 7e65fc9a60 [mlir][Vector] Support 0-D vectors in `BroadcastOp`
This changes the op to produce `AnyVectorOfAnyRank` following mostly the code for 1-D vectors.

Depends On D114598

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114550
2021-11-26 17:17:18 +00:00
Michal Terepeta d0f927121e [mlir][Standard] Support 0-D vectors in `SplatOp`
This changes the op to produce `AnyVectorOfAnyRank` and implements this by just
inserting the element (skipping the shuffle that we do for the 1-D case).

Depends On D114549

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114598
2021-11-26 17:05:15 +00:00
Arjun P ad34ce94d5 [MLIR] Simplex: fix a bug when rolling back a Simplex with no solutions
Previously, when adding a constraint to a Simplex that is already marked
as having no solutions (marked empty), the Simplex would be marked empty again,
and a second UnmarkEmpty entry would be pushed to the undo log. When rolling
back, Simplex should be unmarked empty only after rolling back past the
creation of the first constraint that made it empty.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D114613
2021-11-26 22:33:48 +05:30
Arjun P f074bbb04a [MLIR] Simplex::pivot: also update the redundant rows when pivoting
Previously, the pivot function would only update the non-redundant rows when
pivoting. This is incorrect because in some cases, when rolling back past a
`detectRedundant` call, the basis being used could be different from that which
was used at the time of returning from the `detectRedundant` call. Therefore,
it is important to update the redundant rows as well during pivots. This could
also be triggered by pivots that occur when testing successive constraints for
being redundant in `detectRedundant` after some initial constraints are marked redundant.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D114614
2021-11-26 21:42:41 +05:30
Mats Petersson 30238c3676 [mlir][OpenMP] Add support for SIMD modifier
Add support for SIMD modifier in OpenMP worksharing loops.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D111051
2021-11-26 14:04:46 +00:00
Matthias Springer b62b21b980 [mlir][linalg][bufferize][NFC] InsertSliceOp no-copy detection as PostAnalysis
There is special logic for InsertSliceOp to check if a memcpy is needed. This change extracts that piece of code and makes it a PostAnalysisStep.

The purpose of this change is to untangle `bufferize` from BufferizationAliasInfo. (Not fully there yet.)

Differential Revision: https://reviews.llvm.org/D114513
2021-11-26 22:19:29 +09:00
Benjamin Kramer 8521850f20 Provide a definition for OperationPosition::kDown
This isn't necessary in C++17, but C++14 still requires it.
2021-11-26 14:11:59 +01:00
Benjamin Kramer 1b0312d280 [PDL] fix unused variable warning in Release builds 2021-11-26 14:11:58 +01:00
Stanislav Funiak d35f119094 Added line numbers to the debug output of PDL bytecode.
This is a small diff that splits out the debug output for PDL bytecode. When running bytecode with debug output on, it is useful to know the line numbers where the PDLIntepr operations are performed. Usually, these are in a single MLIR file, so it's sufficient to print out the line number rather than the entire location (which tends to be quite verbose). This debug output is gated by `LLVM_DEBUG` rather than `#ifndef NDEBUG` to make it easier to test.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D114061
2021-11-26 18:11:37 +05:30
Stanislav Funiak a76ee58f3c Multi-root PDL matching using upward traversals.
This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify mulitple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from operation to its operands) when possible and upwards (from values to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined form the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern an inner node is no better than from the optimal root.

The following three main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates optimal branching has been determined.

Testing: unit tests (an integration test to follow once the stack of commits has landed)

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108550
2021-11-26 18:11:37 +05:30
Stanislav Funiak 6df7cc7f47 Implementation of the root ordering algorithm
This is commit 3 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

We form a graph over the specified roots, provided in `pdl.rewrite`, where two roots are connected by a directed edge if the target root can be connected (via a chain of operations) in the underlying pattern to the source root. We place a restriction that the path connecting the two candidate roots must only contain the nodes in the subgraphs underneath these two roots. The cost of an edge is the smallest number of upward traversals (edges) required to go from the source to the target root, and the connector is a `Value` in the intersection of the two subtrees rooted at the source and target root that results in that smallest number of such upward traversals. Optimal root ordering is then formulated as the problem of finding a spanning arborescence (i.e., a directed spanning tree) of minimal weight.

In order to determine the spanning arborescence (directed spanning tree) of minimum weight, we use the [Edmonds' algorithm](https://en.wikipedia.org/wiki/Edmonds%27_algorithm). The worst-case computational complexity of this algorithm is O(_N_^3) for a single root, where _N_ is the number of specified roots. The `pdl`-to-`pdl_interp` lowering calls this algorithm as a subroutine _N_ times (once for each candidate root), so the overall complexity of root ordering is O(_N_^4). If needed, this complexity could be reduced to O(_N_^3) with a more efficient algorithm. However, note that the underlying implementation is very efficient, and _N_ in our instances tends to be very small (<10). Therefore, we believe that the proposed (asymptotically suboptimal) implementation will suffice for now.

Testing: a unit test of the algorithm

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108549
2021-11-26 18:11:37 +05:30
Stanislav Funiak 3eb1647af0 Introduced iterative bytecode execution.
This is commit 2 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This commit implements the features needed for the execution of the new operations pdl_interp.get_accepting_ops, pdl_interp.choose_op:
1. The implementation of the generation and execution of the two ops.
2. The addition of Stack of bytecode positions within the ByteCodeExecutor. This is needed because in pdl_interp.choose_op, we iterate over the values returned by pdl_interp.get_accepting_ops until we reach finalize. When we reach finalize, we need to return back to the position marked in the stack.
3. The functionality to extend the lifetime of values that cross the nondeterministic choice. The existing bytecode generator allocates the values to memory positions by representing the liveness of values as a collection of disjoint intervals over the matcher positions. This is akin to register allocation, and substantially reduces the footprint of the bytecode executor. However, because with iterative operation pdl_interp.choose_op, execution "returns" back, so any values whose original liveness cross the nondeterminstic choice must have their lifetime executed until finalize.

Testing: pdl-bytecode.mlir test

Reviewed By: rriddle, Mogball

Differential Revision: https://reviews.llvm.org/D108547
2021-11-26 18:11:37 +05:30
Stanislav Funiak 842b6861c0 Defines new PDLInterp operations needed for multi-root matching in PDL.
This is commit 1 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

These operations are:
* pdl.get_accepting_ops: Returns a list of operations accepting the given value or a range of values at the specified position. Thus if there are two operations `%op1 = "foo"(%val)` and `%op2 = "bar"(%val)` accepting a value at position 0, `%ops = pdl_interp.get_accepting_ops of %val : !pdl.value at 0` will return both of them. This allows us to traverse upwards from a value to operations accepting the value.
* pdl.choose_op: Iteratively chooses one operation from a range of operations. Therefore, writing `%op = pdl_interp.choose_op from %ops` in the example above will select either `%op1`or `%op2`.

Testing: Added the corresponding test cases to mlir/test/Dialect/PDLInterp/ops.mlir.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108543
2021-11-26 17:59:22 +05:30
Matthias Springer 8e2214aa60 [mlir][linalg][bufferize][NFC] Pass BufferizationState to PostAnalysisStep
Pass BufferizationStep instead of BufferizationAliasInfo. Note: BufferizationState contains BufferizationAliasInfo.

Differential Revision: https://reviews.llvm.org/D114512
2021-11-26 11:46:14 +09:00
Matthias Springer d62b4b08af [mlir][linalg][bufferize] Compose dialect-specific bufferization state
Use composition instead of inheritance for storing dialect-specific bufferization state. This is in preparation of adding "tensor dialect"-specific bufferization state.

Differential Revision: https://reviews.llvm.org/D114508
2021-11-26 11:35:45 +09:00
Matthias Springer c94b80b438 [mlir][linalg][bufferize][NFC] Allow returning arbitrary memrefs
If `allowReturnMemref` is set to true, arbitrary memrefs may be returned from FuncOps. Also remove allocation hoisting code, which is only partly implemented at the moment.

The purpose of this commit is to untangle `bufferize` from `aliasInfo`. (Even with this change, they are not fully untangled yet.)

Differential Revision: https://reviews.llvm.org/D114507
2021-11-26 11:26:46 +09:00
Matthias Springer c637e3ea9e [mlir][linalg][bufferize][NFC] Extract func boundary bufferization
Bufferization of function boundaries is extracted from ComprehensiveBufferize into a separate file. This will become its own build target in the future.

Differential Revision: https://reviews.llvm.org/D114226
2021-11-26 10:25:36 +09:00
Matthias Springer f32c3d9528 [mlir][linalg][bufferize][NFC] Move Affine interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the Affine dialect.

Differential Revision: https://reviews.llvm.org/D114222
2021-11-26 09:27:47 +09:00
Michal Terepeta cc311a155a [mlir][Vector] Support 0-D vectors in `VectorPrintOpConversion`
Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114549
2021-11-25 20:12:18 +00:00
Uday Bondhugula c89fc1eec3 [MLIR] NFC. Rename MLIR CAPI ExecutionEngine target for consistency
Rename MLIR CAPI ExecutionEngine target for consistency:
MLIRCEXECUTIONENGINE -> MLIRCAPIExecutionEngine in line with other
targets.

Differential Revision: https://reviews.llvm.org/D114596
2021-11-26 00:23:17 +05:30
Alexander Belyaev 57470abc41 [mlir] Move memref.[tensor_load|buffer_cast|clone] to "bufferization" dialect.
https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712

Differential Revision: https://reviews.llvm.org/D114552
2021-11-25 11:50:39 +01:00
Tobias Gysi 4b03906346 [mlir][linalg] Perform checks early in hoist padding.
Instead of checking for unexpected operations (any operation with a region except for scf::For and `padTensorOp` or operations with a memory effect) while cloning the packing loop nest perform the checks early. Update `dropNonIndexDependencies` to check for unexpected operations. Additionally, check all of these operations have index type operands only.

Depends On D114428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114438
2021-11-25 10:37:12 +00:00
Tobias Gysi fd723eaa92 [mlir][linalg] Limit hoist padding to constant paddings.
Limit hoist padding to pad tensor ops that depend only on a constant value. Supporting arbitrary padding values that depend on computations part of the backward slice to hoist require complex analysis to ensure the computation can be hoisted.

Depends On D114420

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114428
2021-11-25 10:31:39 +00:00
Tobias Gysi ed7c1fb9b0 [mlir][linalg] Add backward slice filtering in hoist padding.
Adapt hoist padding to filter the backward slice before cloning the packing loop nest. The filtering removes all operations that are not used to index the hoisted pad tensor op and its extract slice op. The filtering is needed to support the more complex loop nests created after fusion. For example, fusing the producer of an output operand can added linalg ops and pad tensor ops to the backward slice. These operations have regions and currently prevent hoisting.

The following example demonstrates the effect of the newly introduced `dropNonIndexDependencies` method that filters the backward slice:
```
%source = linalg.fill(%cst, %arg0)
scf.for %i
  %unrelated = linalg.fill(%cst, %arg1)    // not used to index %source!
  scf.for %j (%arg2 = %unrelated)
    scf.for %k                             // not used to index %source!
      %ubi = affine.min #map(%i)
      %ubj = affine.min #map(%j)
      %slice = tensor.extract_slice %source [%i, %j] [%ubi, %ubj]
      %padded_slice = linalg.pad_tensor %slice
```
dropNonIndexDependencies(%padded_slice, %slice)
removes [scf.for %k, linalg.fill(%cst, %arg1)] from backwardSlice.

Depends On D114175

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114420
2021-11-25 10:30:10 +00:00
Matthias Springer 48107eaa07 [mlir][linalg][bufferize][NFC] Move SCF interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the SCF dialect.

Differential Revision: https://reviews.llvm.org/D114221
2021-11-25 19:00:17 +09:00
Alexander Belyaev 3c228573bc Revert "[mlir][SCF] Further simplify affine maps during `for-loop-canonicalization`"
This reverts commit ee1bf18672.

It breaks IREE lowering. Reverting the commit for now while we
investigate what's going on.
2021-11-25 10:54:52 +01:00
Butygin 8dae0b6b6c [mlir][spirv] arith::RemSIOp OpenCL lowering
Differential Revision: https://reviews.llvm.org/D114524
2021-11-25 12:44:06 +03:00
Matthias Springer a5c2f78287 [mlir][interfaces] Add insideMutuallyExclusiveRegions helper
Add a helper function to ControlFlowInterfaces for checking if two ops
are in mutually exclusive regions according to RegionBranchOpInterface.

Utilize this new helper in Linalg ComprehensiveBufferize. This makes the
analysis independent of the SCF dialect and generalizes it to other ops
that implement RegionBranchOpInterface.

Differential Revision: https://reviews.llvm.org/D114220
2021-11-25 17:44:39 +09:00
Matthias Springer ee1bf18672 [mlir][SCF] Further simplify affine maps during `for-loop-canonicalization`
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.

Differential Revision: https://reviews.llvm.org/D114138
2021-11-25 12:44:19 +09:00
Matthias Springer 8a8c655fe7 [mlir][SCF] Fix off-by-one bug in affine analysis
This change is NFC. There were two issues when passing/reading upper bounds into/from FlatAffineConstraints that negate each other, so the bug was not apparent. However, it made debugging harder because some constraints in the FlatAffineConstraints were off by one when dumping all constraints.

Differential Revision: https://reviews.llvm.org/D114137
2021-11-25 12:37:02 +09:00
Uday Bondhugula 23d505571d [NFC] Improve debug message in getAsIntegerSet
Improve debug message in getAsIntegerSet. Add missing trailing new line
and position info.

Differential Revision: https://reviews.llvm.org/D114511
2021-11-25 08:50:21 +05:30
Matthias Springer d3bb4fec2a [mlir][linalg][bufferize][NFC] Move arith interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the arith dialect.

Differential Revision: https://reviews.llvm.org/D114219
2021-11-25 10:21:02 +09:00
bakhtiyar 7bd87a03fd Promote readability by factoring out creation of min/max operation. Remove unnecessary divisions.
Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D110680
2021-11-24 16:17:23 -08:00
Lei Zhang cb395f66ac [mlir][spirv] Change the return type for {Min|Max}VersionBase
For synthesizing an op's implementation of the generated interface
from {Min|Max}Version, we need to define an `initializer` and
`mergeAction`. The `initializer` specifies the initial version,
and `mergeAction` specifies how version specifications from
different parts of the op should be merged to generate a final
version requirements.

Previously we use the specified version enum as the type for both
the initializer and thus the final return type. This means we need
to perform `static_cast` over some hopefully not used number (`~0u`)
as the initializer. This is quite opaque and sort of not guaranteed
to work. Also, there are ops that have an enum attribute where some
values declare version requirements (e.g., enumerant `B` requires
v1.1+) but some not (e.g., enumerant `A` requires nothing). Then a
concrete op instance with `A` will still declare it implements the
version interface (because interface implementation is static for
an op) but actually theirs no requirements for version.

So this commit changes to use an more explicit `llvm::Optional`
to wrap around the returned version enum.  This should make it
more clear.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D108312
2021-11-24 17:33:01 -05:00
Tobias Gysi 86f186efea [mlir][linalg] Add makeComposedPadHighOp.
Add the makeComposedPadHighOp method which creates a new PadTensorOp if necessary. If the source to pad is actually the result of a sequence of padded LinalgOps, the method checks if padding is needed or if we can use the padded result of the padded LinalgOp sequence directly.

Example:
```
%0 = tensor.extract_slice %arg0 [%iv0, %iv1] [%sz0, %sz1]
%1 = linalg.pad_tensor %0 low[0, 0] high[...] { linalg.yield %cst }
%2 = linalg.matmul ins(...) outs(%1)
%3 = tensor.extract_slice %2 [0, 0] [%sz0, %sz1]
```
when padding %3 return %2 instead of introducing
```
%4 = linalg.pad_tensor %3 low[0, 0] high[...] { linalg.yield %cst }
```

Depends On D114161

Reviewed By: nicolasvasilache, pifon2a

Differential Revision: https://reviews.llvm.org/D114175
2021-11-24 19:18:59 +00:00
Tobias Gysi a4fd8cb76f [mlir][linalg] Update failure conditions for padOperandToSmallestStaticBoundingBox.
Change the failure condition of padOperandToSmallestStaticBoundingBox to never fail if the operand is already statically sized.

In particular:
- if the padding value computation fails -> return failure if the operand shape is dynamic and success if it is static.
- if there is no extract slice op -> return failure if the operand shape is dynamic and success if it is static.

The latter change prevents padding from failure if the output operand passed by iteration argument is statically sized since in this case the extract / insert slice pairs are removed by canonicalization.

Depends On D114153

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114161
2021-11-24 19:10:50 +00:00
MaheshRavishankar 0a58982b08 [mlir][Linalg] Remove alloc/dealloc pair as a callback.
The alloc dealloc pair generation callback is really central to the
bufferization algorithm, it modifies the state in a way that affects
correctness. This is not really a configurable option. Moving it to
BufferizationState removes what was probably the reason it was added
as a callback.

Differential Revision: https://reviews.llvm.org/D114417
2021-11-24 10:36:34 -08:00
Nicolas Vasilache 1cfa9b4d70 [mlir][Vector] NFC - Apply some clangd suggested fixes. 2021-11-24 15:55:58 +00:00
Matthias Springer ca9d149e07 [mlir][linalg][bufferize][NFC] Move vector interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the vector dialect.

Differential Revision: https://reviews.llvm.org/D114218
2021-11-24 19:36:12 +09:00
Matthias Springer bb273a35a0 [mlir][linalg][bufferize][NFC] Move tensor interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the tensor dialect.

Differential Revision: https://reviews.llvm.org/D114217
2021-11-24 18:25:17 +09:00
Butygin 7f5d9bf13a [mlir][scf] Canonicalize scf.while with unused results
Differential Revision: https://reviews.llvm.org/D114291
2021-11-24 11:11:22 +03:00
Bixia Zheng 02710413a3 Accept symmetric sparse matrix in Matrix Market Exchange Format.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D114402
2021-11-23 19:53:17 -08:00
Uday Bondhugula 8bd08a9fd7 [MLIR] Remove duplicate `Pass` suffix from ViewOpGraph class name
Remove duplicate `Pass` suffix from view-op-graph pass class name. The
extra suffix would lead to methods like registerViewOpGraphPassPass
being generated.

Differential Revision: https://reviews.llvm.org/D114459
2021-11-24 08:00:16 +05:30
wren romano d7d7ffe254 [mlir][sparse] Adding wrappers for constantOverheadTypeEncoding
Minor code cleanup

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D114392
2021-11-23 18:30:06 -08:00
Butygin 75a1bee05d [mlir][spirv] Add math to OpenCL conversion
Differential Revision: https://reviews.llvm.org/D113780
2021-11-24 02:31:21 +03:00
Rob Suderman 0f1e52afa9 [mlir][tosa] Materialize tosa.pad value and fold noop pads
Padding now can explicitly specify the padding value when non-zero is wanted.
This also includes bypassing pads when the pad does nothing.

Differential Revision: https://reviews.llvm.org/D113611
2021-11-23 12:23:42 -08:00
Rob Suderman 54eec7cafc [mlir][tosa] Separate tosa.transpose_conv decomposition and added stride support
Transpose convolution decomposition is now performed in a separate pass. This
allows padding / constant propagation to be performed at the TOSA level. It
also adds support for striding when there is no dilation.

Differential Revision: https://reviews.llvm.org/D114409
2021-11-23 12:16:44 -08:00
MaheshRavishankar b57e2f071a [mlir][Linalg] Add pad vectorization patterns into LinalgStrategyVectorize passes.
Add an option to control whether these patterns are added to the
pattern list or not.

Differential Revision: https://reviews.llvm.org/D114290
2021-11-23 11:47:54 -08:00
Nicolas Vasilache 3ff4e5f2a4 [mlir][Vector] Thread 0-d vectors through InsertElementOp.
This revision makes concrete use of 0-d vectors to extend the semantics of
InsertElementOp.

Reviewed By: dcaballe, pifon2a

Differential Revision: https://reviews.llvm.org/D114388
2021-11-23 12:55:11 +00:00
Nicolas Vasilache e7026aba00 [mlir][Vector] Thread 0-d vectors through ExtractElementOp.
This revision starts making concrete use of 0-d vectors to extend the semantics of
ExtractElementOp.
In the process a new VectorOfAnyRank Tablegen OpBase.td is added to allow progressive transition to supporting 0-d vectors by gradually opting in.

Differential Revision: https://reviews.llvm.org/D114387
2021-11-23 12:39:44 +00:00
Matthias Springer f24d9313cc [mlir][linalg][bufferize][NFC] Specify bufferize traversal in `bufferize`
The interface method `bufferize` controls how (and it what order) nested ops are traversed. This simplifies bufferization of scf::ForOps and scf::IfOps, which used to need special rules in scf::YieldOp.

Differential Revision: https://reviews.llvm.org/D114057
2021-11-23 21:33:19 +09:00
Alexander Belyaev c7cc70c8f8 Revert "Revert "[mlir] Move AllocationOpInterface to Bufferize/IR/AllocationOpInterface.td.""
This reverts and fixes commit de18b7dee6.
2021-11-23 10:49:26 +01:00
Nicolas Vasilache b2729fda60 [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Differential Revision: https://reviews.llvm.org/D114393
2021-11-23 07:31:22 +00:00
Sandeep Dasgupta e5a8c8c883 [mlir] Refactoring a few Parser APIs
Refactored two new parser APIs parseGenericOperationAfterOperands and
 parseCustomOperationName out of parseGenericOperation and parseCustomOperation.

Motivation: Sometimes an op can be printed in a special way if certain criteria
is met. While parsing, we need to handle all the versions.
`parseGenericOperationAfterOperands` is handy in situation where we already
parsed the operands and decide to fall back to default parsing.

`parseCustomOperationName` is useful when we need to know details (dialect,
operation name etc.) about a parsed token meant to be an mlir operation.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D113719
2021-11-23 06:11:01 +00:00
Matthias Springer fb99686bfd [mlir][linalg][bufferize] Limited support for scf.execute_region
Add support for analysis only.

Differential Revision: https://reviews.llvm.org/D114055
2021-11-23 12:20:39 +09:00
Matthias Springer 26c0dd83ab [mlir][linalg][bufferize][NFC] Move helper function to op interface
This is in preparation of changing the op traversal during bufferization.

Differential Revision: https://reviews.llvm.org/D114040
2021-11-23 11:59:47 +09:00
Matthias Springer 8d0994ed21 [mlir][linalg][bufferize][NFC] Remove special casing of CallOps
Differential Revision: https://reviews.llvm.org/D113966
2021-11-23 11:14:10 +09:00
Matthias Springer b1083830d6 [mlir][linalg][bufferize][NFC] Clean up headers and function visibility
Differential Revision: https://reviews.llvm.org/D113964
2021-11-23 10:29:26 +09:00
Benjamin Kramer 966b720983 [mlir][memref] Fix expanded shape ops memref.cast folding with changed type
`memref.expand_shape` has verification logic to make sure
result dim must be static if all the collapsing src dims are static.

This can be relaxed once expand_shape supports more dynamism.

Differential Revision: https://reviews.llvm.org/D114391
2021-11-22 22:56:15 +01:00
Christian Ulmann f6718fc6d3 [mlir] FlatAffineConstraint parsing for unit tests
This patch adds functionality to parse FlatAffineConstraints from a
StringRef with the intention to be used for unit tests. This should
make the construction of FlatAffineConstraints easier for testing
purposes.

The patch contains an example usage of the functionality in a unit test that
uses FlatAffineConstraints.

Reviewed By: bondhugula, grosser

Differential Revision: https://reviews.llvm.org/D113275
2021-11-23 03:04:30 +05:30
Groverkss 98daa4e425 [MLIR] Fix incorrect removal of source loop in loop fusion
This patch fixes a bug in loop fusion pass where the source loop was removed
even when the fused loop did not cover all iterations of the source loop.

This was because the fast hueristic check for checking if source loop and
fused loop have same iterations did not take into account steps in loop.

Reviewed By: dcaballe, bondhugula

Differential Revision: https://reviews.llvm.org/D114164
2021-11-23 02:54:09 +05:30
Alexander Belyaev de18b7dee6 Revert "[mlir] Move AllocationOpInterface to Bufferize/IR/AllocationOpInterface.td."
This reverts commit 3028bca6a9.
For some reason using FallbackModel works with CMake and does not work
with bazel. Using `ExternalModel` works. I will check what's going on
and resubmit tomorrow.
2021-11-22 21:35:20 +01:00
Alexander Belyaev 3028bca6a9 [mlir] Move AllocationOpInterface to Bufferize/IR/AllocationOpInterface.td.
Remove the interface from op defs in MemRefOps.td and make it an external model.

This is the first PR of many that will move bufferization-related ops, interfaces, passes to Dialect/Bufferize.
RFC: https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712
It is still debated if the comprehensive bufferization has to be moved there as well, so for now I am just moving the "gradual" bufferization.

Differential Revision: https://reviews.llvm.org/D114147
2021-11-22 21:00:59 +01:00
Mehdi Amini e0b7bee7cf Revert "[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)"
This reverts commit a9e236bed8.
This broke the Windows build:

mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
2021-11-22 19:23:18 +00:00
Lei Zhang 93284120f2 [mlir][vector] Fix TransferOpReduceRank for 0-D tensors
We cannot unconditionally generate memref.load ops for such cases;
need to check the source's type.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114376
2021-11-22 12:30:46 -05:00
Alex Zinenko 9c5982ef8e [mlir] support recursive types in type conversion infra
MLIR supports recursive types but they could not be handled by the conversion
infrastructure directly as it would result in infinite recursion in
`convertType` for elemental types. Support this case by keeping the "call
stack" of nested type conversions in the TypeConverter class and by passing it
as an optional argument to the individual conversion callback. The callback can
then check if a specific type is present on the stack more than once to detect
and handle the recursive case.

This approach is preferred to the alternative approach of having a separate
callback dedicated to handling only the recursive case as the latter was
observed to introduce ~3% time overhead on a 50MB IR file even if it did not
contain recursive types.

This approach is also preferred to keeping a local stack in type converters
that need to handle recursive types as that would compose poorly in case of
out-of-tree or cross-project extensions.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D113579
2021-11-22 18:16:02 +01:00
Tobias Gysi 247a1a55eb [mlir][linalg] Use getAsOpFoldResult in padding (NFC).
After padding, we introduce a ExtractSliceOp to get the final unpadded result. This revision uses getAsOpFoldResult to compute the size of the unpadded result, which guarantees the result type has a partially static shape if some of the sizes of the unpadded result are statically known. At the moment, we rely on canonicalization to cleanup the types after padding.

Depends On D114085

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114153
2021-11-22 13:15:19 +00:00
Tobias Gysi 32c43241e7 [mlir][linalg] Always generate an extract/insert slice pair when tiling output tensors.
Adapt tiling to always generate an extract/insert slice pair for output tensors even if the tensor is not tiled. Having an explicit extract/insert slice pair simplifies followup transformations such as padding and bufferization. In particular, it makes read and written iteration argument slices explicit.

Depends On D114067

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114085
2021-11-22 13:12:43 +00:00
Tres Popp 106f307499 Rename MlirExecutionEngine lookup to lookupPacked
The purpose of the change is to make clear whether the user is
retrieving the original function or the wrapper function, in line with
the invoke commands. This new functionality is useful for users that
already have defined their own packed interface, so they do not want the
extra layer of indirection, or for users wanting to the look at the
resulting primary function rather than the wrapper function.

All locations, except the python bindings now have a `lookupPacked`
method that matches the original `lookup` functionality. `lookup`
still exists, but with new semantics.

- `lookup` returns the function with a given name. If `bool f(int,int)`
is compiled, `lookup` will return a reference to `bool(*f)(int,int)`.
- `lookupPacked` returns the packed wrapper of the function with the
given name. If `bool f(int,int)` is compiled, `lookupPacked` will return
`void(*mlir_f)(void**)`.

Differential Revision: https://reviews.llvm.org/D114352
2021-11-22 14:12:09 +01:00
Tobias Gysi f7751a3a42 [mlir][linalg] Remove tile and fuse test pass (NFC).
Remove the tile and fuse test pass that has been replaced by codegen strategy.

Depends On D114067

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114068
2021-11-22 12:33:31 +00:00
Nicolas Vasilache 050cc1cd6e [mlir] Add InitializeNativeTargetAsmParser to ExecutionEngine.
This is required to allow python to work with lowerings that use inline_asm.

Differential Revision: https://reviews.llvm.org/D114338
2021-11-22 11:28:14 +00:00
Tobias Gysi e3d386ea27 [mlir][linalg] Add a tile and fuse on tensors pattern.
Add a pattern to apply the new tile and fuse on tensors method. Integrate the pattern into the CodegenStrategy and use the CodegenStrategy to implement the tests.

Depends On D114012

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114067
2021-11-22 11:13:21 +00:00
Nicolas Vasilache 789c88e80e [mlir] Fix unintentional mutation by VectorType/RankedTensorType::Builder dropDim
Differential Revision: https://reviews.llvm.org/D113933
2021-11-22 10:51:50 +00:00
Tobias Gysi 0ccc44cec0 [mlir][linalg] Fix tile and fuse for outermost reduction.
Tile and fuse failed if the outermost tile loop is a reduction dimension. Add the necessary check to handle outermost reductions and introduce a test case to verify the change.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114012
2021-11-22 10:44:15 +00:00
Nicolas Vasilache a9e236bed8 [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Reviewed By: ftynse, dcaballe

Differential Revision: https://reviews.llvm.org/D114335
2021-11-22 10:32:34 +00:00
Jacques Pienaar e5a4d0f149 [mlir] Fix unused function warning (NFC)
Delete function no longer needed as all derived classes override
printer.
2021-11-21 15:06:08 -08:00
Jacques Pienaar 6f9cceb775 [mlir] Move trait to InferTypeOpInterface
Step towards removing the hard coded behavior for this trait and to instead use common interface.

Differential Revision: https://reviews.llvm.org/D114208
2021-11-21 14:41:12 -08:00
Arjun P ad48ef1e31 [MLIR][NFC] Simplex::restoreRow: improve documentation 2021-11-21 19:23:55 +05:30
Arnab Dutta ec7b0d4d34 [MLIR] Simplify Semi-affine expressions by rule based matching and replacing "expr - q * (expr floordiv q)" with "expr mod q" expression.
Add rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.

Reviewed By: bondhugula, dcaballe

Differential Revision: https://reviews.llvm.org/D112985
2021-11-20 21:05:36 +05:30
Arnab Dutta 1f9ca5adba [MLIR] Avoid creation of buggy affine maps while replacing dimension and symbol
Initially before appending the newly composed dimension and symbols
to the dimension and symbol list whose size is to be passed in
AffineMap::get(), the call to the AffineMap::get() was made, resulting
in wrong dimCount and symbolCount being passed as argument. We move the
call to the AffineMap::get() after the diimension and symbol list are
updated.

Differential Revision: https://reviews.llvm.org/D114237
2021-11-20 12:01:29 +05:30
Krzysztof Drewniak a6f53afbcb [MLIR][GPU] Link in device libraries during HSA compilation if needed
To perform some operations, such as sin() or printf(), code compiled
for AMD GPUs must be linked to a series of device libraries. This
commit adds support for linking in these libraries.

However, since these device libraries are delivered as LLVM bitcode,
raising the possibility of version incompatibilities, this commit only
links in libraries when the functions from those libraries are called
by the code being compiled.

This code also sets the math flags to their most conservative values,
as MLIR doesn't have a `-ffast-math` equivalent.

Depends on D114114

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114117
2021-11-19 22:29:37 +00:00
rdzhabarov d729f4c38f [mlir] Bug fix. Stream must outlive the pass manager.
Bug fix. Stream must outlive the pass manager.

Reviewed By: Chia-hungDuan

Differential Revision: https://reviews.llvm.org/D114277
2021-11-19 21:45:43 +00:00
Krzysztof Drewniak 20f79f8caa [MLIR][GPU] Make the path to ROCm a runtime option
Our current build assumes that the path to ROCm we find at build time
will be the path at which ROCm is located when the built code is
executed. This commit adds a --rocm-path option to SerializeToHsaco,
and removes the HIP dependency that the SerializeToHsaco previously had.

Depends on D114113

(though the dependency is to ensure the diffs apply cleanly and to capture the dependency on D114107)

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114114
2021-11-19 20:51:54 +00:00
Krzysztof Drewniak bd22554af0 [MLIR][GPU] Run generic LLVM optimizations when serializing (on AMD)
- Adds hooks that allow SerializeTo* passes to arbitrarily transform
the produced LLVM Module before it is passed to the code generation
passes.

- Uses these hooks within the SerializeToHsaco pass in order to run
LLVM optimizations and to set the optimization level on the
TargetMachine.

- Adds an optLevel parameter to SerializeToHsaco

Future work may include moving much of what's been added to
SerializeToHsaco to SerializeToBlob, but that would require
confirmation from the NVVM backend maintainers that it would be
appropriate to do so.

Depends on D114107

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114113
2021-11-19 19:21:24 +00:00
Thomas Raoux 47555d73f6 [mlir][gpu] Extend shuffle op modes and add nvvm lowering
Add up, down and idx modes to gpu shuffle ops, also change the mode from
string to enum

Differential Revision: https://reviews.llvm.org/D114188
2021-11-19 11:14:31 -08:00