Commit Graph

5334 Commits

Author SHA1 Message Date
Frederik Gossen 630afc61a8 [MLIR][Shape] Canonicalize casted dynamic extent tensor
Differential Revision: https://reviews.llvm.org/D99161
2021-03-29 13:59:19 +02:00
Alexander Belyaev 883912abe6 Revert "[mlir] Introduce CloneOp and adapt test cases in BufferDeallocation."
This reverts commit 06b03800f3.
Until some kind of support for region args is added.
2021-03-29 12:47:59 +02:00
Julian Gross 06b03800f3 [mlir] Introduce CloneOp and adapt test cases in BufferDeallocation.
Add a new clone operation to the memref dialect. This operation implicitly
copies data from a source buffer to a new buffer. In contrast to the linalg.copy
operation, this operation does not accept a target buffer as an argument.
Instead, this operation performs a conceptual allocation which does not need to
be performed manually.

Furthermore, this operation removes the BufferDeallocation pass's dependency
on the linalg dialect. In addition, the canonicalization patterns were extended
to fold clone operations. The copy removal pass has been removed.

Differential Revision: https://reviews.llvm.org/D99172
2021-03-29 10:19:10 +02:00
KareemErgawy-TomTom c52a5f2aa7 MLIR][STD] Fold trunci (sexti).
This patch folds the following pattern:

```
%arg0 = ...
%0 = sexti %arg0 : i1 to i8
%1 = trunci %0 : i8 to i1
```

into just `%arg0`.
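
The fold is sound because truncation keeps only the low bits, which an extension never changes. The standalone C++ sketch below is not MLIR code; it models an i1 value as the low bit of a `uint8_t`, and the helper names are hypothetical. It checks the i1 to i8 to i1 round trip exhaustively for both sign and zero extension:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the std ops; an i1 lives in bit 0 of a uint8_t.

// sexti i1 -> i8: replicate bit 0 into all eight bits.
uint8_t sextI1ToI8(uint8_t b) { return (b & 1) ? 0xFF : 0x00; }

// zexti i1 -> i8: keep bit 0 and clear the other bits.
uint8_t zextI1ToI8(uint8_t b) { return b & 1; }

// trunci i8 -> i1: keep only the lowest bit.
uint8_t truncI8ToI1(uint8_t v) { return v & 1; }

// Returns true if trunc(ext(x)) == x for both extensions and every i1 value.
bool roundTripHolds() {
  for (uint8_t b = 0; b <= 1; ++b) {
    if (truncI8ToI1(sextI1ToI8(b)) != b) return false;
    if (truncI8ToI1(zextI1ToI8(b)) != b) return false;
  }
  return true;
}
```

The same reasoning covers the `zexti` variant of this fold.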

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D99464
2021-03-29 08:34:08 +02:00
KareemErgawy-TomTom e5f2898bc7 [MLIR][STD] Fold trunci (zexti).
This patch folds the following pattern:

```
  %arg0 = ...
  %0 = zexti %arg0 : i1 to i8
  %1 = trunci %0 : i8 to i1
```

into just `%arg0`.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D99453
2021-03-27 19:40:10 +01:00
Alex Zinenko d68ba1fe50 [mlir] Register Linalg passes in C API and Python Bindings
Provide a registration mechanism for Linalg dialect-specific passes in C
API and Python bindings. These are being built into the dialect library
but exposed in separate headers (C) or modules (Python).

Differential Revision: https://reviews.llvm.org/D99431
2021-03-27 09:57:56 +01:00
Jacques Pienaar 7ce07c6494 [mlir] Remove unneeded ShapeFunctionLibraryTerminatorOp
Now that NoTerminator is possible, this op can be removed; it was only
needed structurally before. NFC.
2021-03-26 16:03:51 -07:00
Nicolas Vasilache 69d01e0e40 [mlir][python] NFC - Fix stale path in doc
Differential Revision: https://reviews.llvm.org/D99345
2021-03-26 15:27:12 +00:00
Stella Laurenzo ec294eb87b [mlir][linalg] Add an InitTensorOp python builder.
* This has the API I want but I am not thrilled with the implementation. There are various things that could be improved both about the way that Python builders are mapped and the way the Linalg ops are factored to increase code sharing between C++/Python.
* Landing this as-is since it at least makes the InitTensorOp usable with the right API. Will refactor underneath in follow-ons.

Differential Revision: https://reviews.llvm.org/D99000
2021-03-25 15:17:48 -07:00
Mehdi Amini fcdf142ed5 Remove unused function, fix warning (NFC)
The `mayNotHaveTerminator` was initially on Block but moved to the
verifier before landing and wasn't removed from its original place
where it is unused.
2021-03-25 18:37:57 +00:00
Alexander Belyaev 7f2236cf58 [mlir][linalg] Add output tensor args folding for linalg.tiled_loop.
Folds away TiledLoopOp output tensors when the following conditions are met:
* result of `linalg.tiled_loop` has no uses
* output tensor is the argument of `linalg.yield`

Example:

```
%0 = linalg.tiled_loop ...  outs (%out, %out_buf:tensor<...>, memref<...>) {
  ...
  linalg.yield %out : tensor ...
}
```

Becomes

```
linalg.tiled_loop ...  outs (%out_buf:memref<...>) {
  ...
  linalg.yield
}
```

Differential Revision: https://reviews.llvm.org/D99333
2021-03-25 18:11:05 +01:00
Uday Bondhugula 0b20413ef6 Revert "[Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants."
This reverts commit 361b7d125b by Chris
Lattner <clattner@nondot.org> dated Fri Mar 19 21:22:15 2021 -0700.

The change to the greedy rewriter driver picking a different order was
made without adequate analysis of the trade-offs and experimentation. A
change like this has far reaching consequences on transformation
pipelines, and a major impact upstream and downstream. For example, one
cannot be sure that it doesn't slow down a large number of cases by small
amounts or create other issues. More discussion here:
https://llvm.discourse.group/t/speeding-up-canonicalize/3015/25

Reverting this so that improvements to the traversal order can be made
on a clean slate, in bigger steps, and with a higher bar.

Differential Revision: https://reviews.llvm.org/D99329
2021-03-25 22:17:26 +05:30
Vladislav Vinogradov 70b6f16e07 [mlir] Support MemRefType with multiple AffineMaps in getStridesAndOffset
Compose multiple AffineMaps into single map before strides extraction.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D99166
2021-03-25 12:18:49 +03:00
Jean Perier ffa455d4d4 [mlir] Translate global initializers after creating all LLVM IR globals
In case an operation in a global initializer region refers to another
global variable defined later in the module, or to the global itself,
translation to LLVM IR crashed because it could not find the LLVM IR global
when going through the initializer block.

To solve this problem, split global conversion to LLVM IR into two passes: a
first pass that creates the LLVM IR global variables, and a second one that
converts the initializer, if any, and attaches it to the LLVM global.
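
The two-pass idea can be illustrated with a small standalone C++ model. This is not the actual MLIR-to-LLVM IR translation code: `SourceGlobal`, `TranslatedGlobal`, and `translateGlobals` are hypothetical, and an initializer is reduced to "refers to some global by name". Because every global exists before any initializer is converted, forward references and self-references resolve:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical source-side global: its initializer may name another global.
struct SourceGlobal {
  std::string name;
  std::optional<std::string> initRef; // referenced global, if any
};

// Hypothetical translated global holding a resolved reference.
struct TranslatedGlobal {
  std::string name;
  const TranslatedGlobal *init = nullptr;
};

// Returns false if an initializer refers to a global that does not exist.
bool translateGlobals(const std::vector<SourceGlobal> &src,
                      std::map<std::string, TranslatedGlobal> &out) {
  // Pass 1: create every global, without initializers.
  for (const SourceGlobal &g : src)
    out[g.name] = TranslatedGlobal{g.name, nullptr};
  // Pass 2: convert initializers now that all globals can be looked up.
  for (const SourceGlobal &g : src) {
    if (!g.initRef) continue;
    auto it = out.find(*g.initRef);
    if (it == out.end()) return false;
    out[g.name].init = &it->second; // std::map nodes have stable addresses
  }
  return true;
}
```

A single pass in source order would fail on `a` below, because `b` is defined after it.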

Differential Revision: https://reviews.llvm.org/D99246
2021-03-25 09:53:58 +01:00
Mehdi Amini 973ddb7d6e Define a `NoTerminator` traits that allows operations with a single block region to not provide a terminator
In particular for Graph Regions, the terminator requirement is just a
historical artifact of the generalization of MLIR from CFG regions.
Operations like Module don't need a terminator, and before Module
migrated to be an operation with a region, none was needed.

To validate the feature, the ModuleOp is migrated to use this trait and
the ModuleTerminator operation is deleted.

This patch is likely to break clients, if you're in this case:

- you may iterate on a ModuleOp with `getBody()->without_terminator()`,
  the solution is simple: just remove the ->without_terminator!
- you created a builder with `Builder::atBlockTerminator(module_body)`,
  just use `Builder::atBlockEnd(module_body)` instead.
- you were handling ModuleTerminator: it isn't needed anymore.
- for generic code, a `Block::mayNotHaveTerminator()` may be used.

Differential Revision: https://reviews.llvm.org/D98468
2021-03-25 03:59:03 +00:00
Rob Suderman f5ba3eea67 [mlir][tosa] Add tosa.bitwise_not lowering to constant and xor
Lowers bitwise_not to the linalg dialect using an xor operation with an
all-bits-one constant.
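
The lowering relies on the identity `~x == x ^ allOnes`. A minimal C++ sketch of that identity for any integer type (the function name is illustrative only):

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// bitwise_not expressed as xor with an all-ones constant.
// For signed types the all-ones pattern is -1; for unsigned types it is the max value.
template <typename T>
T bitwiseNotViaXor(T x) {
  static_assert(std::is_integral<T>::value, "integer types only");
  const T allOnes = static_cast<T>(~T(0));
  return static_cast<T>(x ^ allOnes);
}
```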

Differential Revision: https://reviews.llvm.org/D99221
2021-03-24 17:27:27 -07:00
Lei Zhang 19435d3863 [mlir][linalg] Fold fill -> tensor_reshape chain
For such op chains, we can create new linalg.fill ops
with the result type of the linalg.tensor_reshape op.

Differential Revision: https://reviews.llvm.org/D99116
2021-03-24 18:17:58 -04:00
Lei Zhang c241e1c2f5 [mlir][linalg] Support dropping unit dimensions for init tensors
Init tensor operands also have indexing maps and generally follow
the same constraints we expect for non-init-tensor operands.

Differential Revision: https://reviews.llvm.org/D99115
2021-03-24 18:17:58 -04:00
Lei Zhang 7f28d27cb6 [mlir][linalg] Allow controlling folding unit dim reshapes
This commit exposes an option to the pattern
FoldWithProducerReshapeOpByExpansion to allow
folding unit dim reshapes. This gives callers
more fine-grained control.

Differential Revision: https://reviews.llvm.org/D99114
2021-03-24 18:17:57 -04:00
Lei Zhang f66120a357 [mlir][affine] Add canonicalization to merge affine min/max ops
This identifies a pattern where the producer affine min/max op
is bound to a dimension/symbol that is used as a standalone
expression in the consumer affine op's map. In that case the
producer affine min/max op can be merged into its consumer.

For example, a pattern like the following:

```
  %0 = affine.min affine_map<()[s0] -> (s0 + 16, s0 * 8)> ()[%sym1]
  %1 = affine.min affine_map<(d0)[s0] -> (s0 + 4, d0)> (%0)[%sym2]
```

Can be turned into:

```
  %1 = affine.min affine_map<
         ()[s0, s1] -> (s0 + 4, s1 + 16, s1 * 8)> ()[%sym2, %sym1]
```
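
The merge is valid because `min` is associative: when `d0` appears as a standalone result expression, `min(s2 + 4, min(s1 + 16, s1 * 8))` equals `min(s2 + 4, s1 + 16, s1 * 8)`. A C++ sketch that evaluates both forms of the example on concrete symbol values (plain integer arithmetic, not the affine dialect API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Nested form: %0 = min(s1 + 16, s1 * 8); %1 = min(s2 + 4, %0).
int64_t nestedMin(int64_t sym1, int64_t sym2) {
  int64_t v0 = std::min(sym1 + 16, sym1 * 8);
  return std::min(sym2 + 4, v0);
}

// Merged form: a single min over all three expressions.
int64_t mergedMin(int64_t sym1, int64_t sym2) {
  return std::min({sym2 + 4, sym1 + 16, sym1 * 8});
}
```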

Differential Revision: https://reviews.llvm.org/D99016
2021-03-24 18:17:57 -04:00
Lei Zhang 23fd26608c [mlir][affine] Deduplicate affine min/max op expressions
If there are multiple identical expressions in an affine
min/max op's map, we can just keep one.
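
Deduplication cannot change the result, since `min(a, a, b) == min(a, b)`. The sketch below is a hypothetical C++ model in which strings stand in for (uniqued) affine expressions; it keeps the first occurrence of each expression and preserves order:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Keep only the first occurrence of each result expression.
std::vector<std::string> dedupExprs(const std::vector<std::string> &exprs) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> result;
  for (const std::string &e : exprs)
    if (seen.insert(e).second) // insert() reports whether e was new
      result.push_back(e);
  return result;
}
```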

Differential Revision: https://reviews.llvm.org/D99015
2021-03-24 18:17:57 -04:00
Lei Zhang e58597ee1c [mlir][linalg] Fuse producers with non-permutation indexing maps
Until now Linalg fusion only allowed fusing producers whose operands
all have permutation indexing maps. That makes it easier to deduce the
subtensor/subview, but it is an unnecessary constraint: tiling already
has more advanced logic to deduce the subranges even when an operand's
indexing map is not a permutation, e.g., the input operand of
convolution ops.

This patch uses the logic on tiling side to deduce subranges for
fusion. This enables fusing convolution with its consumer ops
when possible.

Along the way, we are now generating proper affine.min ops to guard
against size boundaries, if we cannot be certain they won't be
out of bounds.

Differential Revision: https://reviews.llvm.org/D99014
2021-03-24 18:17:57 -04:00
Lei Zhang ddf93abf49 [mlir][linalg] NFC: Move makeTiledShapes into Utils.{h|cpp}
This is a preparation step to reuse makeTiledShapes in tensor
fusion. Along the way, did some lightweight cleanups.

Differential Revision: https://reviews.llvm.org/D99013
2021-03-24 18:17:57 -04:00
Jacques Pienaar 5d6b4aa80d [mlir] Compare elements directly rather than creating pair first
This avoided some conversion overhead on a model in TypeUniquer when
converting from ArrayRef -> TypeRange.

Differential Revision: https://reviews.llvm.org/D99300
2021-03-24 14:39:11 -07:00
Tobias Gysi 880822255e [mlir][linalg] Do not call region builder during vectorization.
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.

Differential Revision: https://reviews.llvm.org/D99168
2021-03-24 14:55:11 +00:00
Alex Zinenko b3386a734e [mlir] introduce data layout entry for index type
Index type is an integer type of target-specific bitwidth present in many MLIR
operations (loops, memory accesses). Converting values of this type to
fixed-size integers has always been problematic. Introduce a data layout entry
to specify the bitwidth of `index` in a given layout scope, defaulting to 64
bits, which is a commonly used assumption, e.g., in constants.

Port builtin-to-LLVM type conversion to use this data layout entry when
converting `index` type and untie it from pointer size. This is particularly
relevant for GPU targets. Keep a possibility to forcibly override the index
type in lowerings.

Depends On D98525

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D98937
2021-03-24 15:13:42 +01:00
Alex Zinenko 842d243508 [mlir] forward data layout query to scoping op in absence of specification
Even if the layout specification is missing from an op that supports it, the op
is still expected to provide meaningful responses to data layout queries.
Forward them to the op instead of directly calling the default implementation.

Depends On D98524

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D98525
2021-03-24 15:13:41 +01:00
Alex Zinenko f9cdc61d11 [mlir] provide a version of data layout size hooks in bits
This is useful for bit-packing types such as vectors and tuples as well as for
exotic architectures that have non-8-bit bytes.

Depends On D98500

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D98524
2021-03-24 15:13:40 +01:00
Alex Zinenko 1916b0e098 [mlir] support data layout specs on ModuleOp
ModuleOp is a natural place to provide scoped data layout information. However,
it is undesirable for ModuleOp to implement the entirety of
DataLayoutOpInterface because that would require either pushing the interface
inside the IR library instead of a separate library, or putting the default
implementation of the interface as inline functions in headers leading to
binary bloat. Instead, ModuleOp accepts an arbitrary data layout spec attribute
and has a dedicated hook to extract it, and DataLayout is modified to know
about ModuleOp particularities.

Reviewed By: herhut, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98500
2021-03-24 15:13:38 +01:00
Nicolas Vasilache 7716e5535c [mlir] Fixes to hoist padding
Fix the BlockAndValueMapping update that was missing entries for scf.for op's blockIterArgs.
Skip cloning subtensors of the padded tensor as the logic for these is separate.
Add a filter to drop side-effecting ops.

Tests are beefed up to verify the IR is sound in all hoisting configurations for 2-level 3-D tiled matmul.

Differential Revision: https://reviews.llvm.org/D99255
2021-03-24 11:51:28 +00:00
Vladislav Vinogradov 18a2f479bf [mlir][NFC] Replace `getMemorySpaceAsInt` with `getMemorySpace` where possible
Use the new `MemRefType::getMemorySpace` method with generic Attribute
in cases where there is no specific logic around the memory space.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D99154
2021-03-24 13:23:59 +03:00
Mehdi Amini d905c10353 Add a mechanism for Dialects to provide a fallback for OpInterface
This mechanism makes it possible for a dialect to not register all
operations but still answer interface-based queries.
This can be useful for dialects that are "open" or connected to an external
system and still interoperate with the compiler. It can also open up the
possibility to have a more extensible compiler at runtime: the compiler
does not need a pre-registration for each operation and the dialect can
inject behavior dynamically.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D93085
2021-03-24 08:41:40 +00:00
Rob Suderman 28e6420744 [mlir][tosa] Add tosa.argmax to linalg lowering
TOSA's argmax is representable as a linalg.indexed_generic
operation. This adds the lowering for both integer and
floating-point types.
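
Argmax needs the iteration index inside the reduction body, which is why an op that exposes indices (such as indexed_generic) fits. A scalar C++ sketch of the reduction over one axis; tie-breaking toward the lowest index is an assumption here, and NaN handling is not modeled:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduction that carries the best (index, value) pair seen so far.
template <typename T>
std::size_t argmax(const std::vector<T> &xs) {
  assert(!xs.empty());
  std::size_t bestIdx = 0;
  T bestVal = xs[0];
  for (std::size_t i = 1; i < xs.size(); ++i) {
    if (xs[i] > bestVal) { // strict '>' keeps the first index on ties
      bestVal = xs[i];
      bestIdx = i;
    }
  }
  return bestIdx;
}
```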

Differential Revision: https://reviews.llvm.org/D99137
2021-03-23 16:06:55 -07:00
Rob Suderman 4157a079af [mlir][tosa] Add tosa.pad to linalg.pad operation
Lowers from tosa's pad op to the linalg equivalent for floating,
integer, and quantized values.

Differential Revision: https://reviews.llvm.org/D98990
2021-03-23 14:15:48 -07:00
River Riddle 76f3c2f3f3 [mlir][Pattern] Add better support for using interfaces/traits to match root operations in rewrite patterns
To match an interface or trait, users currently have to use the `MatchAny` tag. This tag can be quite problematic for compile time for things like the canonicalizer, as the `MatchAny` patterns may get applied to *every* operation. This revision adds better support by bucketing interface/trait patterns based on which registered operations implement the interface or trait. This means that, moving forward, we will only attempt to match these patterns against operations that have the interface or trait registered. To simplify defining patterns that match traits and interfaces, two new utility classes have been added: OpTraitRewritePattern and OpInterfaceRewritePattern.
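
The bucketing idea can be sketched in standalone C++. This is a hypothetical model, not the MLIR pattern-application code: each registered op lists the interfaces it implements, and each interface pattern is filed only under the op names that implement that interface, so ops without the interface never see the pattern:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using OpName = std::string;
using PatternId = std::string;

// Maps each op name to the interface patterns worth trying on it.
std::map<OpName, std::vector<PatternId>> bucketInterfacePatterns(
    const std::map<OpName, std::set<std::string>> &registeredOps,
    const std::vector<std::pair<std::string, PatternId>> &interfacePatterns) {
  std::map<OpName, std::vector<PatternId>> buckets;
  for (const auto &[iface, pattern] : interfacePatterns)
    for (const auto &[opName, ifaces] : registeredOps)
      if (ifaces.count(iface))
        buckets[opName].push_back(pattern);
  return buckets;
}
```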

Differential Revision: https://reviews.llvm.org/D98986
2021-03-23 14:05:33 -07:00
Chris Lattner 782c534117 [ODS] Implement a new 'hasCanonicalizeMethod' bit for cann patterns.
This provides a simplified way to implement 'matchAndRewrite' style
canonicalization patterns for ops that don't need the full power of
RewritePatterns.  Using this style, you can implement a static method
with a signature like:

```
LogicalResult AssertOp::canonicalize(AssertOp op, PatternRewriter &rewriter) {
  // Match `op` here and rewrite it through `rewriter`.
  return success();
}
```

instead of dealing with defining RewritePattern subclasses.  This also
adopts this for a few canonicalization patterns in the std dialect to
show how it works.

Differential Revision: https://reviews.llvm.org/D99143
2021-03-23 13:45:45 -07:00
Rob Suderman 2d72b675d5 [mlir][tosa] Add tosa.tile to linalg.generic lowering
Tile operations are generic operations with modified indexing. Updated the TOSA
to linalg lowerings to perform this lowering.
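
"Modified indexing" here means that each output element reads the input at a wrapped-around index. A 1-D C++ sketch of that semantics; it does not claim to reproduce the exact indexing maps the lowering emits for N-D tensors:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output has n * multiple elements; output[i] reads input[i % n].
template <typename T>
std::vector<T> tile1D(const std::vector<T> &input, std::size_t multiple) {
  const std::size_t n = input.size();
  std::vector<T> out(n * multiple);
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = input[i % n];
  return out;
}
```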

Differential Revision: https://reviews.llvm.org/D99113
2021-03-23 13:13:54 -07:00
natashaknk e20911b5c0 [mlir][tosa] Add tosa.matmul and tosa.fully_connected lowering
Adds lowerings for matmul and fully_connected. Only supports 2D tensors for inputs and weights, and 1D tensors for bias.
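
As a reference for the 2D case, fully_connected can be viewed as a matmul against transposed weights plus a broadcast bias. The C++ sketch below assumes the TOSA layouts input `[N, IC]`, weight `[OC, IC]`, bias `[OC]`; treat the layout as an assumption rather than a statement about this patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// out[n][o] = bias[o] + sum_i input[n][i] * weight[o][i]
Matrix fullyConnected(const Matrix &input, const Matrix &weight,
                      const std::vector<float> &bias) {
  const std::size_t n = input.size();
  const std::size_t oc = weight.size();
  Matrix out(n, std::vector<float>(oc, 0.0f));
  for (std::size_t r = 0; r < n; ++r) {
    for (std::size_t o = 0; o < oc; ++o) {
      float acc = bias[o];
      for (std::size_t i = 0; i < weight[o].size(); ++i)
        acc += input[r][i] * weight[o][i];
      out[r][o] = acc;
    }
  }
  return out;
}
```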

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D99211
2021-03-23 13:09:53 -07:00
Alex Zinenko 20c68d9441 [mlir] silence -Wunused-variable in release mode in Linalg transforms 2021-03-23 18:59:12 +01:00
Nicolas Vasilache 2240568579 [MLIR][Linalg] Hoist padding across multiple levels of tiling
This revision introduces proper backward slice computation during the hoisting of
PadTensorOp. This allows hoisting padding even across multiple levels of tiling.
Such hoisting requires the proper handling of loop bounds that may depend on enclosing
loop variables.

Differential revision: https://reviews.llvm.org/D98965
2021-03-23 17:47:32 +00:00
Alex Zinenko 5fac87d1bc [mlir] verify that operand/result_segment_sizes attributes have i32 element
This is an assumption that is made in numerous places in the code. In
particular, in the code generated by mlir-tblgen for operand/result accessors
in ops with attr-sized operand or result lists. Make sure to verify this
assumption.

Note that the operation traits are verified before running the custom op
verifier, which can expect the trait verifier to have passed, but some traits
may be verified before the AttrSizedOperand/ResultTrait and should not make
such assumptions.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D99183
2021-03-23 18:26:31 +01:00
Frederik Gossen 94ef248d7b Revert "[MLIR] Canonicalize `shape.assuming` op to yield only inner values"
This reverts commit 5f8acd4fd2.
2021-03-23 16:05:55 +01:00
Frederik Gossen 5f8acd4fd2 [MLIR] Canonicalize `shape.assuming` op to yield only inner values
Differential Revision: https://reviews.llvm.org/D99156
2021-03-23 12:34:50 +01:00
Frederik Gossen f368b3a029 [MLIR][Shape] Canonicalize duplicate operands in `shape.cstr_broadcastable`
Differential Revision: https://reviews.llvm.org/D99159
2021-03-23 12:23:22 +01:00
Frederik Gossen d78374b2d3 [MLIR] Add callback builder for `shape.assuming` op
Differential Revision: https://reviews.llvm.org/D99153
2021-03-23 11:46:01 +01:00
Sean Silva 0524a09cc7 [mlir] Tune error message for assertion.
This assertion can fire in the case of different contexts as well, which
is not difficult to do from Python bindings, for example.
2021-03-22 18:10:18 -07:00
Chris Lattner 79d7f618af Rename FrozenRewritePatternList -> FrozenRewritePatternSet; NFC.
This nicely aligns the naming with RewritePatternSet.  This type isn't
as widely used, but we keep a using declaration to help with
downstream consumption of this change.

Differential Revision: https://reviews.llvm.org/D99131
2021-03-22 17:40:45 -07:00
Mehdi Amini a0c776fc94 Add a mechanism for Dialects to customize printing/parsing operations when they are unregistered
Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D99007
2021-03-23 00:40:03 +00:00
Chris Lattner dc4e913be9 [PatternMatch] Big mechanical rename OwningRewritePatternList -> RewritePatternSet and insert -> add. NFC
This doesn't change APIs; it just cleans up the many in-tree uses of these
names to use the new preferred names.  We'll keep the old names around for a
couple weeks to help transitions.

Differential Revision: https://reviews.llvm.org/D99127
2021-03-22 17:20:50 -07:00
Chris Lattner 549e190236 [PatternRewriter] Rename OwningRewritePatternList -> RewritePatternSet and insert -> add
This maintains the old name to have minimal source impact on downstream codes, and
does not do the huge mechanical patch.  I expect the huge mechanical patch to land
sometime this week, but we can keep around the old names for a couple weeks to reduce
impact on downstream projects.

Differential Revision: https://reviews.llvm.org/D99119
2021-03-22 16:33:18 -07:00
Chris Lattner 6874726610 [PatternMatching] Add convenience insert method to OwningRewritePatternList. NFC.
This allows adding a C function pointer as a matchAndRewrite style pattern, which
is a very common case.  This adopts it in ExpandTanh to show how it reduces a level
of nesting.

We could allow C++ lambdas here, but that doesn't work as well with type inference
in the common case.  Instead of:

  patterns.insert(convertTanhOp);

you need to specify:

  patterns.insert<math::TanhOp>(convertTanhOp);

which is boilerplate'y.  Capturing state like this is very uncommon, so we choose
to require clients to define their own structs and use the non-convenience method
when they need to do so.

Differential Revision: https://reviews.llvm.org/D99039
2021-03-22 11:18:21 -07:00
Rob Suderman d7c44a5c78 [mlir][tosa] Fix tosa.mul to use tosa.apply_scale
Multiply-shift requires wider compute types or CPU-specific code to avoid
premature truncation; apply_scale fixes this issue.

Also, TOSA's mul op supports different input / output types. Added a path that
sign-extends input values to 32-bit integers before multiplying.
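
To show what "premature truncation" means: the product of two i32 values can need up to 64 bits, so it must be formed in a wider type before shifting back down. A scalar C++ sketch follows; the round-half-up convention is an illustrative assumption and not necessarily the exact `tosa.apply_scale` rounding:

```cpp
#include <cassert>
#include <cstdint>

// Multiply in 64 bits, then shift right with rounding. Assumes 0 < shift < 63.
int32_t mulShift(int32_t a, int32_t b, int shift) {
  int64_t product = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  int64_t round = int64_t{1} << (shift - 1);
  return static_cast<int32_t>((product + round) >> shift);
}

// Narrow inputs are sign-extended to 32 bits before multiplying.
int32_t mulI8(int8_t a, int8_t b) {
  return static_cast<int32_t>(a) * static_cast<int32_t>(b);
}
```

With `a = b = 1 << 20`, the product `2^40` does not fit in 32 bits, yet `mulShift(a, b, 20)` correctly yields `1 << 20`.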

Differential Revision: https://reviews.llvm.org/D99011
2021-03-22 11:01:35 -07:00
Nicolas Vasilache bcd6424f9b [mlir][Linalg] Fix linalg on tensor fusion
- Drop unnecessary occurrences of rewriter.eraseOp: dead linalg ops on tensors should be cleaned up by DCE.
- Reimplement the part of Linalg-on-tensors fusion that constructs the body and block arguments: the previous implementation had too much magic. Instead, this spells out all cases explicitly and asserts / introduces TODOs for incorrect cases.

As a consequence, we can use the default traversal order for this pattern.

Differential Revision: https://reviews.llvm.org/D99070
2021-03-22 13:29:40 +00:00
Adrian Kuegel c691b9686b [mlir] Add an option to still use bottom-up traversal
GreedyPatternRewriteDriver was changed from bottom-up traversal to top-down traversal. Not all passes work yet with that change in traversal order. To give some time for fixing, add an option to switch back to bottom-up traversal. Use this option in FusionOfTensorOpsPass, which fails otherwise.

Differential Revision: https://reviews.llvm.org/D99059
2021-03-22 09:49:44 +01:00
Chris Lattner 1d909c9a35 Remove the extraneous MLIRContext argument from populateWithGenerated. NFC. 2021-03-21 10:38:35 -07:00
Chris Lattner ffde3acb1b [ShapeDialect] Silence a build warning, NFC
mlir/lib/Dialect/Shape/IR/Shape.cpp:573:26: warning: loop variable 'shape' is always a copy because the range of type '::mlir::Operation::operand_range' (aka 'mlir::OperandRange') does not return a reference [-Wrange-loop-analysis]
        for (const auto &shape : shapes()) {
                         ^
2021-03-21 10:10:38 -07:00
Chris Lattner 3a506b31a3 Change OwningRewritePatternList to carry an MLIRContext with it.
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters.  There are many many more to be removed.

Differential Revision: https://reviews.llvm.org/D99028
2021-03-21 10:06:31 -07:00
Chris Lattner 361b7d125b [Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants.
This reapplies b5d9a3c / https://reviews.llvm.org/D98609 with a one line fix in
processExistingConstants to skip() when erasing a constant we've already seen.

Original commit message:

 1) Change the canonicalizer to walk the function in top-down order instead of
    bottom-up order.  This composes well with the "top down" nature of constant
    folding and simplification, reducing iterations and re-evaluation of ops in
    simple cases.
 2) Explicitly enter existing constants into the OperationFolder table before
    canonicalizing.  Previously we would "constant fold" them and rematerialize
    them, wastefully recreating a bunch of constants, which led to pointless
    memory traffic.

Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.

One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aesthetic
change to the IR but does permute some testcases.

Differential Revision: https://reviews.llvm.org/D99006
2021-03-20 16:30:15 -07:00
Butygin 7219b31d40 [mlir] Additional folding for SelectOp
* Fold SelectOp when both true and false args are same SSA value
* Fold some cmp + select patterns
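
Folds of this kind rest on simple identities. The C++ sketch below checks a few of them on integers: `select(c, x, x) == x`, plus two cmp + select examples. These are illustrative identities, not necessarily the exact set of patterns in the patch. They hold for integer compares; with float NaNs, `a == a` is false, so the `eq` case does not transfer directly to floats:

```cpp
#include <cassert>

// Hypothetical scalar model of SelectOp.
template <typename T>
T selectOp(bool cond, T trueVal, T falseVal) { return cond ? trueVal : falseVal; }

// Checks for one pair of values:
//   select(c, x, x)      == x   (for both values of c)
//   select(a == b, a, b) == b
//   select(a != b, a, b) == a
template <typename T>
bool foldIdentitiesHold(T a, T b) {
  return selectOp(true, a, a) == a && selectOp(false, a, a) == a &&
         selectOp(a == b, a, b) == b && selectOp(a != b, a, b) == a;
}
```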

Differential Revision: https://reviews.llvm.org/D98576
2021-03-20 13:40:42 +03:00
Butygin 5657f93e78 [mlir] Canonicalize IfOp with trivial `then` and `else` bodies to list of SelectOp's
* Do we need a threshold on the maximum number of Yield arguments processed (maximum number of SelectOps to be generated)?
* Had to modify some old IfOp tests to not get optimized by this pattern

Differential Revision: https://reviews.llvm.org/D98592
2021-03-20 12:18:49 +03:00
Rob Suderman e990fa2170 [mlir][tosa] Add tosa.reverse lowering to linalg.generic
Reverse lowers to a linalg.generic op by reversing the read order
in the index map.
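
Reversing the read order means that output element `i` reads input element `n - 1 - i` along the reversed axis. A 1-D C++ sketch of that index map:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// output[i] = input[n - 1 - i]
template <typename T>
std::vector<T> reverse1D(const std::vector<T> &input) {
  const std::size_t n = input.size();
  std::vector<T> out(n);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = input[n - 1 - i];
  return out;
}
```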

Differential Revision: https://reviews.llvm.org/D98997
2021-03-19 21:46:47 -07:00
Mehdi Amini cdb6eb7e83 Update syntax for amx.tile_muli to use two Unit attr to mark the zext case
This ties the annotation to the operand, and the use of a keyword makes
its meaning more explicit and readable.

Differential Revision: https://reviews.llvm.org/D99001
2021-03-20 04:12:24 +00:00
Stella Laurenzo 8d05a28887 [mlir][python] Adapt to `segment_sizes` attribute type change.
* Broken by https://reviews.llvm.org/rG1a75be0023cd80fd8560d689999a63d4368c90e6
2021-03-19 18:47:00 -07:00
Stella Laurenzo d9343e6153 [mlir][python] Function decorator for capturing a FuncOp from a python function.
* Moves this out of a test case where it was being developed to good effect and generalizes it.
* Having tried a number of things like this, I think this balances concerns reasonably well.

Differential Revision: https://reviews.llvm.org/D98989
2021-03-19 18:27:21 -07:00
River Riddle d75a611afb [mlir] Update `simplifyRegions` to use RewriterBase for erasure notifications
This allows for notifying callers when operations/blocks get erased, which is especially useful for the greedy pattern driver. The current greedy pattern driver "throws away" all information on constants in the operation folder because it doesn't know if they get erased or not. By passing in RewriterBase, we can directly track this and prevent the need for the pattern driver to rediscover all of the existing constants. In some situations this cuts the compile time of the canonicalizer in half.

Differential Revision: https://reviews.llvm.org/D98755
2021-03-19 16:33:54 -07:00
River Riddle cde203e0f9 [mlir][Pass] Coalesce dynamic pass pipelines before running
This was missed when dynamic pass pipelines were added, and is necessary for maximizing the performance/parallelism potential of the pass pipeline.
2021-03-19 14:35:42 -07:00
Stella Laurenzo 436c6c9c20 NFC: Break up the mlir python bindings into individual sources.
* IRModules.cpp -> (IRCore.cpp, IRAffine.cpp, IRAttributes.cpp, IRTypes.cpp).
* The individual pieces now compile in the 5-15s range whereas IRModules.cpp was starting to approach a minute (didn't capture a before time).
* More fine grained splitting is possible, but this represents the most obvious.

Differential Revision: https://reviews.llvm.org/D98978
2021-03-19 13:33:51 -07:00
Benjamin Kramer 6327a7cfd7 [mlir][Linalg] Make LLVM_DEBUG region bigger to avoid warnings in Release builds
Transforms.cpp:586:16: error: unused variable 'v' [-Werror,-Wunused-variable]
    for (Value v : operands)
               ^
2021-03-19 20:56:59 +01:00
Rob Suderman 47286fc530 [mlir][tosa] Add tosa.cast to linalg lowering
Handles lowering from the tosa CastOp to the equivalent linalg ops. It
includes support for conversion between bool, int, and floating point.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D98828
2021-03-19 11:48:37 -07:00
Rob Suderman 1b7498120d [mlir][tosa] Add tosa.logical_* to linalg lowerings
Adds lowerings for logical_* boolean operations. Each of these ops only operates
on booleans, allowing simple lowerings.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D98910
2021-03-19 11:30:42 -07:00
Stella Laurenzo d4cba4a188 [mlir][linalg] Add structured op builders from python opdsl.
* Makes the wrapped functions of the `@linalg_structured_op` decorator callable such that they emit IR imperatively when invoked.
* There are numerous TODOs that I will keep working through to achieve generality.
* Will true up exception handling tests as the feature progresses (for things that are actually errors once everything is implemented).
* Includes the addition of an `isinstance` method on concrete types in the Python API.

Differential Revision: https://reviews.llvm.org/D98754
2021-03-19 11:20:36 -07:00
Nicolas Vasilache 5b2d8503d1 [mlir][Linalg] NFC - Expose helper function `substituteMin`. 2021-03-19 16:26:52 +00:00
Christian Sigg a5f9cda173 [mlir] Rename gpu-to-llvm pass implementation file
Also remove populate patterns function and binary annotation name option.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98930
2021-03-19 13:58:13 +01:00
Alexander Belyaev 628f5c9da2 [mlir] Add a roundtrip test for 'linalg.tiled_loop' on buffers.
https://llvm.discourse.group/t/rfc-add-linalg-tileop/2833

Differential Revision: https://reviews.llvm.org/D98900
2021-03-19 09:38:20 +01:00
Christian Sigg 74ffe8dc59 [mlir] Remove ConvertKernelFuncToBlob
All users have been converted to gpu::SerializeToBlobPass.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98928
2021-03-19 09:33:47 +01:00
Christian Sigg a825fb2c07 [mlir] Remove mlir-rocm-runner
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.

I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D98447
2021-03-19 00:24:10 -07:00
Andrew Young f178c13fa8 [mlir] Support use-def cycles in graph regions during regionDCE
When deleting operations in DCE, the algorithm uses a post-order walk of
the IR to ensure that value uses were erased before value defs. Graph
regions do not have the same structural invariants as SSA CFG, and this
post order walk could delete value defs before uses.  This problem is
guaranteed to occur when there is a cycle in the use-def graph.

This change stops DCE from visiting the operations and blocks in any
meaningful order.  Instead, we rely on explicitly dropping all uses of a
value before deleting it.

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D98919
2021-03-18 23:06:45 -07:00
Rob Suderman 286a9d467e [mlir][tosa] Add lowering for tosa.rescale to linalg.generic
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operations. This scalar operation is lowered
in TosaToStandard.

We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
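For readers unfamiliar with quantized rescaling, here is a simplified C++ sketch of the arithmetic an apply_scale-style op performs: multiply by a fixed-point multiplier, add a rounding constant, and shift right. The name `applyScale` is illustrative, and the optional double-rounding mode of the TOSA specification is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-point rescale: round(value * multiplier / 2^shift), rounding halves
// upward. Simplified: TOSA's optional "double round" adjustment is omitted.
std::int32_t applyScale(std::int32_t value, std::int32_t multiplier, int shift) {
  assert(shift > 0 && shift < 63 && "shift must be in [1, 62]");
  // Widen to 64 bits so the product and rounding term cannot overflow.
  std::int64_t round = std::int64_t{1} << (shift - 1);
  std::int64_t scaled = static_cast<std::int64_t>(value) * multiplier + round;
  // Arithmetic right shift (guaranteed for negative values since C++20).
  return static_cast<std::int32_t>(scaled >> shift);
}
```

A multiplier of `1 << 30` with a shift of 31 represents a scale of 0.5, so `applyScale(100, 1 << 30, 31)` yields 50, and `applyScale(7, 3, 1)` yields 11 (10.5 rounded up).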

Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.

Reviewed By: silvas

Differential Revision: https://reviews.llvm.org/D98753
2021-03-18 16:14:05 -07:00
Rob Suderman 5627564fe0 [mlir][tosa] Add tosa.concat to subtensor inserts lowering
Adds a lowering for tosa.concat that computes indices and emits subtensor insert
operations. Includes tests for concatenation along two different axes.

Differential Revision: https://reviews.llvm.org/D98813
2021-03-18 15:59:07 -07:00
thomasraoux 44f24f3996 [mlir] Fix build failure due to 1a572f4 2021-03-18 14:58:32 -07:00
Lei Zhang fcc1ce0093 Revert "Revert "[mlir] Add linalg.fill bufferization conversion""
This reverts commit c69550c132 with
proper fix applied.
2021-03-18 17:21:58 -04:00
Mehdi Amini c69550c132 Revert "[mlir] Add linalg.fill bufferization conversion"
This reverts commit 32a744ab20.

CI is broken:

test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
 // CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
           ^
2021-03-18 21:18:07 +00:00
Eugene Zhulenev 32a744ab20 [mlir] Add linalg.fill bufferization conversion
`BufferizeAnyLinalgOp` fails on `FillOp` because `FillOp` is not a `LinalgGenericOp`, so reading the operand sizes attribute fails.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98671
2021-03-18 13:41:16 -07:00
thomasraoux 1a572f4509 [mlir] Add vector op support to cuda-runner including vector.print
Differential Revision: https://reviews.llvm.org/D97346
2021-03-18 13:03:08 -07:00
thomasraoux 16947650d5 [mlir][linalg] Extend linalg vectorization to support non-identity input maps
This propagates the affine map to transfer_read op in case it is not a
minor identity map.

Differential Revision: https://reviews.llvm.org/D98523
2021-03-18 12:32:35 -07:00
lorenzo chelini 4c782a24d9 [mlir] Fix typo in SCF.cpp (NFC) 2021-03-18 19:15:33 +01:00
Alexander Belyaev 283799157e [mlir][linalg] Add support for memref inputs/outputs for `linalg.tiled_loop`.
Also use `ArrayAttr` to pass iterator types to the TiledLoopOp builder.

Differential Revision: https://reviews.llvm.org/D98871
2021-03-18 16:11:03 +01:00
David Truby de155f4af2 [MLIR][OpenMP] Pretty printer and parser for omp.wsloop
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D92327
2021-03-18 13:37:01 +00:00
Vladislav Vinogradov 02834e1bd9 [mlir][ODS] Get rid of limitations in rewriters generator
Do not limit the number of arguments in a rewriter pattern.

Introduce a separate `FmtStrVecObject` class to handle
formatting of a variadic `std::string` array.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D97839
2021-03-18 12:21:06 +03:00
Frederik Gossen 1ce70c15ed [MLIR] Canonicalize broadcast operations on single shapes
This covers cases that are not folded away because the extent tensor type
becomes more concrete in the process.

Differential Revision: https://reviews.llvm.org/D98782
2021-03-18 08:59:50 +01:00
Rob Suderman f4bb076a44 [mlir][tosa] Add tosa.slice to std.subtensor lowering
Lowering to subtensor is added for tosa.slice operator.

Differential Revision: https://reviews.llvm.org/D98825
2021-03-17 17:28:18 -07:00
River Riddle d70185ec48 [mlir][IR] Support parsing hex float values in the DialectSymbolParser
This has been a TODO for a while, and prevents breakages for attributes/types that contain floats that can't roundtrip outside of the hex format.

Differential Revision: https://reviews.llvm.org/D98808
2021-03-17 13:52:32 -07:00
Vladislav Vinogradov fee9054232 [mlir][ODS] Support specialized Attribute class for Enums
Add a feature to the `EnumAttr` definition to generate a
specialized Attribute class for the particular enumeration.

This class will inherit from `StringAttr` or `IntegerAttr` and
will override the `classof` and `getValue` methods.

With this class, the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`), and `getValue` returns the typed enumeration
directly instead of a raw string/integer.

Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D97836
2021-03-17 16:44:24 +03:00
lorenzo chelini 0a74a7161b [mlir] scf::ForOp: Drop iter arguments (and corresponding result) with no use
'ForOpIterArgsFolder' can now remove iterator arguments (and corresponding
results) with no use.

Example:

```
%cst = constant 32 : i32

%0:2 = scf.for %arg1 = %lb to %ub step %step iter_args(%arg2 = %arg0, %arg3 = %cst)
  -> (i32, i32) {
  %1 = addi %arg2, %cst : i32
  scf.yield %1, %1 : i32, i32
}

use(%0#0)

```

`%arg3` is not used in the loop body and its corresponding result `%0#1` has no use,
so the iter argument is removed.
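The folding condition can be sketched in C++ over a hypothetical summary of each loop-carried value (`IterArgInfo` and `dropDeadIterArgs` are illustrative names, not the MLIR pattern's code). An iter argument, together with its init operand, yield operand and result, is removable when neither the block argument nor the result has a use, which is the situation of `%arg3` above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-iter-arg facts an scf.for folder would inspect.
struct IterArgInfo {
  std::string name;    // e.g. "%arg3"
  bool bodyUsesArg;    // is the block argument read in the loop body?
  bool resultHasUses;  // is the matching loop result used after the loop?
};

// Keep only the iter args that are still observable; the others can be
// dropped along with their init operand, yield operand and loop result.
std::vector<IterArgInfo> dropDeadIterArgs(const std::vector<IterArgInfo> &args) {
  std::vector<IterArgInfo> kept;
  for (const IterArgInfo &arg : args)
    if (arg.bodyUsesArg || arg.resultHasUses)
      kept.push_back(arg);
  return kept;
}
```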

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98711
2021-03-17 12:06:17 +00:00
Stephan Herhut 5837fdc4cc [mlir][llvm] Pass struct results as parameter in c wrapper
Returning structs directly in LLVM does not necessarily align with the C ABI of
the platform. This might happen to work on Linux, but it breaks on Windows for
small structs. With this change, the wrappers work independently of the platform.
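A minimal C++ illustration of the result-as-parameter wrapper shape, using a hypothetical `PairResult` struct and `compute_pair_wrapper` function (MLIR generates its own wrapper names and types). Returning a struct by value leaves the choice between registers and hidden memory to each platform's C ABI; writing through a caller-provided pointer makes the calling convention explicit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical small struct standing in for a multi-value function result.
struct PairResult {
  std::int64_t first;
  std::int64_t second;
};

// Result-as-parameter style: the caller owns the storage and passes a
// pointer, so no struct is returned by value across the C boundary.
extern "C" void compute_pair_wrapper(PairResult *result, std::int64_t x) {
  result->first = x;
  result->second = 2 * x;
}
```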

Differential Revision: https://reviews.llvm.org/D98725
2021-03-17 12:58:52 +01:00
Gaurav Shukla 8e3075c2b0 [MLIR] Fix lowering of Affine IfOp in the presence of yield values.
This commit fixes the lowering of `Affine.IfOp` to `SCF.IfOp` in the
presence of yield values. These changes have been made as part of the
`-lower-affine` pass.

Differential Revision: https://reviews.llvm.org/D98760
2021-03-17 16:33:32 +05:30
River Riddle caa7038a89 [mlir][IR] Move the remaining builtin attributes to ODS.
With this revision, all builtin attributes and types will have been moved to the ODS generator.

Differential Revision: https://reviews.llvm.org/D98474
2021-03-16 16:31:53 -07:00
River Riddle 425e11eea1 [mlir][AttrTypeDefGen] Add support for custom parameter comparators
Some parameters to attributes and types rely on special comparison routines other than operator== to ensure equality. This revision adds support for those parameters by allowing them to specify a `comparator` code block that determines if `$_lhs` and `$_rhs` are equal. An example of one of these parameters is APFloat, which requires `bitwiseIsEqual` for bitwise comparison (which we want for attribute equality).
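Why a custom comparator matters for floating-point parameters can be shown with a small C++ sketch. `bitwiseIsEqual` here is a hypothetical free function on `double` that follows the same idea as `APFloat::bitwiseIsEqual`; it is not the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Compare two doubles by bit pattern. Unlike operator==, this tells +0.0 and
// -0.0 apart and treats a NaN as equal to a NaN with identical bits, which is
// what uniquing attributes by value needs.
bool bitwiseIsEqual(double lhs, double rhs) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
  std::uint64_t lhsBits = 0, rhsBits = 0;
  std::memcpy(&lhsBits, &lhs, sizeof lhs);
  std::memcpy(&rhsBits, &rhs, sizeof rhs);
  return lhsBits == rhsBits;
}
```

With operator==, `0.0 == -0.0` holds and a NaN never equals itself; both behaviors would break uniquing of float-carrying attributes.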

Differential Revision: https://reviews.llvm.org/D98473
2021-03-16 16:31:53 -07:00
River Riddle 1f13963ec1 [mlir][pdl] Cast the OperationPosition to Position to fix MSVC miscompile
If we don't cast, MSVC picks an overload that hasn't been defined yet (not sure why) and miscompiles.
2021-03-16 16:11:14 -07:00
Eugene Zhulenev 74f6138bd9 [mlir] Add lowering from math::Log1p to LLVM

Reviewed By: cota

Differential Revision: https://reviews.llvm.org/D98662
2021-03-16 15:59:09 -07:00
River Riddle 85ab413b53 [mlir][PDL] Add support for variadic operands and results in the PDL byte code
Supporting ranges in the byte code requires additional complexity, given that a range can't easily be represented as an opaque void *, as is possible with the existing bytecode value types (Attribute, Type, Value, etc.). To enable representing a range with void *, auxiliary storage is used for the actual range itself, with the pointer being passed around in the normal byte code memory. For type ranges, a TypeRange is stored. For value ranges, a ValueRange is stored. The above problem represents a majority of the complexity involved in this revision; the rest is adapting/adding byte code operations to support the changes made to the PDL interpreter in the parent revision.

After this revision, PDL will have initial end-to-end support for variadic operands/results.

Differential Revision: https://reviews.llvm.org/D95723
2021-03-16 13:20:19 -07:00
River Riddle 3a833a0e0e [mlir][PDL] Add support for variadic operands and results in the PDL Interpreter
This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, five new operations have been added that closely match their single-value variants:
* pdl_interp.check_types : Compare a range of types with a known range.
* pdl_interp.create_types : Create a constant range of types.
* pdl_interp.get_operands : Get a range of operands from an operation.
* pdl_interp.get_results : Get a range of results from an operation.
* pdl_interp.switch_types : Switch on a range of types.

This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision.

Differential Revision: https://reviews.llvm.org/D95722
2021-03-16 13:20:19 -07:00
River Riddle 1eb6994d6a [mlir][PDL] Add support for variadic operands and results in PDL
This revision extends the PDL dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl.operands : Define a range of input operands.
* pdl.results : Extract a result group from an operation.
* pdl.types : Define a handle to a range of types.

Support for these in the pdl interpreter dialect and byte code will be added in followup revisions.

Differential Revision: https://reviews.llvm.org/D95721
2021-03-16 13:20:18 -07:00
River Riddle 02c4c0d5b2 [mlir][pdl] Remove CreateNativeOp in favor of a more general ApplyNativeRewriteOp.
This has numerous benefits, given the overly clunky nature of CreateNativeOp:
* Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling more of the pattern to be written in PDL.
* Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users.

This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue.

Differential Revision: https://reviews.llvm.org/D95720
2021-03-16 13:20:18 -07:00
River Riddle 242762c9a3 [mlir][pdl] Restructure how results are represented.
Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs (e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.:

```
%root = pdl.operation ...
%result = pdl.result 0 of %root
```

Differential Revision: https://reviews.llvm.org/D95719
2021-03-16 13:20:18 -07:00
Nicolas Vasilache b661788b77 [mlir] NFC - Expose GlobalCreator so it can be reused. 2021-03-16 12:29:04 +00:00
Adrian Kuegel 2995e161b0 [mlir]: Add canonicalization for dim of 1D alloc of size rank.
Differential Revision: https://reviews.llvm.org/D97542
2021-03-16 10:38:57 +01:00
Lorenzo Chelini fd7eee64c5 scf::ForOp: Fold away iterator arguments with no use and for which the corresponding input is yielded
Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a
scf::ForOp. If the block argument corresponding to the given iterator has no
use and the yielded value equals the input, we fold it away.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98503
2021-03-16 07:01:25 +00:00
Aart Bik 6ad7b97e20 [mlir][amx] Add Intel AMX dialect (architecture-specific vector dialect)
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98470
2021-03-15 17:59:05 -07:00
Alex Zinenko e82a30bdce [mlir] enable Python bindings for the MemRef dialect
A previous commit moved multiple ops from Standard to MemRef dialect.
Some of these ops are exercised in Python bindings. Enable bindings for
the newly created MemRef dialect and update a test accordingly.
2021-03-15 14:07:51 +01:00
Alex Zinenko 0fb4a201c0 [mlir] fix shared-lib build fallout of e2310704d8
The patch in question broke the build with shared libraries due to
missing dependencies, one of which would have been circular between
MLIRStandard and MLIRMemRef if added. Fix this by moving more code
around and swapping the dependency direction. MLIRMemRef now depends on
MLIRStandard, but MLIRStandard does _not_ depend on MLIRMemRef.
Arguably, this is the right direction anyway since numerous libraries
depend on MLIRStandard and don't necessarily need to depend on
MLIRMemRef.

Other notable changes include:
- some EDSC code is moved inline to MemRef/EDSC/Intrinsics.h because it
  creates MemRef dialect operations;
- a utility function related to shape moved to BuiltinTypes.h/cpp
  because it only relates to shaped types and not any particular dialect
  (standard dialect is erroneously believed to contain MemRefType);
- a Python test for the standard dialect is disabled completely because
  the ops it tests moved to the new MemRef dialect, but it is not
  exposed to Python bindings, and the change for that is non-trivial.
2021-03-15 13:41:38 +01:00
Julian Gross e2310704d8 [MLIR] Create memref dialect and move dialect-specific ops from std.
Create the memref dialect and move dialect-specific ops
from std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
AssumeAlignmentOp -> MemRef_AssumeAlignmentOp
DeallocOp -> MemRef_DeallocOp
DimOp -> MemRef_DimOp
MemRefCastOp -> MemRef_CastOp
MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
LoadOp -> MemRef_LoadOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
SubViewOp -> MemRef_SubViewOp
TransposeOp -> MemRef_TransposeOp
TensorLoadOp -> MemRef_TensorLoadOp
TensorStoreOp -> MemRef_TensorStoreOp
TensorToMemRefOp -> MemRef_BufferCastOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D98041
2021-03-15 11:14:09 +01:00
Alex Zinenko 40d8e4d3f9 Revert "[Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants."
This reverts commit b5d9a3c923.

The commit introduced a memory error in canonicalization/operation
walking that is exposed when compiled with ASAN. It leads to crashes in
some "release" configurations.
2021-03-15 10:27:55 +01:00
Frederik Gossen b55f424ffc [MLIR] Add canonicalization for `shape.broadcast`
Remove redundant operands and fold if only one is left.

Differential Revision: https://reviews.llvm.org/D98402
2021-03-15 10:11:28 +01:00
Frederik Gossen 2a71f95767 [MLIR] Allow compatible shapes in `Elementwise` operations
Differential Revision: https://reviews.llvm.org/D98186
2021-03-15 09:56:20 +01:00
Chris Lattner 91a6ad5ad8 [m_Constant] Check #operands/results before hasTrait()
We know that all ConstantLike operations have one result and no operands,
so check this first before doing the trait check.  This change speeds up
Canonicalize on a CIRCT testcase by ~5%.

Differential Revision: https://reviews.llvm.org/D98615
2021-03-14 20:14:19 -07:00
Chris Lattner b5d9a3c923 [Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants.
Two changes:
 1) Change the canonicalizer to walk the function in top-down order instead of
    bottom-up order.  This composes well with the "top down" nature of constant
    folding and simplification, reducing iterations and re-evaluation of ops in
    simple cases.
 2) Explicitly enter existing constants into the OperationFolder table before
    canonicalizing.  Previously we would "constant fold" them and rematerialize
    them, wastefully recreating a bunch of constants, which led to pointless
    memory traffic.

Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.

One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aesthetic
change to the IR but does permute some testcases.

Differential Revision: https://reviews.llvm.org/D98609
2021-03-14 18:21:42 -07:00
Aart Bik e7ee4eaaf7 [mlir][sparse] disable nonunit stride dense vectorization
This is a temporary work-around to get our all-annotations-all-flags
stress testing effort run clean. In the long run, we want to provide
efficient implementations of strided loads and stores, though.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D98563
2021-03-12 16:49:32 -08:00
Eugene Zhulenev 39b2cd4009 [mlir] Annotate functions used only in debug mode with LLVM_ATTRIBUTE_UNUSED
Functions used only in `assert` cause unused-function warnings in release mode.

Reviewed By: mehdi_amini, dcaballe, ftynse

Differential Revision: https://reviews.llvm.org/D98476
2021-03-12 11:25:46 -08:00
Alex Zinenko 4affd0c40e [mlir] fix a memory leak in NestedPattern
NestedPattern uses a BumpPtrAllocator to store child (nested) pattern
objects to decrease the overhead of dynamic allocation. This assumes all
allocations happen inside the allocator that will be freed as a whole.
However, NestedPattern contains `std::function` as a member, which
allocates internally using `new`, unaware of the BumpPtrAllocator. Since
NestedPattern only holds pointers to the nested patterns allocated in
the BumpPtrAllocator, it never calls their destructors, so the
destructors of the `std::function`s they contain are never called either,
leaking the allocated memory.

Make NestedPattern explicitly call destructors of nested patterns. This
additionally requires actually copying the nested patterns in
copy-construction and copy-assignment instead of just sharing the
pointer to the arena-allocated list of children, to avoid a double free. An
alternative solution would be to add reference counting to the
arena-allocated list of children.
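The failure mode and the fix can be reproduced in a few lines of C++. `Arena` is a hypothetical stand-in for `llvm::BumpPtrAllocator` (it releases raw memory in bulk but never runs destructors), and `Pattern` stands in for a nested pattern holding a `std::function`. Objects placement-new'ed into such an arena must be destroyed explicitly; otherwise whatever their members own (here, a counted lambda capture) is never released.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical arena: frees all storage at once and never calls destructors.
class Arena {
public:
  void *allocate(std::size_t size) {
    // new std::byte[] storage is suitably aligned for any object of this size.
    blocks.push_back(std::make_unique<std::byte[]>(size));
    return blocks.back().get();
  }

private:
  std::vector<std::unique_ptr<std::byte[]>> blocks;
};

// Counts live instances so the example can observe missed destructor calls.
int liveCaptures = 0;
struct CountedCapture {
  CountedCapture() { ++liveCaptures; }
  CountedCapture(const CountedCapture &) { ++liveCaptures; }
  ~CountedCapture() { --liveCaptures; }
};

// Stand-in for a nested pattern that owns a std::function.
struct Pattern {
  std::function<bool(int)> filter;
};

// Construct a Pattern in arena memory; the caller must call ~Pattern().
Pattern *createInArena(Arena &arena, std::function<bool(int)> filter) {
  void *mem = arena.allocate(sizeof(Pattern));
  return new (mem) Pattern{std::move(filter)};
}
```

Skipping the explicit `p->~Pattern()` call leaves the copy captured inside the `std::function` alive even after the arena itself is gone, which is the leak described above.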

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98485
2021-03-12 18:52:14 +01:00
Alex Zinenko be5b844a35 [mlir] fix memory leak on failure path in parser
Forward references to blocks lead to `Block`s being allocated in the
parser, but they are not necessarily included into a region if parsing
fails, leading to a leak. Clean them up in parser destructor.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D98403
2021-03-12 09:24:08 +01:00
Marius Brehler 849f8183fb [mlir] Fix ConstantOp verifier
This restricts the attributes to integers for constants of type
IndexType. Previously, an attribute like StringAttr, as in

  %c1 = constant "" : index

was accepted.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98216
2021-03-12 08:49:25 +01:00
Sergei Grechanik fd2b08969b [mlir][Vector] Lowering of transfer_read/write to vector.load/store
This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.

Reviewed By: dcaballe, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97822
2021-03-11 18:17:51 -08:00
Mehdi Amini e1364f1068 Replace use of OperationState with builder::create in GPU Kernel Outlining (NFC)
OperationState is a low-level API that is rarely needed; the convenient builder
API wrapper is preferred when possible.
2021-03-12 00:14:02 +00:00
Diego Caballero 0fd0fb5329 Reland: [mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer.
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97892
2021-03-12 01:08:28 +02:00
Diego Caballero 96891f0418 Reland: [mlir][Vector][Affine] Improve affine vectorizer algorithm
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
  * Removed tracking of root and terminal ops. Existing vectorization
    functionality is preserved and extended so that loop nests without
    root-terminal chains can be vectorized.
  * Vectorizing a loop nest now only requires a single topological traversal.
  * A new vector loop nest is incrementally built along the vectorization
    process. The original scalar loop is kept intact. No cloning guard is needed
    to recover the scalar loop if vectorization fails. This approach also
    simplifies the challenging task of replacing a loop operation amid the
    vectorization process without invalidating the analysis information that
    depends on the original loop.
  * Vectorization of specific operations has been implemented independently,
    preparing them to be moved to a potential vectorization interface.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97442
2021-03-12 00:19:50 +02:00
River Riddle 31bb8efd69 [mlir][StorageUniquer] Properly call the destructor on non-trivially destructible storage instances
This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type.

Differential Revision: https://reviews.llvm.org/D98311
2021-03-11 11:35:32 -08:00
Diego Caballero ed193bce9d [mlir][Vector][Affine] Fix heap-use-after-free in vectorizer
This patch fixes a heap-use-after-free introduced by the recent changes
in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f
The problem is due to the way candidate loops are visited. All candidate loops
are pattern-matched beforehand using the 'NestedMatch' utility. These matches may
intersect with each other so it may happen that we try to vectorize a loop that
was previously vectorized. The new vectorization algorithm replaces the original
loops that are vectorized with new loops and, therefore, any reference to the
original loops in the pre-computed matches becomes invalid.

This patch fixes the problem by classifying the candidate matches into buckets
before vectorization. Each bucket contains all the matches that intersect. The
vectorizer uses these buckets to make sure that we only vectorize *one* match from
each bucket, at most.
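One way to build such buckets is sketched below in C++, with matches modeled as hypothetical sets of operation ids. This sketch assumes the buckets are the connected components of the "intersects" relation and computes them with union-find; the actual patch may group matches differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical model: a match is the set of operation ids it covers.
using Match = std::set<int>;

bool intersects(const Match &a, const Match &b) {
  for (int op : a)
    if (b.count(op) != 0)
      return true;
  return false;
}

// Group matches so that matches which intersect, directly or through a chain
// of other matches, share a bucket (union-find over the intersection graph).
// Returns the match indices of each bucket, in first-seen order.
std::vector<std::vector<std::size_t>> bucketMatches(const std::vector<Match> &matches) {
  std::vector<std::size_t> parent(matches.size());
  std::iota(parent.begin(), parent.end(), std::size_t{0});
  auto findRoot = [&parent](std::size_t i) {
    while (parent[i] != i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };
  for (std::size_t i = 0; i < matches.size(); ++i)
    for (std::size_t j = i + 1; j < matches.size(); ++j)
      if (intersects(matches[i], matches[j]))
        parent[findRoot(i)] = findRoot(j);

  std::vector<std::vector<std::size_t>> buckets;
  std::map<std::size_t, std::size_t> bucketOfRoot;
  for (std::size_t i = 0; i < matches.size(); ++i) {
    auto [it, inserted] = bucketOfRoot.emplace(findRoot(i), buckets.size());
    if (inserted)
      buckets.emplace_back();
    buckets[it->second].push_back(i);
  }
  return buckets;
}
```

In this model, vectorizing at most one match per bucket (for example `buckets[k].front()`) ensures that no other pre-computed match refers to a loop that was already replaced.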

Differential Revision: https://reviews.llvm.org/D98382
2021-03-11 20:44:07 +02:00
Nikita Popov f3f0c6cd47 [mlir] Remove uses of type-less CreateLoad() APIs (NFC)
For the use in LLVMOps.td I used the getPointerElementType()
escape hatch, as it's not obvious to me how the load type
should be properly obtained here.
2021-03-11 18:39:20 +01:00
Alex Zinenko 3ba14fa0ce [mlir] Introduce data layout modeling subsystem
Data layout information allows answering questions about the size and alignment
properties of a type. It enables, among others, the generation of various
linear memory addressing schemes for containers of abstract types and deeper
reasoning about vectors. This introduces the subsystem for modeling data
layouts in MLIR.

The data layout subsystem is designed to scale to MLIR's open type and
operation system. At the top level, it consists of attribute interfaces that
can be implemented by concrete data layout specifications; type interfaces that
should be implemented by types subject to data layout; operation interfaces
that must be implemented by operations that can serve as data layout scopes
(e.g., modules); and dialect interfaces for data layout properties unrelated to
specific types. Built-in types are handled specially to decrease the overall
query cost.

A concrete default implementation of these interfaces is provided in the new
Target dialect. Defaults for built-in types that match the current behavior are
also provided.
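As a concrete instance of the size and alignment questions a data layout answers, here is a C++ sketch of the conventional C struct layout rule. The `TypeLayout`, `StructLayout` and `layoutStruct` names are hypothetical, and the rule shown is the familiar C one, not the MLIR data layout interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Size and alignment of a (hypothetical) field type, in bytes.
struct TypeLayout {
  std::uint64_t size;
  std::uint64_t alignment;
};

struct StructLayout {
  std::vector<std::uint64_t> offsets;
  TypeLayout layout;
};

// Round `value` up to the next multiple of `alignment` (non-zero).
std::uint64_t alignTo(std::uint64_t value, std::uint64_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// C-style layout: each field starts at the next offset aligned for its type;
// the struct takes its most-aligned field's alignment and is padded to a
// multiple of it.
StructLayout layoutStruct(const std::vector<TypeLayout> &fields) {
  StructLayout result{{}, {0, 1}};
  for (const TypeLayout &field : fields) {
    std::uint64_t offset = alignTo(result.layout.size, field.alignment);
    result.offsets.push_back(offset);
    result.layout.size = offset + field.size;
    result.layout.alignment = std::max(result.layout.alignment, field.alignment);
  }
  result.layout.size = alignTo(result.layout.size, result.layout.alignment);
  return result;
}
```

For fields of sizes and alignments (1, 1), (4, 4), (2, 2), this gives offsets 0, 4 and 8, and a 12-byte struct aligned to 4.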

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D97067
2021-03-11 16:54:47 +01:00
Arpith C. Jacob b4a516cc43 [mlir] Add LLVM loop codegen options to control software pipelining
Support specifying the II and disabling pipelining.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98420
2021-03-11 16:46:44 +01:00
Tres Popp 25a20b8aa6 [mlir] Correct verifyCompatibleShapes
verifyCompatibleShapes is not transitive. Create an n-ary version and
update SameOperandShapes and SameOperandAndResultShapes traits to use
it.
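The non-transitivity shows up as soon as a dynamic extent sits in the middle: `[2]` and `[?]` are compatible, `[?]` and `[3]` are compatible, yet `[2]` and `[3]` are not, so checking only adjacent pairs would accept `[2], [?], [3]`. Below is a C++ sketch of a pairwise check and an n-ary check over a hypothetical `Shape` type, where `-1` standing for a dynamic extent is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A shape is a list of extents; kDynamic marks an unknown extent ("?").
constexpr std::int64_t kDynamic = -1;
using Shape = std::vector<std::int64_t>;

// Pairwise check: same rank, and each dimension pair is equal or has a "?".
bool areCompatible(const Shape &a, const Shape &b) {
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != kDynamic && b[i] != kDynamic && a[i] != b[i])
      return false;
  return true;
}

// N-ary check: all shapes share a rank and, per dimension, every static
// extent agrees with every other static extent.
bool areAllCompatible(const std::vector<Shape> &shapes) {
  if (shapes.empty())
    return true;
  for (const Shape &s : shapes)
    if (s.size() != shapes.front().size())
      return false;
  for (std::size_t dim = 0; dim < shapes.front().size(); ++dim) {
    std::int64_t staticExtent = kDynamic;
    for (const Shape &s : shapes) {
      if (s[dim] == kDynamic)
        continue;
      if (staticExtent != kDynamic && staticExtent != s[dim])
        return false;
      staticExtent = s[dim];
    }
  }
  return true;
}
```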

Differential Revision: https://reviews.llvm.org/D98331
2021-03-11 13:04:10 +01:00
Julian Gross 2aef202981 [mlir] Fix invalid hoisting of dependent allocs in buffer hoisting pass.
Buffer hoisting moved allocs upwards even when they had dependencies within
their nested region. This patch fixes this issue.

https://bugs.llvm.org/show_bug.cgi?id=49142

Differential Revision: https://reviews.llvm.org/D98248
2021-03-11 11:46:16 +01:00
Frederik Gossen b975e3b5aa [MLIR] Add canonicalization for `shape.is_broadcastable`
Canonicalize `is_broadcastable` to constant true if there are fewer than 2 unique
shape operands. Otherwise, eliminate redundant operands.

Differential Revision: https://reviews.llvm.org/D98361
2021-03-11 10:10:34 +01:00
Christian Sigg 2224221fb3 [mlir] Add NVVM to CUBIN conversion to mlir-opt
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.

The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.

Depends On D98279

Reviewed By: herhut, rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D98203
2021-03-11 10:07:11 +01:00
River Riddle 4e02eb8014 [mlir] Optimize the implementation of RegionDCE
The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code, and updates some out-dated idioms with newer utilities. The main components of this optimization include:

* Add an overload of Block::eraseArguments that allows for O(N) erasure of disjoint arguments.
* Don't process entry block arguments given that we don't erase them at this point.
* Don't track individual operation results, given that we don't erase them. We can just track the parent operation.
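The linear-time erasure of disjoint arguments is the classic single-pass compaction. Below is a C++ sketch over a `std::vector` with a parallel mask of flags; the name `eraseMarked` and the `std::vector<bool>` mask are illustrative and do not reproduce the real `Block::eraseArguments` overload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Remove every element whose flag is set, in one pass over the vector.
// Erasing the same positions one by one would shift the tail each time,
// which is quadratic when many disjoint positions are erased.
template <typename T>
void eraseMarked(std::vector<T> &items, const std::vector<bool> &toErase) {
  assert(items.size() == toErase.size());
  std::size_t out = 0;
  for (std::size_t in = 0; in < items.size(); ++in) {
    if (toErase[in])
      continue;
    if (out != in)
      items[out] = std::move(items[in]);
    ++out;
  }
  items.erase(items.begin() + static_cast<std::ptrdiff_t>(out), items.end());
}
```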

Differential Revision: https://reviews.llvm.org/D98309
2021-03-10 16:39:50 -08:00
Emilio Cota c0891706bc [mlir] Add polynomial approximation for math::Log2
```
name                     old cpu/op  new cpu/op  delta
BM_mlir_Log2_f32/10       134ns ±15%    45ns ± 4%  -66.39%  (p=0.000 n=20+17)
BM_mlir_Log2_f32/100     1.03µs ±16%  0.12µs ±10%  -88.78%  (p=0.000 n=20+18)
BM_mlir_Log2_f32/1k      10.3µs ±16%   0.7µs ± 5%  -93.24%  (p=0.000 n=20+17)
BM_mlir_Log2_f32/10k      104µs ±15%     7µs ±14%  -93.25%  (p=0.000 n=20+20)
BM_eigen_s_Log2_f32/10   95.3ns ±17%  90.9ns ± 6%     ~     (p=0.228 n=20+18)
BM_eigen_s_Log2_f32/100   907ns ± 3%   911ns ± 6%     ~     (p=0.539 n=16+20)
BM_eigen_s_Log2_f32/1k   9.88µs ± 4%  9.85µs ± 3%     ~     (p=0.790 n=16+17)
BM_eigen_s_Log2_f32/10k   105µs ±10%   110µs ±16%     ~     (p=0.459 n=16+20)
BM_eigen_v_Log2_f32/10   32.5ns ±31%  33.9ns ±14%   +4.31%  (p=0.028 n=17+20)
BM_eigen_v_Log2_f32/100   176ns ± 8%   180ns ± 7%   +2.19%  (p=0.045 n=16+17)
BM_eigen_v_Log2_f32/1k   1.44µs ± 4%  1.50µs ± 9%   +3.91%  (p=0.001 n=16+17)
BM_eigen_v_Log2_f32/10k  14.5µs ±10%  15.0µs ± 8%   +3.92%  (p=0.002 n=16+19)
```
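The general recipe behind such approximations is range reduction followed by a short polynomial. The C++ sketch below uses `std::frexp` and a truncated `atanh` series chosen for readability. It does not use the polynomial or coefficients from this commit, and it ignores zero, negative, infinite and NaN inputs.

```cpp
#include <cassert>
#include <cmath>

// Approximate log2(x) for finite x > 0: split x into mantissa and exponent,
// then evaluate a short odd series for ln(mantissa). Illustrative only.
double approxLog2(double x) {
  assert(x > 0 && std::isfinite(x));
  int exponent = 0;
  double m = std::frexp(x, &exponent); // x = m * 2^exponent, m in [0.5, 1)
  // Re-center the mantissa into [sqrt(0.5), sqrt(2)) to keep s small.
  if (m < 0.7071067811865476) {
    m *= 2.0;
    exponent -= 1;
  }
  // ln(m) = 2 * atanh(s) with s = (m - 1) / (m + 1); here |s| < 0.172.
  double s = (m - 1.0) / (m + 1.0);
  double s2 = s * s;
  double lnM = 2.0 * s * (1.0 + s2 * (1.0 / 3 + s2 * (1.0 / 5 + s2 * (1.0 / 7))));
  const double kInvLn2 = 1.4426950408889634; // 1 / ln(2)
  return exponent + lnM * kInvLn2;
}
```

Over the reduced range, the terms dropped from the series contribute an error below 1e-7.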

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D98282
2021-03-10 14:49:22 -08:00
Christian Sigg 6a291ed0f0 [mlir] Remove unnecessary copying of pass options
I missed a comment in D98279 that you don't need to copy pass options.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D98366
2021-03-10 21:55:28 +01:00
Alex Zinenko 79da91c59a Revert "[mlir][Vector][Affine] Improve affine vectorizer algorithm"
This reverts commit 95db7b4aea.

This breaks vectorize_2d.mlir and vectorize_3d.mlir tests under ASAN (use
after free).
2021-03-10 20:25:49 +01:00
Alex Zinenko ed715536f1 Revert "[mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer."
This reverts commit 77a9d1549f.

Parent commit is broken.
2021-03-10 20:25:32 +01:00
Diego Caballero 77a9d1549f [mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer.
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97892
2021-03-10 20:40:21 +02:00
Diego Caballero 95db7b4aea [mlir][Vector][Affine] Improve affine vectorizer algorithm
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
  * Removed tracking of root and terminal ops. Existing vectorization
    functionality is preserved and extended so that loop nests without
    root-terminal chains can be vectorized.
  * Vectorizing a loop nest now only requires a single topological traversal.
  * A new vector loop nest is incrementally built along the vectorization
    process. The original scalar loop is kept intact. No cloning guard is needed
    to recover the scalar loop if vectorization fails. This approach also
    simplifies the challenging task of replacing a loop operation amid the
    vectorization process without invalidating the analysis information that
    depends on the original loop.
  * Vectorization of specific operations has been implemented independently,
    preparing them to be moved to a potential vectorization interface.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97442
2021-03-10 20:29:58 +02:00
Vladislav Vinogradov b599f464d4 [mlir][CMAKE] Fix build with BUILD_SHARED_LIBS=ON
Link `MLIRStandardToLLVM` to `MLIRAVX512Transforms`, since
the latter uses `LLVMTypeConverter`, which is defined in the former.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98336
2021-03-10 14:52:36 +01:00
Alex Zinenko 78f3fb4f46 [mlir] Update comments in ArmNeon dialect. NFC
These were not updated when squashing LLVMArmNeon and ArmNeon dialects.
2021-03-10 13:35:57 +01:00
Alex Zinenko a776942ba1 [mlir] squash LLVM_AVX512 dialect into AVX512
The dialect separation was introduced to demarcate ops operating in different
type systems. This is no longer the case after the LLVM dialect has migrated to
using built-in vector types, so the original reason for separation is no longer
valid. Squash the two dialects into one.

The code size reduction is modest: the ops originally in LLVM_AVX512 are
preserved because they match LLVM IR intrinsics specialized for vector element
bitwidth. However, it is still conceptually beneficial to have only one
dialect. I originally considered using Tablegen multiclasses to define both
the type-polymorphic op and its two intrinsic-related instantiations, but
decided against it given both the complexity of the required Tablegen input and
its dissimilarity with the rest of ODS-defined ops, both potentially resulting
in very poor maintainability.

Depends On D98327

Reviewed By: nicolasvasilache, springerm

Differential Revision: https://reviews.llvm.org/D98328
2021-03-10 13:07:26 +01:00
Inho Seo 2ce4caf414 Moved getStaticLoopRanges and getStaticShape methods to LinalgInterfaces.td to add static shape verification
This makes the methods usable in LinalgInterfaces.cpp for additional static shape verification that checks the shaped operands against the loop ranges of Linalg ops. Using the existing methods directly would have caused a circular dependency at link time. They are now available as methods of LinalgOp.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D98163
2021-03-10 04:06:22 -08:00
Christian Sigg 4d295cf5b5 [mlir] Add base class for GpuKernelToBlobPass
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.
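
A hedged sketch of the callback-to-base-class pattern, using hypothetical names (`KernelToBlobBase`, `ToyCubinPass`, `translateToISA`, `serializeISA`) rather than the actual pass classes. The shared driver lives in the base class and each target overrides virtual hooks instead of registering callbacks:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class: owns the common kernel-to-blob pipeline.
class KernelToBlobBase {
public:
  virtual ~KernelToBlobBase() = default;

  // Shared driver: translate the kernel to target ISA, then serialize it.
  std::vector<char> run(const std::string &kernel) {
    std::string isa = translateToISA(kernel);
    assert(!isa.empty() && "target produced no ISA");
    return serializeISA(isa);
  }

protected:
  // Target-specific hooks that replace the former callbacks.
  virtual std::string translateToISA(const std::string &kernel) = 0;
  virtual std::vector<char> serializeISA(const std::string &isa) = 0;
};

// Hypothetical target-specific pass deriving from the base class.
class ToyCubinPass : public KernelToBlobBase {
protected:
  std::string translateToISA(const std::string &kernel) override {
    return "ptx:" + kernel; // stand-in for real code generation
  }
  std::vector<char> serializeISA(const std::string &isa) override {
    return std::vector<char>(isa.begin(), isa.end()); // stand-in for a blob
  }
};
```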

Put the base class in GPU/Transforms, according to the discussion in D98203.

The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.

Depends On D98168

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D98279
2021-03-10 12:14:43 +01:00
Vladislav Vinogradov f3bf5c053b [mlir] Model MemRef memory space as Attribute
Based on the following discussion:
https://llvm.discourse.group/t/rfc-memref-memory-shape-as-attribute/2229

The goal of the change is to give the memory space property a more
expressive representation than "magic" integer values.

This allows a cleaner ASM form:

```
gpu.func @test(%arg0: memref<100xf32, "workgroup">)

// instead of

gpu.func @test(%arg0: memref<100xf32, 3>)
```

Why `Attribute` was chosen instead of a plain `string`:

* `Attribute` classes allow a more type-safe API based on RTTI.
* `Attribute` classes provide a faster comparison operator based on
  pointer comparison, in contrast to generic string comparison.
* `Attribute` can store more complex things, like structs or dictionaries.
  This allows a more complex memory space hierarchy.
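
The pointer-comparison point can be illustrated with a toy uniquing context (hypothetical `ToyContext` and `StringAttrStorage`, not MLIR's `MLIRContext` or attribute storage). Each distinct value is stored exactly once, so two handles for the same memory space compare equal by address alone:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical storage for a string-valued attribute.
struct StringAttrStorage {
  std::string value;
};

// Hypothetical uniquing context: one heap-allocated storage per distinct
// string, so attribute equality reduces to comparing pointers.
class ToyContext {
public:
  const StringAttrStorage *getStringAttr(const std::string &value) {
    std::unique_ptr<StringAttrStorage> &slot = storage[value];
    if (!slot)
      slot = std::make_unique<StringAttrStorage>(StringAttrStorage{value});
    assert(slot->value == value && "uniqued storage must match its key");
    return slot.get(); // stable address: rehashing does not move the pointee
  }

private:
  std::unordered_map<std::string, std::unique_ptr<StringAttrStorage>> storage;
};
```

The string hash is paid once at creation; every later comparison is a single pointer compare.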

This commit preserves the old integer-based API and implements it on top
of the new one.

Depends on D97476

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D96145
2021-03-10 12:57:27 +03:00
River Riddle a776ecb6c2 [mlir][IR] Add an Operation::eraseOperands that supports batch erasure
This method allows for removing multiple disjoint operands at once, reducing the need to erase operands individually (each individual erasure shifts the rest of the operand list).
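
A minimal sketch of batch erasure, assuming a plain `std::vector` stands in for the operand list and a `std::vector<bool>` mask marks the operands to erase (the helper name `eraseMasked` is hypothetical; this is not MLIR's implementation). A single compaction pass moves each surviving operand at most once, whereas erasing k operands one by one shifts the tail k times:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Erases every element whose mask bit is set, in one left-to-right pass.
template <typename T>
void eraseMasked(std::vector<T> &operands, const std::vector<bool> &eraseMask) {
  assert(eraseMask.size() == operands.size() && "need one mask bit per operand");
  std::size_t out = 0; // next slot for a surviving operand
  for (std::size_t i = 0; i < operands.size(); ++i) {
    if (eraseMask[i])
      continue; // drop this operand
    if (out != i)
      operands[out] = std::move(operands[i]); // shift survivor left once
    ++out;
  }
  // Trim the now-unused tail.
  operands.erase(operands.begin() + static_cast<std::ptrdiff_t>(out),
                 operands.end());
}
```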

Differential Revision: https://reviews.llvm.org/D98290
2021-03-09 15:07:53 -08:00
River Riddle 4a7aed4ee7 [mlir][IR] Add a new SymbolUserMap class
This class provides efficient implementations of symbol queries related to uses, such as collecting the users of a symbol, replacing all uses, etc. It provides the same kind of benefits for use-related queries that SymbolTableCollection provided for lookup queries.
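
A toy sketch of the idea behind such a map, with hypothetical `ToyUser` and `ToySymbolUserMap` types (not the MLIR class or its API). Users are bucketed by referenced symbol in one up-front scan, so later queries and replacements consult the map instead of re-walking the IR:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical user: an op that references a symbol by name.
struct ToyUser {
  std::string referencedSymbol;
};

class ToySymbolUserMap {
public:
  // One scan over all users builds the symbol -> users index.
  explicit ToySymbolUserMap(std::vector<ToyUser> &users) {
    for (ToyUser &user : users)
      usersBySymbol[user.referencedSymbol].push_back(&user);
  }

  // Answers "who uses this symbol?" with a hash lookup.
  const std::vector<ToyUser *> &getUsers(const std::string &symbol) const {
    static const std::vector<ToyUser *> empty;
    auto it = usersBySymbol.find(symbol);
    return it == usersBySymbol.end() ? empty : it->second;
  }

  // Redirects all users of `from` to `to` and keeps the index in sync.
  void replaceAllUsesWith(const std::string &from, const std::string &to) {
    assert(!to.empty() && "replacement symbol name must be non-empty");
    auto it = usersBySymbol.find(from);
    if (it == usersBySymbol.end())
      return;
    std::vector<ToyUser *> moved = std::move(it->second);
    usersBySymbol.erase(it);
    for (ToyUser *user : moved) {
      user->referencedSymbol = to;
      usersBySymbol[to].push_back(user);
    }
  }

private:
  std::unordered_map<std::string, std::vector<ToyUser *>> usersBySymbol;
};
```

The sketch stores raw pointers, so it assumes the `users` vector is not resized while the map is alive.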

Differential Revision: https://reviews.llvm.org/D98071
2021-03-09 15:07:52 -08:00