3319 Commits

Author SHA1 Message Date
Alex Zinenko f5c7c031e2 [mlir] Add C API for IntegerSet
Depends On D95357

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D95368
2021-01-25 20:16:22 +01:00
Diego Caballero c8fc5c0385 [mlir][Affine] Add support for multi-store producer fusion
This patch adds support for producer-consumer fusion scenarios with
multiple producer stores to the AffineLoopFusion pass. The patch
introduces some changes to the producer-consumer algorithm, including:

* For a given consumer loop, producer-consumer fusion iterates over its
producer candidates until a fixed point is reached.

* Producer candidates are gathered beforehand for each iteration of the
consumer loop and visited in reverse program order (not strictly guaranteed)
to maximize the number of loops fused per iteration.

In general, these changes were needed to simplify the multi-store producer
support and remove some of the workarounds that were introduced in the past
to support more fusion cases under the single-store producer limitation.

This patch also preserves the existing functionality of AffineLoopFusion with
one minor change in behavior. Producer-consumer fusion didn't fuse scenarios
with escaping memrefs and multiple outgoing edges (from a single store).
Multi-store producer scenarios will usually (always?) have multiple outgoing
edges so we couldn't fuse any with escaping memrefs, which would greatly limit
the applicability of this new feature. Therefore, the patch enables fusion for
these scenarios. Please, see modified tests for specific details.
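
The fixed-point iteration described in the bullets above can be illustrated with a small Python sketch. This is a toy model, not the AffineLoopFusion implementation: loops are plain integers whose value stands for program order, and `fuse_into_consumer`, `producers_of` and `can_fuse` are hypothetical names introduced only for illustration.

```python
def fuse_into_consumer(consumer, producers_of, can_fuse):
    """Fuse producers into `consumer` until no more fusion happens.

    producers_of: dict mapping a loop id to the set of loop ids it reads from.
    can_fuse: callable(producer, consumer) -> bool (a stand-in for legality
    and profitability checks, which are not modelled here).
    """
    fused = []
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        # Gather the candidates up front for this iteration and visit them
        # in reverse program order (larger id means later in the program).
        candidates = sorted(producers_of.get(consumer, set()), reverse=True)
        for producer in candidates:
            if not can_fuse(producer, consumer):
                continue
            # After fusion the consumer inherits the producer's own producers,
            # which may expose new candidates for the next iteration.
            producers_of[consumer].discard(producer)
            producers_of[consumer] |= producers_of.pop(producer, set())
            fused.append(producer)
            changed = True
    return fused


graph = {3: {1, 2}, 2: {0}}
print(fuse_into_consumer(3, graph, lambda p, c: True))
```

In this toy graph, loop 0 only becomes a candidate after loop 2 has been fused, which is why a single pass over the initial candidates would not be enough.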

Reviewed By: andydavis1, bondhugula

Differential Revision: https://reviews.llvm.org/D92876
2021-01-25 20:31:17 +02:00
Nicolas Vasilache 05d5125d8a [mlir] Generalize OpFoldResult usage in ops with offsets, sizes and operands.
This revision starts evolving the APIs to manipulate ops with offsets, sizes and operands towards a ValueOrAttr abstraction that is already used in folding under the name OpFoldResult.

The objective, in the future, is to allow such manipulations all the way to the level of ODS to avoid all the genuflexions involved in distinguishing between values and attributes for generic constant foldings.

Once this evolution is accepted, the next step will be a mechanical OpFoldResult -> ValueOrAttr.

Differential Revision: https://reviews.llvm.org/D95310
2021-01-25 14:17:03 +00:00
Nicolas Vasilache 68eee55ce6 [mlir][Linalg] Address missed review item
This revision addresses a remaining comment that was overlooked in https://reviews.llvm.org/D95243:
the pad hoisting transformation is made to additionally bail out on side effecting ops other than LoopLikeOps.
2021-01-25 13:47:44 +00:00
Nicolas Vasilache dbf9bedf40 [mlir][Linalg] Add a hoistPaddingOnTensors transformation
This transformation anchors on a padding op whose result is only used as an input
to a Linalg op and pulls it out of a given number of loops.
The result is a packing of padded tiles of ops that is amortized just before
the outermost loop from which the pad operation is hoisted.

Differential revision: https://reviews.llvm.org/D95243
2021-01-25 12:41:18 +00:00
Nicolas Vasilache 3747eb9c85 [mlir][Linalg] Add a padding option to Linalg tiling
This revision allows the base Linalg tiling pattern to optionally require padding to
a constant bounding shape.
When requested, a simple analysis is performed, similar to buffer promotion.
A temporary `linalg.simple_pad` op is added to model padding for the purpose of
connecting the dots. This will be replaced by a more fleshed out `linalg.pad_tensor`
op when it is available.
In the meantime, this temporary op serves the purpose of exhibiting the necessary
properties required from a more fleshed out pad op, to compose with transformations
properly.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D95149
2021-01-25 09:17:30 +00:00
Stella Laurenzo fd226c9b02 [mlir][Python] Roll up of python API fixes.
* As discussed, fixes the ordering of (operands, results) -> (results, operands) in various `create` like methods.
* Fixes a syntax error in an ODS accessor method.
* Removes the linalg example in favor of a test case that exercises the same.
* Fixes FuncOp visibility to properly use None instead of the empty string and defaults it to None.
* Implements what was documented for requiring that trailing __init__ args `loc` and `ip` are keyword only.
* Adds a check to `InsertionPoint.insert` so that if attempting to insert past the terminator, an exception is raised telling you what to do instead. Previously, this would crash downstream (i.e. when trying to print the resultant module).
* Renames `_ods_build_default` -> `build_generic` and documents it.
* Removes `result` from the list of prohibited words and for single-result ops, defaults to naming the result `result`, thereby matching expectations and what is already implemented on the base class.
* This was intended to be a relatively small set of changes to be inlined with the broader support for ODS generating the most specific builder, but it spidered out once actually testing various combinations, so rolling up separately.

Differential Revision: https://reviews.llvm.org/D95320
2021-01-24 19:02:59 -08:00
Stella Laurenzo 52586c46b0 [mlir][CAPI] Add result type inference to the CAPI.
* Adds a flag to MlirOperationState to enable result type inference using the InferTypeOpInterface.
* I chose this level of implementation for a couple of reasons:
  a) In the creation flow is naturally where generated and custom builder code will be invoking such a thing
  b) it is a bit more efficient to share the data structure and unpacking vs having a standalone entry-point
  c) we can always decide to expose more of these interfaces with first-class APIs, but that doesn't preclude that we will always want to use this one in this way (and less API surface area for common things is better for API stability and evolution).
* I struggled to find an appropriate way to test it since we don't link the test dialect into anything CAPI accessible at present. I opted instead for one of the simplest ops I found in a regular dialect which implements the interface.
* This does not do any trait-based type selection. That will be left to generated tablegen wrappers.

Differential Revision: https://reviews.llvm.org/D95283
2021-01-23 14:30:51 -08:00
MaheshRavishankar 6e8ef3b76a [mlir][Linalg] Make Fill operation work on tensors.
Depends on D95109
2021-01-22 14:39:27 -08:00
MaheshRavishankar 430d43e010 [mlir][Linalg] Disable fusion of tensor_reshape op by expansion when unit-dims are involved
Fusion of generic/indexed_generic operations with tensor_reshape by
expansion when the latter just adds/removes unit-dimensions is
disabled since it just adds unit-trip count loops.
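
As a rough illustration of the condition above, a reshape only adds or removes unit dimensions when both shapes agree once every extent of 1 is dropped. The Python sketch below works on plain shape lists; `is_unit_dim_reshape` is a hypothetical helper, and the actual pattern presumably inspects the op's reassociation maps rather than comparing shapes this way.

```python
def is_unit_dim_reshape(src_shape, dst_shape):
    """Return True if the two shapes differ only by dimensions of extent 1.

    Shapes are lists of ints; None stands for a dynamic extent ('?').
    """
    def strip_units(shape):
        return [dim for dim in shape if dim != 1]

    return strip_units(src_shape) == strip_units(dst_shape)


print(is_unit_dim_reshape([4, 1, 8], [4, 8]))   # only a unit dim is removed
print(is_unit_dim_reshape([4, 8], [2, 2, 8]))   # a real dimension is split
```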

Differential Revision: https://reviews.llvm.org/D94626
2021-01-22 12:55:25 -08:00
River Riddle 29d420e0bf [mlir][OpFormatGen] Add support for anchoring optional groups with types
This revision adds support for using either operand or result types to anchor an optional group. It also removes the arbitrary restriction that type directives must refer to variables in the same group, which is overly limiting for a declarative format syntax.

Fixes PR#48784

Differential Revision: https://reviews.llvm.org/D95109
2021-01-22 12:07:27 -08:00
MaheshRavishankar 01defcc8d7 [mlir][Linalg] Extend tile+fuse to work on Linalg operation on tensors.
Differential Revision: https://reviews.llvm.org/D93086
2021-01-22 11:33:35 -08:00
MaheshRavishankar bce318f58d [mlir][Linalg] NFC: Refactor LinalgDependenceGraphElem to allow
representing dependence from producer result to consumer.

With Linalg on tensors the dependence between operations can be from
the result of the producer to the consumer. This change just does a
NFC refactoring of the LinalgDependenceGraphElem to allow representing
both OpResult and OpOperand*.

Differential Revision: https://reviews.llvm.org/D95208
2021-01-22 11:19:59 -08:00
Lei Zhang e27197f360 [mlir][spirv] Define spv.IsNan/spv.IsInf and add lowerings
spv.Ordered/spv.Unordered are meant for OpenCL Kernel capability.
For Vulkan Shader capability, we should use spv.IsNan to check
whether a number is NaN.

Add a new pattern for converting `std.cmpf ord|uno` to spv.IsNan
and bump the pattern converting to spv.Ordered/spv.Unordered
to a higher benefit. The SPIR-V target environment will properly
select between these two patterns.
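
The reason NaN checks suffice is that `ord` holds exactly when neither operand is NaN and `uno` holds when at least one is. A minimal Python sketch of those semantics follows; it models scalar floats only and is not the conversion pattern itself, and the function names are made up for this example.

```python
import math


def cmpf_ord(lhs, rhs):
    """Ordered comparison: true when neither operand is NaN."""
    return not (math.isnan(lhs) or math.isnan(rhs))


def cmpf_uno(lhs, rhs):
    """Unordered comparison: true when at least one operand is NaN."""
    return math.isnan(lhs) or math.isnan(rhs)


nan = float("nan")
print(cmpf_ord(1.0, 2.0), cmpf_ord(1.0, nan))
print(cmpf_uno(1.0, 2.0), cmpf_uno(nan, nan))
```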

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D95237
2021-01-22 13:09:33 -05:00
Lei Zhang 167fb9b4b4 [mlir][spirv] Fix script for availability autogen and refresh ops
Previously we only autogenerated the availability for ops that
directly instantiate `SPV_Op` and expected other subclasses of
`SPV_Op` to define aggregated availability for all ops. This is
quite error prone and we can miss capabilities for certain ops.
Also, it is debatable whether to have multiple levels of subclasses and
try to deduplicate too much: having the availability directly in the
op can be quite explicit and clear. A few extra lines of
declarative code is fine.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D95236
2021-01-22 13:07:36 -05:00
Eugene Zhulenev cc77a2c768 [mlir] Add coro intrinsics operations to LLVM dialect
This PR only has coro intrinsics needed for the Async to LLVM lowering. Will add other intrinsics as needed in the followup PRs.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95143
2021-01-22 10:01:45 -08:00
Hanhan Wang 2cb130f766 [mlir][StandardToSPIRV] Add support for lowering uitofp to SPIR-V
- Extend spirv::ConstantOp::getZero/One to handle float, vector of int, and vector of float.
- Refactor ZeroExtendI1Pattern to use getZero/One methods.
- Add one more test for lowering std.zexti which extends vector<4xi1> to vector<4xi64>.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D95120
2021-01-21 22:20:32 -08:00
Hanhan Wang 16d4bbef30 [mlir][Linalg] Introduce linalg.pad_tensor op.
`linalg.pad_tensor` is an operation that pads the `source` tensor
with given `low` and `high` padding config.

Example 1:

```mlir
  %pad_value = ... : f32
  %1 = linalg.pad_tensor %0 low[1, 2] high[2, 3] {
  ^bb0(%arg0 : index, %arg1 : index):
    linalg.yield %pad_value : f32
  } : tensor<?x?xf32> to tensor<?x?xf32>
```

Example 2:
```mlir
  %pad_value = ... : f32
  %1 = linalg.pad_tensor %arg0 low[2, %arg1, 3, 3] high[3, 3, %arg1, 2] {
  ^bb0(%arg2: index, %arg3: index, %arg4: index, %arg5: index):
    linalg.yield %pad_value : f32
  } : tensor<1x2x2x?xf32> to tensor<6x?x?x?xf32>
```
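
Both examples are consistent with each result dimension being `low + source + high`, where any dynamic term makes the result dynamic. The Python sketch below reproduces that shape arithmetic under this assumption; `padded_shape` is a hypothetical helper, and `None` stands for either a `?` extent or an SSA value such as `%arg1`.

```python
def padded_shape(source, low, high):
    """Compute the result shape of a padding op, dimension by dimension.

    Each argument is a list with one entry per dimension; an entry is an int
    when statically known and None when it is only known at runtime.
    """
    if not (len(source) == len(low) == len(high)):
        raise ValueError("source, low and high must have the same rank")
    result = []
    for src, lo, hi in zip(source, low, high):
        if src is None or lo is None or hi is None:
            result.append(None)  # any dynamic term gives a dynamic extent
        else:
            result.append(lo + src + hi)
    return result


# Example 2 above: tensor<1x2x2x?xf32> padded to tensor<6x?x?x?xf32>.
print(padded_shape([1, 2, 2, None], [2, None, 3, 3], [3, 3, None, 2]))
```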

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D93704
2021-01-21 22:09:28 -08:00
Mehdi Amini 922b26cde4 Add Python bindings for the builtin dialect
This includes some minor customization for FuncOp and ModuleOp.

Differential Revision: https://reviews.llvm.org/D95022
2021-01-21 22:44:44 +00:00
Christian Sigg bd3a387ee7 Revert [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers (cf50f4f764)
There are cmake failures that I do not know how to fix.

Differential Revision: https://reviews.llvm.org/D95162
2021-01-21 22:38:59 +01:00
MaheshRavishankar 615167c9f7 [mlir][SPIRV] Define OrderedOp and UnorderedOp and add lowerings from Standard.
Define OrderedOp and UnorderedOp instructions in SPIR-V and convert
cmpf operations with `ord` and `uno` tag to these instructions
respectively.

Differential Revision: https://reviews.llvm.org/D95098
2021-01-21 07:56:44 -08:00
Frederik Gossen 4ef38f9c12 Add log1p lowering from standard to ROCDL intrinsics
Differential Revision: https://reviews.llvm.org/D95129
2021-01-21 14:02:48 +01:00
Frederik Gossen 294e2544c9 Add log1p lowering from standard to NVVM intrinsics
Differential Revision: https://reviews.llvm.org/D95130
2021-01-21 14:00:38 +01:00
Alexander Belyaev fc58bfd02f [mlir] Remove complex ops from Standard dialect.
`complex` dialect should be used instead.
https://llvm.discourse.group/t/rfc-split-the-complex-dialect-from-std/2496/2

Differential Revision: https://reviews.llvm.org/D95077
2021-01-21 10:34:26 +01:00
River Riddle c78219f644 [mlir] Add a new builtin `unrealized_conversion_cast` operation
An `unrealized_conversion_cast` operation represents an unrealized conversion
from one set of types to another, that is used to enable the inter-mixing of
different type systems. This operation should not be attributed any special
representational or execution semantics, and is generally only intended to be
used to satisfy the temporary intermixing of type systems during the conversion
of one type system to another.

This operation was discussed in the following RFC (and ODM):

https://llvm.discourse.group/t/open-meeting-1-14-dialect-conversion-and-type-conversion-the-question-of-cast-operations/

Differential Revision: https://reviews.llvm.org/D94832
2021-01-20 16:28:18 -08:00
Diego Caballero 735a07f047 Revert "[mlir][Affine] Add support for multi-store producer fusion"
This reverts commit 7dd198852b.

ASAN issue.
2021-01-21 00:37:23 +02:00
Nicolas Vasilache 866cb26039 [mlir] Fix SubTensorInsertOp semantics
Like SubView, SubTensor/SubTensorInsertOp are allowed to have rank-reducing/expanding semantics. In the case of SubTensorInsertOp, the rank of offsets/sizes/strides should be the rank of the destination tensor.

Also, add a builder flavor for SubTensorOp to return a rank-reduced tensor.

Differential Revision: https://reviews.llvm.org/D95076
2021-01-20 20:16:01 +00:00
Frederik Gossen cc4244d55f [MLIR][Standard] Add log1p operation to std
Differential Revision: https://reviews.llvm.org/D95041
2021-01-20 18:56:20 +01:00
Diego Caballero 7dd198852b [mlir][Affine] Add support for multi-store producer fusion
This patch adds support for producer-consumer fusion scenarios with
multiple producer stores to the AffineLoopFusion pass. The patch
introduces some changes to the producer-consumer algorithm, including:

* For a given consumer loop, producer-consumer fusion iterates over its
producer candidates until a fixed point is reached.

* Producer candidates are gathered beforehand for each iteration of the
consumer loop and visited in reverse program order (not strictly guaranteed)
to maximize the number of loops fused per iteration.

In general, these changes were needed to simplify the multi-store producer
support and remove some of the workarounds that were introduced in the past
to support more fusion cases under the single-store producer limitation.

This patch also preserves the existing functionality of AffineLoopFusion with
one minor change in behavior. Producer-consumer fusion didn't fuse scenarios
with escaping memrefs and multiple outgoing edges (from a single store).
Multi-store producer scenarios will usually (always?) have multiple outgoing
edges so we couldn't fuse any with escaping memrefs, which would greatly limit
the applicability of this new feature. Therefore, the patch enables fusion for
these scenarios. Please, see modified tests for specific details.

Reviewed By: andydavis1, bondhugula

Differential Revision: https://reviews.llvm.org/D92876
2021-01-20 19:03:07 +02:00
Christian Sigg cba1ca9025 Fix cuda-runner tests.
2021-01-20 13:14:27 +01:00
Julian Gross 43f34f5834 Added check if there are regions that do not implement the RegionBranchOpInterface.
Add a check if regions do not implement the RegionBranchOpInterface. This is not
allowed in the current deallocation steps. Furthermore, we handle edge-cases,
where a single region is attached and the parent operation has no results.

This fixes: https://bugs.llvm.org/show_bug.cgi?id=48575

Differential Revision: https://reviews.llvm.org/D94586
2021-01-20 12:15:28 +01:00
Christian Sigg cf50f4f764 [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers.
The runtime-wrappers depend on LLVMSupport, pulling in static initialization code (e.g. command line arguments). Dynamically loading multiple such libraries results in ODR violations.

So far this has not been an issue, but in D94421, I would like to load both the async-runtime and the cuda-runtime-wrappers as part of a cuda-runner integration test. When doing this, code that asserts that an option category is only registered once fails (note that I've only experienced this in Google's bazel where the async-runtime depends on LLVMSupport, but a similar issue would happen in cmake if more than one runtime-wrapper starts to depend on LLVMSupport).

The underlying issue is that we have a mix of static and dynamic linking. If all dependencies were loaded as shared objects (i.e. if LLVMSupport was linked dynamically to the runtime wrappers), each dependency would only get loaded once. However, linking dependencies dynamically would require special attention to paths (one could dynamically load the dependencies first given explicit paths). The simpler approach seems to be to link all dependencies statically into a single shared object.

This change basically applies the same logic that we have in the c_runner_utils: we have a shared object target that can be loaded dynamically, and we have a static library target that can be linked to other runtime-wrapper shared object targets.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D94399
2021-01-20 12:10:16 +01:00
Aart Bik b5c542d64b [mlir][sparse] add narrower choices for pointers/indices
Use cases with 16- or even 8-bit pointer/index structures have been identified.

Reviewed By: penpornk

Differential Revision: https://reviews.llvm.org/D95015
2021-01-19 20:20:38 -08:00
Stella Laurenzo b62c7e0474 [mlir][python] Swap shape and element_type order for MemRefType.
* Matches how all of the other shaped types are declared.
* No super principled reason for this ordering beyond that it makes the one that was different be like the rest.
* Also matches ordering of things like ndarray, et al.

Reviewed By: ftynse, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D94812
2021-01-19 16:03:19 -08:00
Eric Christopher 22eb1cf89f Remove unused functions.
2021-01-19 14:44:34 -08:00
Sean Silva be7352c00d [mlir][splitting std] move 2 more ops to `tensor`
- DynamicTensorFromElementsOp
- TensorFromElements

Differential Revision: https://reviews.llvm.org/D94994
2021-01-19 13:49:25 -08:00
Stella Laurenzo 71b6b010e6 [mlir][python] Factor out standalone OpView._ods_build_default class method.
* This allows us to hoist trait level information for regions and sized-variadic to class level attributes (_ODS_REGIONS, _ODS_OPERAND_SEGMENTS, _ODS_RESULT_SEGMENTS).
* Eliminates some splicey python generated code in favor of a native helper for it.
* Makes it possible to implement custom, variadic and region based builders with one line of python, without needing to manually code access to the segment attributes.
* Needs follow-on work for region based callbacks and support for SingleBlockImplicitTerminator.
* A follow-up will actually add ODS support for generating custom Python builders that delegate to this new method.
* Also includes the start of an e2e sample for constructing linalg ops where this limitation was discovered (working progressively through this example and cleaning up as I go).

Differential Revision: https://reviews.llvm.org/D94738
2021-01-19 09:29:57 -08:00
Lei Zhang 3a56a96664 [mlir][spirv] Define spv.GLSL.Fma and add lowerings
Also changes some rewriter.create + rewriter.replaceOp calls
into rewriter.replaceOpWithNewOp calls.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D94965
2021-01-19 09:14:21 -05:00
Nicolas Vasilache 93a873dfc9 [mlir][Affine] Revisit and simplify composeAffineMapAndOperands.
In prehistorical times, AffineApplyOp was allowed to produce multiple values.
This allowed the creation of intricate SSA use-def chains.
AffineApplyNormalizer was originally introduced as a means of reusing the AffineMap::compose method to write SSA use-def chains.
Unfortunately, symbols that were produced by an AffineApplyOp needed to be promoted to dims and reordered for the mathematical composition to be valid.

Since then, single result AffineApplyOp became the law of the land but the original assumptions were not revisited.

This revision revisits these assumptions and retires AffineApplyNormalizer.

Differential Revision: https://reviews.llvm.org/D94920
2021-01-19 13:52:07 +00:00
Alexander Belyaev 11f4c58c15 [mlir] Add `complex.abs`, `complex.div` and `complex.mul` to ComplexOps.
Differential Revision: https://reviews.llvm.org/D94911
2021-01-19 12:09:59 +01:00
River Riddle 2a27a9819a [mlir][AsmPrinter] Properly escape strings when printing locations
This fixes errors when location strings contain newlines or other non-ASCII characters.
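
To show the kind of escaping involved, here is a small Python sketch that makes an arbitrary string safe to print inside double quotes. The escape scheme (backslash escapes for a few common characters, two-digit hex for every other non-printable or non-ASCII byte) is an assumption chosen for illustration, and `escape_location_string` is a hypothetical name, not the printer's actual routine.

```python
def escape_location_string(text):
    """Return `text` as a double-quoted string with unsafe bytes escaped."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in simple:
            out.append(simple[ch])
        elif 0x20 <= byte < 0x7F:
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            out.append("\\%02X" % byte)  # everything else as a hex escape
    return '"' + "".join(out) + '"'


print(escape_location_string("file\nname.mlir"))
```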

Differential Revision: https://reviews.llvm.org/D94847
2021-01-15 17:14:57 -08:00
MaheshRavishankar d7bc3b7ce2 [mlir][Linalg] Add missing check to canonicalization of GenericOp that are identity ops.
The operation is an identity if the values yielded by the operation
are the arguments of the basic block of that operation. Add this missing check.

Differential Revision: https://reviews.llvm.org/D94819
2021-01-15 13:55:35 -08:00
Alexander Belyaev d0cb0d30a4 [mlir] Add Complex dialect.
Differential Revision: https://reviews.llvm.org/D94764
2021-01-15 19:58:10 +01:00
Valentin Clement cf0173de69 [mlir] Add better support for f80 and f128
Add builtin f80 and f128 following @schweitz proposition
https://llvm.discourse.group/t/rfc-adding-better-support-for-higher-precision-floating-point/2526/5

Reviewed By: ftynse, rriddle

Differential Revision: https://reviews.llvm.org/D94737
2021-01-15 10:29:48 -05:00
Lei Zhang 0acc260b57 [mlir][linalg] Support generating builders for named op attributes
This commit adds support to generate an additional builder for
each named op that has attributes. This gives better experience
when creating the named ops.

Along the way adds support for i64.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D94733
2021-01-15 09:00:30 -05:00
Aart Bik 5508516b06 [mlir][sparse] retry sparse-only for cyclic iteration graphs
This is a very minor improvement during iteration graph construction.
If the first attempt considering the dimension order of all tensors fails,
a second attempt is made using the constraints of sparse tensors only.
Dense tensors prefer dimension order (locality) but provide random access
if needed, enabling the compilation of more sparse kernels.
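
The two-attempt strategy can be sketched with Python's standard `graphlib` module (Python 3.9+). This is a simplified model rather than the sparse compiler's code: each tensor contributes its dimension order as "outer before inner" constraints on loop indices, and `topo_order` and `iteration_order` are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter


def topo_order(num_loops, dim_orders):
    """Topologically sort loop indices, or return None if there is a cycle."""
    predecessors = {loop: set() for loop in range(num_loops)}
    for order in dim_orders:
        for outer, inner in zip(order, order[1:]):
            predecessors[inner].add(outer)  # outer loop must come first
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


def iteration_order(num_loops, tensors):
    """tensors: list of (dim_order, is_sparse) pairs."""
    # First attempt: honor the dimension order of every tensor.
    order = topo_order(num_loops, [dims for dims, _ in tensors])
    if order is None:
        # Second attempt: keep only the constraints of sparse tensors, since
        # dense tensors can be accessed in any order if needed.
        order = topo_order(num_loops, [dims for dims, sparse in tensors if sparse])
    return order


# A sparse tensor wants (i, j), a dense one wants (j, i): cyclic at first.
print(iteration_order(2, [((0, 1), True), ((1, 0), False)]))
```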

Reviewed By: penpornk

Differential Revision: https://reviews.llvm.org/D94709
2021-01-14 22:39:29 -08:00
MaheshRavishankar 42444d0cf0 [mlir][Linalg] NFC: Verify tiling on linalg.generic operation on tensors.
With the recent changes to linalg on tensor semantics, tiling
works out-of-the-box for generic operations. Add a test to
verify that and do some minor refactoring.

Differential Revision: https://reviews.llvm.org/D93077
2021-01-14 16:17:08 -08:00
MaheshRavishankar 774c9c6ef3 [mlir][Linalg] Add canonicalization of linalg op -> dim op.
Add canonicalization to replace use of the result of a linalg
operation on tensors in a dim operation, to use one of the operands of
the linalg operations instead. This allows the linalg op itself to be
deleted when all its non-dim uses are removed (say through tiling, etc.)

Differential Revision: https://reviews.llvm.org/D93076
2021-01-14 16:17:08 -08:00
MaheshRavishankar 722ae10907 [mlir][Linalg] Add canonicalization to remove no-op linalg operations.
linalg.generic/indexed_generic operations on tensors whose body is
just yielding the (non-induction variable) arguments of the operation
can be canonicalized by replacing uses of the result with the
corresponding arguments.
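
The rewrite can be pictured with a toy Python model in which values are plain strings. It assumes a one-to-one correspondence between the op's operands and its block arguments and ignores the induction variables of indexed_generic; `fold_identity_generic` is a hypothetical helper, not the canonicalization pattern itself.

```python
def fold_identity_generic(operands, block_args, yielded):
    """Return replacement values for the op results, or None if not foldable.

    operands:   values passed to the op, one per block argument.
    block_args: names of the body's block arguments, in the same order.
    yielded:    values yielded by the body, one per op result.
    """
    if len(operands) != len(block_args):
        raise ValueError("expected one block argument per operand")
    # The op is a no-op only if every yielded value is a block argument.
    if any(value not in block_args for value in yielded):
        return None
    # Each result is replaced by the operand feeding the yielded argument.
    return [operands[block_args.index(value)] for value in yielded]


print(fold_identity_generic(["%a", "%b"], ["%arg0", "%arg1"], ["%arg1"]))
```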

Differential Revision: https://reviews.llvm.org/D94581
2021-01-14 14:59:24 -08:00
Andrew Young a55a0a3056 [mlir] Remove over specified memory effects
The standard and gpu dialect both have `alloc` operations which use the
memory effect `MemAlloc`. In both cases, it is specified on both the
operation itself and on the result.  This results in two memory effects
being created for these operations.  When `MemAlloc` is defined on an
operation, it represents some background effect which the compiler
cannot reason about, and inhibits the ability of the compiler to
remove dead `std.alloc` operations. This change removes the unneeded
`MemAlloc` effect from these operations and leaves the effect on the
result, which allows dead allocs to be erased.

There is the same problem, but to a lesser extent, with MemFree, MemRead
and MemWrite. Over-specifying these traits is not currently inhibiting
any optimization.
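
Why the extra effect blocks dead-code removal can be shown with a simplified Python model. It assumes that an unused op may be erased only if every one of its effects is an allocation attached to one of its own results; this is a deliberately reduced rule for illustration, and `Effect` and `is_trivially_dead` are hypothetical names rather than MLIR APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Effect:
    kind: str                 # e.g. "alloc", "free", "read", "write"
    on_result: Optional[str]  # result the effect applies to, None = the op itself


def is_trivially_dead(results, use_counts, effects):
    """Return True if an op with these results and effects could be erased."""
    if any(use_counts.get(result, 0) > 0 for result in results):
        return False  # a result is still used
    for effect in effects:
        if effect.kind == "alloc" and effect.on_result in results:
            continue  # allocating an unused result is harmless to drop
        return False  # any other effect is opaque to this simple check
    return True


on_result_only = [Effect("alloc", "%0")]
over_specified = [Effect("alloc", None), Effect("alloc", "%0")]
print(is_trivially_dead(["%0"], {}, on_result_only))
print(is_trivially_dead(["%0"], {}, over_specified))
```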

Differential Revision: https://reviews.llvm.org/D94662
2021-01-14 14:49:41 -08:00