Commit Graph

9640 Commits

Author SHA1 Message Date
bixia1 9409bbb2e0 [mlir][sparse] Implement insertion sort for the stable sort operator.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135182
2022-10-06 09:48:39 -07:00
bixia1 330d48c4aa [mlir][sparse] Add rewrite rules for sparse-to-sparse reshape operators.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135077
2022-10-06 08:50:30 -07:00
Tobias Gysi f47d5dce61 [mlir][llvmir] Simpler error handling in ConvertFromLLVMIR (nfc).
The revision renames some methods of the Importer and changes
the error handling to be closer to the ModuleTranslation. In particular,
processValue -> lookupValue and processType -> convertType
now fail if the translation fails (instead of returning an error),
which simplifies the error handling.

The revision prepares a follow up commit that will import
LLVMIR intrinsics using tablegen.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D135349
2022-10-06 17:33:09 +03:00
Matthias Springer 6cdd34b973 [mlir][tensor][bufferize] Bufferize inserts into equivalent tensors in-place
Inserting a tensor into an equivalent tensor is a no-op after bufferization. No alloc is needed.
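
For illustration, a minimal sketch of the equivalence case (shapes and value names are assumptions, not from the original commit):

// %s covers exactly the region of %t it is re-inserted into, so after
// bufferization the insert is a no-op: no alloc or copy is needed.
%s = tensor.extract_slice %t[0] [4] [1] : tensor<8xf32> to tensor<4xf32>
%r = tensor.insert_slice %s into %t[0] [4] [1] : tensor<4xf32> into tensor<8xf32>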

Differential Revision: https://reviews.llvm.org/D132662
2022-10-06 15:06:33 +09:00
Christopher Bate ea2ed80e6d [mlir][nvgpu] NFC - move NVGPU conversion helpers to NvGpu utils library
The ConvertVectorToGpu pass implementation contained a small private
support library for performing various calculations during conversion
between `vector` and `nvgpu.mma.sync` and `nvgpu.ldmatrix` operations.
The support library is moved under `Dialect/NVGPU/Utils` because the
functions have wider utility. Some documentation comments are added or
improved.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D135303
2022-10-05 20:21:27 -06:00
Peiming Liu 01dffc5ae8 [mlir][sparse] Favors the defined dimension when optimizing lattice points.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135337
2022-10-06 01:16:30 +00:00
wren romano 933fefb6a8 [mlir][sparse] Adjusting DimLevelType numeric values for faster predicates
This differential adjusts the numeric values for DimLevelType values: using the low-order two bits for recording the "No" and "Nu" properties, and the high-order bits for the formats per se.  (The choice of encoding may seem a bit peculiar, since the bits are mapped to negative properties rather than positive properties.  But this was done in order to preserve the collation order of DimLevelType values.  If we don't care about collation order, then we may prefer to flip the semantics of the property bits, so that they're less surprising to readers.)

Using distinguished bits for the properties and formats enables faster implementation for the predicates detecting those properties/formats, which matters because this is in the runtime library itself (rather than on the codegen side of things).  This differential pushes through the changes to the enum values, and optimizes the basic predicates.  However it does not optimize all the places where we check compound predicates (e.g., "is compressed or singleton"), to help reduce rebasing conflict with D134933.  Those optimizations will be done after this differential and D134933 are landed.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135004
2022-10-05 17:40:38 -07:00
Murali Vijayaraghavan 9c3d3eeb51 [mlir] vector.multi_reduction canonicalizes to vector.shape_cast (or
vector.extract, if the result is a scalar) only if all reduction
dimensions are of size 1.
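
A hedged sketch (accumulator operand and full syntax abbreviated; the point is only the unit-size reduction dimension):

%0 = vector.multi_reduction <add>, %v [1] : vector<4x1xf32> to vector<4xf32>

// canonicalizes to a pure reshape, since the reduced dim has size 1:

%0 = vector.shape_cast %v : vector<4x1xf32> to vector<4xf32>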

Differential Revision: https://reviews.llvm.org/D135333
2022-10-06 00:11:31 +00:00
Rob Suderman bba48dfe4a [mlir][tosa] tosa.resize canonicalizer for trivial noop
If the scaling factor is by 1 with no offset or border, then the
resize is a no-op.
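
For illustration, a sketch of the trivial case (attribute values are assumptions, written against the new scale/offset/border form of tosa.resize):

// Unit scale (numerator/denominator pairs of 1), zero offset and
// border: the resize reproduces its input and can fold away.
%0 = "tosa.resize"(%input) {scale = [1, 1, 1, 1], offset = [0, 0],
    border = [0, 0], mode = "NEAREST_NEIGHBOR"}
    : (tensor<1x16x16x3xf32>) -> tensor<1x16x16x3xf32>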

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D135329
2022-10-05 16:28:25 -07:00
wren romano 1b27484a49 [mlir][sparse] further implement singleton dimension level type
Handle more cases of singleton DLT including direct sparse2sparse conversion.  (Followup to D134096)
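
For illustration, a sketch of a COO-style encoding that exercises the singleton level type (the exact attribute values are assumptions):

#SortedCOO = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "singleton" ]
}>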

Depends On D134926

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D134933
2022-10-05 16:14:52 -07:00
Nathaniel McVicar ff7a2b6055 [mlir][sparse] Case coverage fix without error handling
Restores the fix from D134925 for MSVC without breaking the CPU runner.

Differential Revision: https://reviews.llvm.org/D135304
2022-10-05 15:35:00 -07:00
Aart Bik 779dcd2ecc [mlir][sparse] move sparse tensor rewriting into its own pass
Makes individual testing and debugging easier.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D135319
2022-10-05 14:52:55 -07:00
Murali Vijayaraghavan 617ca92bf1 Revert "Added canonicalization for vector.multi_reduction"
This reverts commit c16f3260a9.

There's a bug in the commit: it creates a scalar result with `ShapeCastOp`.
Reverting till that fix is done.
2022-10-05 21:43:51 +00:00
TatWai Chong ff23599a0d [mlir][tosa] Update TOSA resize to match specification
The stride and shift attributes are removed and replaced with new scale and border attributes.

Signed-off-by: TatWai Chong <tatwai.chong@arm.com>
Change-Id: I6cdbeb3978f5ee540bc6cf59eb7c273eb0131430

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D131629
2022-10-05 13:18:00 -07:00
River Riddle 54cdc03dfa [mlir:Parser] Always splice parsed operations to the end of the parsed block
The current splicing behavior dates back to when all blocks had terminators,
so we would "helpfully" splice before the terminator. This doesn't make sense
anymore, and leads to somewhat unexpected results when parsing multiple
pieces of IR into the same block.

Differential Revision: https://reviews.llvm.org/D135096
2022-10-05 13:11:38 -07:00
Ivan Butygin a93ec06ae6 [mlir][gpu] Introduce `host_shared` flag to `gpu.alloc`
Motivation: we have a lowering pipeline based on the upstream gpu and spirv dialects, and we are using host-shared gpu memory to transfer data between host and device.
Add `host_shared` flag to `gpu.alloc` to distinguish between shared and device-only gpu memory allocations.
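
For illustration, a minimal sketch (the memref shape is an assumption):

// Allocates memory accessible from both host and device; without the
// flag, gpu.alloc returns device-only memory.
%mem = gpu.alloc host_shared () : memref<16xf32>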

Differential Revision: https://reviews.llvm.org/D133533
2022-10-05 22:01:30 +02:00
Jakub Kuderski e99e8ad24d [mlir][arith] Add shli support to WIE
Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D135234
2022-10-05 15:09:58 -04:00
Diego Caballero 2cdd246a39 [mlir][NFC] Make 'printOp' public in AsmPrinter
This patch moves the 'printOp' functionality to the public API of
AsmPrinter and renames it to 'printCustomOrGenericOp'. No 'parseOp'
is needed at this time as existing APIs are able to parse operations
producing results where results are omitted in the textual form
(the LHS of an operation is redundant when it comes to building the
operation itself as it only contains the result names).
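
For illustration, a hedged sketch of such a textual form (op chosen arbitrarily): the generic-form line below carries everything needed to rebuild the operation even though the `%0 =` prefix is omitted.

"arith.addi"(%a, %b) : (i32, i32) -> i32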

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D135006
2022-10-05 19:00:53 +00:00
Murali Vijayaraghavan c16f3260a9 Added canonicalization for vector.multi_reduction
If there are reductions only along unit dimensions, then they are folded away.

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D134996
2022-10-05 18:43:33 +00:00
Jakub Kuderski 40126e66b6 [mlir][arith] Add andi, ori, and xori support to WIE
Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D135204
2022-10-05 14:34:42 -04:00
Jacques Pienaar 88f07a736b [mlir] Make UnitAttr's default val in unwrapped builder
UnitAttr is optional, but unwrapped builders require it. Change to constructing
it from a bool, as required for the case where it is not set at the moment (for
UnitAttr nothing needs to be constructed; this is true for others here too and
can be addressed together).

Differential Revision: https://reviews.llvm.org/D135058
2022-10-05 10:40:58 -07:00
Mahesh Ravishankar a0ef8af8d5 [mlir][Linalg] Expose vectorization precondition check as a utility function.
This patch exposes the method to check if an op can be vectorized or
not for downstream uses. Also adds a check to mark elementwise operations
that contain non-vectorizable ops (like `tensor.extract`) as non-vectorizable.
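
For illustration, a sketch of an op the precondition now rejects (shapes, maps, and value names are assumptions):

// Elementwise linalg.generic whose body gathers through tensor.extract;
// the precondition check marks it as non-vectorizable.
%r = linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]}
    ins(%idx : tensor<8xindex>) outs(%init : tensor<8xf32>) {
^bb0(%i: index, %out: f32):
  %v = tensor.extract %lut[%i] : tensor<128xf32>
  linalg.yield %v : f32
} -> tensor<8xf32>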

Reviewed By: nicolasvasilache, dcaballe, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D135201
2022-10-05 17:24:52 +00:00
Vitaly Buka 0df37528eb Revert "[mlir][sparse] Restore case coverage warning fix"
Breaks https://lab.llvm.org/buildbot/#/builders/168/builds/9288

This reverts commit 83839700c3.
2022-10-05 09:53:58 -07:00
Aart Bik c48e90877f [mlir][sparse] introduce a higher-order tensor mapping
This extension to the sparse tensor type system in MLIR
opens up a whole new set of sparse storage schemes, such as
block sparse storage (e.g. BCSR) and ELL (aka jagged diagonals).

This revision merely introduces the type extension and
initial documentation. The actual interpretation of the type
(reading in tensors, lowering to code, etc.) will follow.
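
For illustration, a sketch of the kind of scheme this enables (a 2x3 block-sparse BCSR layout; the exact attribute values are assumptions):

#BCSR = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed", "dense", "dense" ],
  higherOrdering = affine_map<(i, j) -> (i floordiv 2, j floordiv 3,
                                         i mod 2, j mod 3)>
}>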

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D135206
2022-10-05 09:40:51 -07:00
Alexandre Ganea 083617afc2 [mlir][unittest] Fix crash when building with MSVC 2022
The test Dialect/Affine/ops.mlir was failing when building with
Visual Studio 2022 version 17.3.5. This was caused by a bad MSVC codegen, when
capturing a `constexpr` in a lambda. The bug was reported to Microsoft, see
differential for more information.

Differential revision: https://reviews.llvm.org/D134227
2022-10-05 12:16:54 -04:00
Guray Ozen e68a7bed59 [mlir][transform] Add failing test for GPU transform dialect
The GPU transform dialect currently has restrictions, and there are several situations where we can't use the transform dialect.

This update adds a test covering failing cases in the GPU transform dialect.

Differential Revision: https://reviews.llvm.org/D135063
2022-10-05 13:10:13 +02:00
Guray Ozen 78305720f3 [mlir][transform][nfc] typo fix
fix typo

Reviewed By: nicolasvasilache, ftynse

Differential Revision: https://reviews.llvm.org/D135242
2022-10-05 13:05:46 +02:00
Nicolas Vasilache 5fc28ebbaf [mlir][Linalg] NFC - Add bbarg pretty printing to linalg::generic
Differential Revision: https://reviews.llvm.org/D135151
2022-10-05 00:59:42 -07:00
Nicolas Vasilache 05fa8e88f4 [mlir][Linalg] Retire LinalgStrategyLowerVectorsPass and filter-based patterns
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785

Depends on D135200

Differential Revision: https://reviews.llvm.org/D135222
2022-10-05 00:55:27 -07:00
Nicolas Vasilache 27c634aed6 [mlir][Linalg] Retire LinalgStrategyPeelPass and filter-based pattern.
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785

Differential Revision: https://reviews.llvm.org/D135200
2022-10-05 00:50:13 -07:00
Matthias Springer 129420df51 [mlir][bufferization][NFC] Move EmptyTensorToAllocTensorPass
This change moves the pass from the Linalg dialect to the bufferization dialect.

Differential Revision: https://reviews.llvm.org/D135130
2022-10-05 09:57:22 +09:00
Stella Laurenzo e28b15b572 Add APFloat and MLIR type support for fp8 (e5m2).
(Re-Apply with fixes to clang MicrosoftMangle.cpp)

This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.

This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf

As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.

Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:

* `F8M<N>` : For FP8 types that can be conceived of as following the
  same rules as FP16 but with a smaller number of mantissa/exponent
  bits. Including the number of mantissa bits in the type name is enough
  to fully specify the type. This naming scheme is used to represent
  the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
  values.

The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).

Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.

MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.

(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)

Differential Revision: https://reviews.llvm.org/D133823
2022-10-04 17:18:17 -07:00
bixia1 8c02ca1da5 [mlir][sparse] Add an attribute to the sort operator for stable sorting.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135181
2022-10-04 15:14:03 -07:00
Benjamin Kramer 11d75076aa [sparse] Make GenericOpSparsifier not crash on multi-output dense linalg.generic
The actual transformation doesn't support multi-output GenericOps, but
if we encounter one without sparse annotations we can just leave it
alone.
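
For illustration, a sketch of the shape of op now tolerated (names and maps are assumptions):

// Multi-output, all-dense generic: no sparse annotations anywhere, so
// the sparsifier now simply skips it instead of crashing.
%0:2 = linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>,
                      affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]}
    ins(%in : tensor<8xf32>) outs(%o1, %o2 : tensor<8xf32>, tensor<8xf32>) {
^bb0(%x: f32, %y1: f32, %y2: f32):
  linalg.yield %x, %x : f32, f32
} -> (tensor<8xf32>, tensor<8xf32>)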

Differential Revision: https://reviews.llvm.org/D135176
2022-10-04 21:48:18 +02:00
Jakub Kuderski b39b805ad5 [mlir][arith] Mark unknown types legal in WIE
Allow unknown types to pass through without being marked as illegal.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D135123
2022-10-04 15:00:51 -04:00
Christian Sigg 63022c4810 [mlir][gpu] Fix GCC -Wparentheses warning 2022-10-04 20:58:02 +02:00
Peiming Liu 1ab2bd0aab [mlir][sparse] support singleton in loop emitter.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135185
2022-10-04 18:42:54 +00:00
Nathaniel McVicar 83839700c3 [mlir][sparse] Restore case coverage warning fix
This restores the fix from D134925 to make MSVC and clang happy.

Reviewed By: stella.stamenova

Differential Revision: https://reviews.llvm.org/D135126
2022-10-04 09:59:20 -07:00
Nicolas Vasilache e3f439ea20 [mlir][Linalg] NFC - Add result and bbArg pretty printing to linalg.reduce
Differential Revision: https://reviews.llvm.org/D135152
2022-10-04 09:27:18 -07:00
Nicolas Vasilache 54a4e9685d [mlir][Tensor] NFC - Add result pretty printing to TensorOps
Differential Revision: https://reviews.llvm.org/D135135
2022-10-04 09:16:51 -07:00
Dominik Adamski 6842d35012 [OpenMP][OMPIRBuilder] Add support for order(concurrent) to OMPIRBuilder for SIMD directive
If the 'order(concurrent)' clause is specified, then the iterations of the SIMD loop
can be executed concurrently.

This patch adds support for LLVM IR codegen via OMPIRBuilder for a SIMD loop
with the 'order(concurrent)' clause. The functionality added to OMPIRBuilder is
similar to the functionality implemented in 'CodeGenFunction::EmitOMPSimdInit'.

Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134046

Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
2022-10-04 08:30:00 -05:00
Denys Shabalin e3fd612e99 [mlir] Add fully dynamic constructor to StridedLayoutAttr bindings
Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D135139
2022-10-04 13:02:55 +00:00
Uday Bondhugula ddff3766b7 [MLIR] Simplify affine maps + operands exploiting IV info
Simplify affine expressions and maps while exploiting simple range and
step info of any IVs that are operands. This simplification is local,
O(1) and practically useful in several scenarios. Accesses with
floordiv's and mod's where the LHS is non-negative and bounded or is a
known multiple of a constant can often be simplified. This is
implemented as a canonicalization for all affine ops in a generic way:
all affine.load/store, vector_load/store, affine.apply, affine.min/max,
etc. ops.

E.g., for tiled loop nests accessing buffers this way:

affine.for %i = 0 to 1024 step 32 {
  affine.for %ii = 0 to 32 {
    affine.load [(%i + %ii) floordiv 32, (%i + %ii) mod 32]
  }
}

// Note that %i is a multiple of 32 and %ii < 32, hence:

(%i + %ii) floordiv 32 is the same as %i floordiv 32
(%i + %ii) mod 32 is the same as %ii mod 32.

The simplification leads to simpler index/subscript arithmetic for
multi-dimensional arrays and in turn enables detection of spatial
locality (e.g., for vectorization), temporal locality, or loop
invariance for hoisting or scalar replacement.

Differential Revision: https://reviews.llvm.org/D135085
2022-10-04 18:18:34 +05:30
Alex Zinenko 3dfea727a4 [mlir] relax transform dialect multi-handle restriction
Relax the restriction in the transform dialect interpreter utilities
that expected a payload IR op to be assocaited with at most one
transform IR handle value. This was useful during the initial
bootstrapping to avoid use-after-free error equivalents when a payload
IR op could be erased through one of the handles associated with it and
then accessed through another. It was, however, possible to erase an
ancestor of the payload IR operation in question. The expensive-checks
mode of interpretation is able to detect both cases and has proven
sufficiently robust in debugging use-after-free errors.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D134964
2022-10-04 11:57:49 +00:00
Guray Ozen 89bb0cae46 [mlir][transform] Create GPU transform dialect
This revision adds the GPU transform dialect. It also introduces a prefix such as "transform.gpu" for all ops related to this dialect.

MLIR already had two GPU transform ops in linalg. This revision moves these ops into GPUTransformOps. The ops are as follows:

`transform.structured.map_nested_foreach_thread_to_gpu_blocks`  -> `transform.gpu.map_foreach_to_blocks`
This op selects the outermost (top-level) foreach_thread and parallelizes it across GPU blocks. It can also generate `gpu_launch`.

`transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads`
This op parallelizes nested foreach_thread ops that are inside `gpu_launch` across GPU threads.

It doesn't add new functionality, but it includes some minor refactoring of the code.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D134800
2022-10-04 13:09:08 +02:00
Matthias Springer 81ca5aa452 [mlir][tensor][NFC] Rename linalg.init_tensor to tensor.empty
tensor.empty/linalg.init_tensor produces an uninitialized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).

This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.
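
For illustration (shape chosen arbitrarily):

// Produces an uninitialized tensor that is only meaningful as a
// destination operand.
%0 = tensor.empty() : tensor<8x16xf32>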

RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101

Differential Revision: https://reviews.llvm.org/D135129
2022-10-04 17:25:35 +09:00
Nicolas Vasilache 46869eebdc [mlir][Memref] NFC - Add result pretty printing to MemrefOps
Differential Revision: https://reviews.llvm.org/D134968
2022-10-04 00:05:16 -07:00
changkaiyan c4cc755c72 [mlir][mlir-translation] Fix missing command line description for standalone-translate.
Differential Revision: https://reviews.llvm.org/D134696

	modified:   mlir/examples/standalone/standalone-translate/standalone-translate.cpp
	modified:   mlir/include/mlir/Tools/mlir-translate/Translation.h
	modified:   mlir/lib/Target/Cpp/TranslateRegistration.cpp
	modified:   mlir/lib/Target/LLVMIR/ConvertFromLLVMIR.cpp
	modified:   mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
	modified:   mlir/lib/Target/SPIRV/TranslateRegistration.cpp
	modified:   mlir/lib/Tools/mlir-translate/Translation.cpp
2022-10-04 09:14:40 +08:00
Jeff Niu d67def8704 [mlir][analysis] Remove empty files (NFC) 2022-10-03 16:52:53 -07:00
Thomas Raoux b9a0eb6106 [mlir][arithmetic] Add tests for IndexCast folding ops and fix assert
Fix assert in IndexCastUI folding and add tests for both IndexCastOp and
IndexCastUIOp folding

Differential Revision: https://reviews.llvm.org/D135098
2022-10-03 20:28:09 +00:00