Commit Graph

6986 Commits

Author SHA1 Message Date
Arpith C. Jacob b4a516cc43 [mlir] Add LLVM loop codegen options to control software pipelining
Support specifying the II and disabling pipelining.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98420
2021-03-11 16:46:44 +01:00
Tres Popp 25a20b8aa6 [mlir] Correct verifyCompatibleShapes
verifyCompatibleShapes is not transitive. Create an n-ary version and
update SameOperandShapes and SameOperandAndResultShapes traits to use
it.

Differential Revision: https://reviews.llvm.org/D98331
2021-03-11 13:04:10 +01:00
Julian Gross 2aef202981 [mlir] Fix invalid hoisting of dependent allocs in buffer hoisting pass.
Buffer hoisting moves allocs upwards although it has dependency within its
nested region. This patch fixes this issue.

https://bugs.llvm.org/show_bug.cgi?id=49142

Differential Revision: https://reviews.llvm.org/D98248
2021-03-11 11:46:16 +01:00
Christian Sigg bafe418d12 [mlir] Change test-gpu-to-cubin to derive from SerializeToBlobPass
Clean-up after D98279, remove one call to createConvertGPUKernelToBlobPass().

Depends On D98203

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98360
2021-03-11 10:42:20 +01:00
Frederik Gossen b975e3b5aa [MLIR] Add canoncalization for `shape.is_broadcastable`
Canonicalize `is_broadcastable` to constant true if fewer than 2 unique shape
operands. Eliminate redundant operands, otherwise.

Differential Revision: https://reviews.llvm.org/D98361
2021-03-11 10:10:34 +01:00
Christian Sigg 2224221fb3 [mlir] Add NVVM to CUBIN conversion to mlir-opt
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.

The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.

Depends On D98279

Reviewed By: herhut, rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D98203
2021-03-11 10:07:11 +01:00
Matthias Springer c40e0d7609 [mlir][AVX512] Implement sparse vector dot product integration test.
This test operates on two hardware-vector-sized vectors and utilizes vp2intersect and mask.compress.

PHAB_REVIEW=D98099
2021-03-11 13:00:17 +09:00
River Riddle 4e02eb8014 [mlir] Optimize the implementation of RegionDCE
The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code, and updates some out-dated idioms with newer utilities. The main components of this optimization include:

* Add an overload of Block::eraseArguments that allows for O(N) erasure of disjoint arguments.
* Don't process entry block arguments given that we don't erase them at this point.
* Don't track individual operation results, given that we don't erase them. We can just track the parent operation.

Differential Revision: https://reviews.llvm.org/D98309
2021-03-10 16:39:50 -08:00
Emilio Cota c0891706bc [mlir] Add polynomial approximation for math::Log2
```
name                     old cpu/op  new cpu/op  delta
BM_mlir_Log2_f32/10       134ns ±15%    45ns ± 4%  -66.39%  (p=0.000 n=20+17)
BM_mlir_Log2_f32/100     1.03µs ±16%  0.12µs ±10%  -88.78%  (p=0.000 n=20+18)
BM_mlir_Log2_f32/1k      10.3µs ±16%   0.7µs ± 5%  -93.24%  (p=0.000 n=20+17)
BM_mlir_Log2_f32/10k      104µs ±15%     7µs ±14%  -93.25%  (p=0.000 n=20+20)
BM_eigen_s_Log2_f32/10   95.3ns ±17%  90.9ns ± 6%     ~     (p=0.228 n=20+18)
BM_eigen_s_Log2_f32/100   907ns ± 3%   911ns ± 6%     ~     (p=0.539 n=16+20)
BM_eigen_s_Log2_f32/1k   9.88µs ± 4%  9.85µs ± 3%     ~     (p=0.790 n=16+17)
BM_eigen_s_Log2_f32/10k   105µs ±10%   110µs ±16%     ~     (p=0.459 n=16+20)
BM_eigen_v_Log2_f32/10   32.5ns ±31%  33.9ns ±14%   +4.31%  (p=0.028 n=17+20)
BM_eigen_v_Log2_f32/100   176ns ± 8%   180ns ± 7%   +2.19%  (p=0.045 n=16+17)
BM_eigen_v_Log2_f32/1k   1.44µs ± 4%  1.50µs ± 9%   +3.91%  (p=0.001 n=16+17)
BM_eigen_v_Log2_f32/10k  14.5µs ±10%  15.0µs ± 8%   +3.92%  (p=0.002 n=16+19)
```

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D98282
2021-03-10 14:49:22 -08:00
Christian Sigg 6a291ed0f0 [mlir] Remove unnecessary copying of pass options
I missed a comment in D98279 that you don't need to copy pass options.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D98366
2021-03-10 21:55:28 +01:00
Weiwei Li 619c1505f9 [mlir][spirv] Define spv.Image Operation
co-authered-by: Alan Liu <alanliu.yf@gmail.com>

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D98270
2021-03-10 15:48:04 -05:00
Alex Zinenko 79da91c59a Revert "[mlir][Vector][Affine] Improve affine vectorizer algorithm"
This reverts commit 95db7b4aea.

This breaks vectorize_2d.mlir and vectorize_3d.mlir test under ASAN (use
after free).
2021-03-10 20:25:49 +01:00
Alex Zinenko ed715536f1 Revert "[mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer."
This reverts commit 77a9d1549f.

Parent commit is broken.
2021-03-10 20:25:32 +01:00
Diego Caballero 77a9d1549f [mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer.
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97892
2021-03-10 20:40:21 +02:00
Diego Caballero 95db7b4aea [mlir][Vector][Affine] Improve affine vectorizer algorithm
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
  * Removed tracking of root and terminal ops. Existing vectorization
    functionality is preserved and extended so that loop nests without
    root-terminal chains can be vectorized.
  * Vectorizing a loop nest now only requires a single topological traversal.
  * A new vector loop nest is incrementally built along the vectorization
    process. The original scalar loop is kept intact. No cloning guard is needed
    to recover the scalar loop if vectorization fails. This approach also
    simplifies the challenging task of replacing a loop operation amid the
    vectorization process without invalidating the analysis information that
    depends on the original loop.
  * Vectorization of specific operations has been implemented as independent,
    preparing them to be moved to a potential vectorization interface.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97442
2021-03-10 20:29:58 +02:00
Vladislav Vinogradov b599f464d4 [mlir][CMAKE] Fix build with BUILD_SHARED_LIBS=ON
Link `MLIRStandardToLLVM` to `MLIRAVX512Transforms`, since
the latter uses `LLVMTypeConverter` defined in the first one.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98336
2021-03-10 14:52:36 +01:00
Alex Zinenko e02dd790b1 [mlir] fix typo in OpDefinitions.md 2021-03-10 14:44:08 +01:00
Alex Zinenko 78f3fb4f46 [mlir] Update comments in ArmNeon dialect. NFC
These were not updated when squashing LLVMArmNeon and ArmNeon dialects.
2021-03-10 13:35:57 +01:00
Alex Zinenko a776942ba1 [mlir] squash LLVM_AVX512 dialect into AVX512
The dialect separation was introduced to demarkate ops operating in different
type systems. This is no longer the case after the LLVM dialect has migrated to
using built-in vector types, so the original reason for separation is no longer
valid. Squash the two dialects into one.

The code size decrease isn't quite large: the ops originally in LLVM_AVX512 are
preserved because they match LLVM IR intrinsics specialized for vector element
bitwidth. However, it is still conceptually beneficial to have only one
dialect. I originally considered to use Tablegen multiclasses to define both
the type-polymorphic op and its two intrinsic-related instantiations, but
decided against it given both the complexity of the required Tablegen input and
its dissimilarity with the rest of ODS-defined ops, both potentially resulting
in very poor maintainability.

Depends On D98327

Reviewed By: nicolasvasilache, springerm

Differential Revision: https://reviews.llvm.org/D98328
2021-03-10 13:07:26 +01:00
Alex Zinenko 0af53de369 [mlir] simplify type constraints in AVX512 dialect
VectorOfLengthAndType accepts a cartesian product of given lengths and types
rather than types produced by co-indexed values in the corresponding lists.
Update the definitions accordingly. The type validity is already enforced by
op traits.

Reviewed By: nicolasvasilache, springerm

Differential Revision: https://reviews.llvm.org/D98327
2021-03-10 13:07:25 +01:00
Inho Seo 2ce4caf414 Moved getStaticLoopRanges and getStaticShape methods to LinalgInterfaces.td to add static shape verification
It is to use the methods in LinalgInterfaces.cpp for additional static shape verification to match the shaped operands and loop on linalgOps. If I used the existing methods, I would face circular dependency linking issue. Now we can use them as methods of LinalgOp.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D98163
2021-03-10 04:06:22 -08:00
Christian Sigg 4d295cf5b5 [mlir] Add base class for GpuKernelToBlobPass
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.

Put the base class in GPU/Transforms, according to the discussion in D98203.

The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.

Depends On D98168

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D98279
2021-03-10 12:14:43 +01:00
Vladislav Vinogradov f3bf5c053b [mlir] Model MemRef memory space as Attribute
Based on the following discussion:
https://llvm.discourse.group/t/rfc-memref-memory-shape-as-attribute/2229

The goal of the change is to make memory space property to have more
expressive representation, rather then "magic" integer values.

It will allow to have more clean ASM form:

```
gpu.func @test(%arg0: memref<100xf32, "workgroup">)

// instead of

gpu.func @test(%arg0: memref<100xf32, 3>)
```

Explanation for `Attribute` choice instead of plain `string`:

* `Attribute` classes allow to use more type safe API based on RTTI.
* `Attribute` classes provides faster comparison operator based on
  pointer comparison in contrast to generic string comparison.
* `Attribute` allows to store more complex things, like structs or dictionaries.
  It will allows to have more complex memory space hierarchy.

This commit preserve old integer-based API and implements it on top
of the new one.

Depends on D97476

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D96145
2021-03-10 12:57:27 +03:00
Hanhan Wang d5d4fb635e [mlir][linalg] Add support for using scalar attributes in TC ops.
Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D97876
2021-03-10 01:51:12 -08:00
Mehdi Amini 75f3f77805 Fix MLIR test post 890afad954 2021-03-09 23:30:51 +00:00
Mehdi Amini 890afad954 Fix Flang build after MLIR API changes around `generatedTypeParser` 2021-03-09 23:19:30 +00:00
River Riddle a776ecb6c2 [mlir][IR] Add an Operation::eraseOperands that supports batch erasure
This method allows for removing multiple disjoint operands at once, reducing the need to erase operands individually (which results in shifting the operand list).

Differential Revision: https://reviews.llvm.org/D98290
2021-03-09 15:07:53 -08:00
River Riddle 4a7aed4ee7 [mlir][IR] Add a new SymbolUserMap class
This class provides efficient implementations of symbol queries related to uses, such as collecting the users of a symbol, replacing all uses, etc. This provides similar benefits to use related queries, as SymbolTableCollection did for lookup queries.

Differential Revision: https://reviews.llvm.org/D98071
2021-03-09 15:07:52 -08:00
Mehdi Amini cd9a69289c Fix LLVM Dialect LoopOptionsAttr round-tripping: the keywords were missing in the output
This indicated some missing test coverage, which are now added to the
roundtrip test.
2021-03-09 22:00:22 +00:00
Mehdi Amini fe81e8f3b5 Add default LoopOptionsAttrBuilder constructor and method to check if empty() (NFC)
Also move setters out-of-line to make sure the templated helper is
actually instantiated.
2021-03-09 21:12:15 +00:00
Christian Sigg 840ff84d33 [mlir] Default for gpu-binary-annotation option.
Provide default for gpuBinaryAnnotation so that we don't need to specify it in tests.

The annotation likely only needs to be target specific if we want to lower to e.g. both CUDA and ROCDL.

Reviewed By: herhut, bondhugula

Differential Revision: https://reviews.llvm.org/D98168
2021-03-09 21:01:50 +01:00
Mehdi Amini 79f736c150 Switch generatedTypeParser/generatedAttributeParser to return an OptionalParseResult
This allows the caller to distinguish between a parse error or an
unmatched keyword. It fixes the redundant error that was emitted by the
caller when the generated parser would fail.

Differential Revision: https://reviews.llvm.org/D98162
2021-03-09 19:43:45 +00:00
Mehdi Amini 8205c1a90a Rework LLVM Dialect LoopOptions attribute
Instead of storing an array of LoopOpt attributes, which were just
wrapping std::pair<enum, int> anyway, we can have an attribute storing
a sorted ArrayRef<std::pair<enum, int>> as a single unit. This improves
here the textual format and the general API. Note that we're limiting
the options to fit into an int64_t by design, but this isn't a new
constraint.

Building the LoopOptions attribute is likely worth a specific builder
for efficient reason, that'll be the subject of a future patch.

Differential Revision: https://reviews.llvm.org/D98105
2021-03-09 19:43:45 +00:00
Lei Zhang 50000abe3c [mlir] Use affine.apply when distributing to processors
This makes it easy to compose the distribution computation with
other affine computations.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D98171
2021-03-09 08:37:20 -05:00
Alex Zinenko 8184247f0b [mlir] move LLVM target import header and tests
Move Target/LLVMIR.h to target/LLVMIR/Import.h to better reflect the purpose of
this file. Also move all LLVM IR target tests under the LLVMIR directory.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98178
2021-03-09 09:22:14 +01:00
Alex Zinenko 90fec5ed65 [mlir] make MLIRPresburger depend on MLIRIR
The analysis library uses Location, which is defined in the MLIRIR
library.
2021-03-09 09:19:53 +01:00
Vladislav Vinogradov 2241b3986c [mlir][CMAKE] Fix cross-compilation build
Use `MLIR_LINALG_ODS_GEN` and `MLIR_LINALG_ODS_YAML_GEN` variables
instead of `MLIR_LINALG_ODS_GEN_EXE` and `MLIR_LINALG_ODS_YAML_GEN_EXE`.
The former are defined in PARENT SCOPE only, so the `if` condition
is never evaluates to `TRUE`.

The logic should be the following (taken from tblgen part):

1. `TOOL_NAME` - CACHE variable (default equal to target name).
   User can override it to actual executable path.
2. `TOOL_NAME_EXE` - internal variable, initialized to `${TOOL_NAME}` first.
   In case of cross-compilation (`LLVM_USE_HOST_TOOLS == TRUE`) if user
   didn't set own path to native executable via `TOOL_NAME` variable,
   CMake will create separate targets to build native tool and
   will override `TOOL_NAME_EXE` to the executable produced by this target.
3. `TOOL_NAME_TARGET` - internal variable, which points to tool target name.
   If the native tool is built as described above, it will point to the
   target correspondant to that native tool.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98025
2021-03-09 10:51:56 +03:00
Tobias Gysi c1a4cd551f [mlir][linalg] refactor the result handling during vectorization.
Return the vectorization results using a vector passed by reference instead of returning them embedded in a structure.

Differential Revision: https://reviews.llvm.org/D98182
2021-03-09 07:11:57 +00:00
Stella Laurenzo e31c77b182 [mlir][python] Reorganize MLIR python into namespace packages.
* Only leaf packages are non-namespace packages. This allows most of the top levels to be split into different directories or deployment packages. In the previous state, the presence of __init__.py files at each level meant that the entire tree could only ever exist in one physical directory on the path.
* This changes the API usage slightly: `import mlir` will no longer do a deep import of `mlir.ir`, etc. This may necessitate some client code changes.
* Dialect gen code was restructured so that the user is responsible for providing the `my_dialect.py` file, which then must import its peer `_my_dialect_ops_gen`. This gives complete control of the dialect namespace to the user instead of to tablegen code, allowing further dialect-specific python APIs.
* Correspondingly, the previous extension modules `_my_dialect.py` are now `_my_dialect_ops_ext.py`.
* Now that the `linalg` namespace is open, moved the `linalg_opdsl` tool into it.
* This may require some corresponding downstream adjustments to npcomp, circt, et al:
  * Probably some shallow imports need to be converted to deep imports (i.e. not `import mlir` brings in the world).
  * Each tablegen generated dialect now needs an explicit `foo.py` which does a `from ._foo_ops_gen import *`. This is similar to the way that generated code operates in the C++ world.
  * If providing dialect op extensions, those need to be moved from `_foo.py` -> `_foo_ops_ext.py`.

Differential Revision: https://reviews.llvm.org/D98096
2021-03-08 23:01:34 -08:00
Mehdi Amini 038f2a337d Move LLVM::FMFAttr definition to TableGen (NFC)
This is using the new Attribute storage generation support in
TableGen to define the LLVM FastMathFlags.

Differential Revision: https://reviews.llvm.org/D98007
2021-03-09 05:29:54 +00:00
River Riddle 0d01dfbc37 [mlir][IR][NFC] Move the remaining builtin types to ODS
This will allow for removing the duplicated type documentation from LangRef and instead link to the builtin dialect documentation.

Differential Revision: https://reviews.llvm.org/D98093
2021-03-08 14:32:40 -08:00
River Riddle a4bb667d83 [mlir][IR][NFC] Define the Location classes in ODS instead of C++
This also removes the need for LocationDetail.h.

Differential Revision: https://reviews.llvm.org/D98092
2021-03-08 14:32:40 -08:00
Rob Suderman cb3542e1ca [MLIR][TOSA] Added lowerings for Reduce operations to Linalg
Lowerings for min, max, prod, and sum reduction operations on int and float
values. This includes reduction tests for both cases.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D97893
2021-03-08 10:57:19 -08:00
Christian Sigg 7cdcb4a3b9 [mlir] NFC: Add #endif comment. 2021-03-08 19:25:24 +01:00
Benjamin Kramer 42c195f0ec [mlir][Shape] Allow shape.split_at to return extent tensors and lower it to std.subtensor
split_at can return an error if the split index is out of bounds. If the
user knows that the index can never be out of bounds it's safe to use
extent tensors. This has a straight-forward lowering to std.subtensor.

Differential Revision: https://reviews.llvm.org/D98177
2021-03-08 16:48:05 +01:00
Frederik Gossen 3b9667a84c Clarify documentation for `Elementwise`, `Scalarizable`, `Vectorizable`, and
`Tensorizable` traits.

Differential Revision: https://reviews.llvm.org/D97841
2021-03-08 10:35:22 +01:00
Mehdi Amini e94e55712c Forward the `LLVM_ENABLE_LIBCXX` CMake parameter to the mlir standalone test
This allows to build and test MLIR with `-DLLVM_ENABLE_LIBCXX=ON`.
2021-03-08 05:07:26 +00:00
KareemErgawy-TomTom 3fb384d50e [MLIR][SPIRV] Rename `spv.selection` to `spv.mlir.selection`.
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere. For ops that
don't have a SPIR-V spec counterpart, we use spv.mlir.snake_case.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D98014
2021-03-06 16:05:31 +01:00
Lei Zhang bb6f5c8314 [mlir][spirv] Convert tensor.extract for very small tensors
Normally tensors will be stored in buffers before converting to SPIR-V,
given that is how a large amount of data is sent to the GPU. However,
SPIR-V supports converting from tensors directly too. This is for the
cases where the tensor just contains a small amount of elements and it
makes sense to directly inline them as a small data array in the shader.
To handle this, internally the conversion might create new local
variables. SPIR-V consumers in GPU drivers may or may not optimize that
away. So this has implications over register pressure. Therefore, a
threshold is used to control when the patterns should kick in.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D98052
2021-03-06 08:03:36 -05:00
Mehdi Amini f8fe6d9f3f Use gen-dialect-doc instead of gen-op-doc for the Builtin dialect
This is fixing the missing title and menu entry on the MLIR website.
2021-03-06 05:32:46 +00:00