Commit Graph

6181 Commits

Author SHA1 Message Date
Christopher Bate 53fe155b3f Revert "[mlir][vector] Allow unroll of contraction in arbitrary order"
Reverts commit 1469ebf838 (original commit)
Reverts commit a392a39f75 (build fix for above commit)

The commit broke tests in out-of-tree projects, indicating that the previous
change contained a logical error that is not covered by the current tests.
2022-06-07 14:54:01 -06:00
Kiran Chandramohan dd32bf9a77 [Flang,MLIR,OpenMP] Fix a few tests that were not converting to LLVM
A few OpenMP tests were retaining FIR operands even after running
the LLVM conversion pass. To fix these tests, the legality checks for
the OpenMP conversion are made stricter to include operands and results.
The Flush, Single and Sections operations are added to the conversions or
legality checks. RegionLessOpConversion is renamed to clarify that it
works only for operations with variable operands, and the operands of
the flush operation are changed to match the handling of variable
operands.

Fix for an OpenMP issue mentioned in
https://github.com/llvm/llvm-project/issues/55210.

Reviewed By: shraiysh, peixin, awarzynski

Differential Revision: https://reviews.llvm.org/D127092
2022-06-07 09:55:53 +00:00
Alex Zinenko 3326eddcd1 [mlir] fix documentation format in SCF
Four leading spaces are interpreted as a code block in markdown. Unless
used consistently in the ODS op description, they cannot be stripped away by
the tablegen backend, which results in malformed markdown being
generated.
2022-06-07 11:51:24 +02:00
lorenzo chelini 9b3712e0bf [MLIR][LLVMIR] Add round intrinsic
Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D126879
2022-06-07 10:27:55 +02:00
lewuathe 62a34f6a6f [mlir][complex] Add complex.conj op
Add the complex.conj op to compute the complex conjugate, an operation that is widely used in mathematics on the complex plane.
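For reference, a minimal usage sketch (the function wrapper and element type below are illustrative, not taken from the patch):

```
func.func @conjugate(%z: complex<f32>) -> complex<f32> {
  // For z = a + bi, returns a - bi.
  %conj = complex.conj %z : complex<f32>
  return %conj : complex<f32>
}
```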

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D127181
2022-06-07 09:38:35 +02:00
Georgios Pinitas 3bcaf2eb93 [mlir][tosa] Moves constant folding operations out of the Canonicalizer
Transpose operations on constant data were getting folded during the
canonicalization process. This has a compile-time cost proportional to
the constant size. Moving this folding to a separate pass makes it
optional and gives more flexibility in how such scenarios are handled.
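As a rough illustration of the affected pattern (the shapes, values and permutation below are made up for the example):

```
// Before folding: a transpose of constant data.
%perms = "tosa.const"() {value = dense<[1, 0]> : tensor<2xi32>} : () -> tensor<2xi32>
%cst = "tosa.const"() {value = dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]> : tensor<2x3xf32>} : () -> tensor<2x3xf32>
%0 = "tosa.transpose"(%cst, %perms) : (tensor<2x3xf32>, tensor<2xi32>) -> tensor<3x2xf32>

// After constant folding, now performed by a dedicated pass rather than the
// canonicalizer, %0 becomes a single "tosa.const" holding the transposed data.
```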

Reviewed By: rsuderman, jpienaar, stellaraccident

Differential Revision: https://reviews.llvm.org/D124685
2022-06-06 22:10:22 +00:00
Christopher Bate 1469ebf838 [mlir][vector] Allow unroll of contraction in arbitrary order
Adds support for vector unroll transformations to unroll in different
orders. For example, a `vector.contract` can be unrolled into a
smaller set of contractions. There is a choice of how to unroll the
decomposition based on the traversal order of (dim0, dim1, dim2).
The traversal order can now be specified by a callback given by the
caller of the transform. For now, only the `vector.contract` and
`vector.transfer_read`/`vector.transfer_write` operations
support the callback.

Differential Revision: https://reviews.llvm.org/D127004
2022-06-06 14:31:04 -06:00
Christopher Bate cca662b849 [mlir][linalg] add conv_2d_nhwc_fhwc named op
This operation should be supported as a named op because,
when the operands are viewed as having canonical layouts
with decreasing strides, the "reduction" dimensions
of the filter (h, w, and c) are contiguous relative to each
output channel. When lowered to a matrix multiplication,
this layout is the simplest to deal with, so future
transforms/vectorizations of `conv2d` may find using this
named op convenient.
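A sketch of what using the named op might look like; the shapes and attribute values below are illustrative only:

```
// Input is NHWC, the filter is FHWC (output channels outermost),
// and the output is NHWF.
%res = linalg.conv_2d_nhwc_fhwc
         {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
         ins(%input, %filter : tensor<1x8x8x4xf32>, tensor<16x3x3x4xf32>)
         outs(%init : tensor<1x6x6x16xf32>) -> tensor<1x6x6x16xf32>
```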

Differential Revision: https://reviews.llvm.org/D126995
2022-06-06 13:18:08 -06:00
jacquesguan ad44495ad3 [mlir][NFC] Replace some llvm::find with llvm::is_contained.
This patch replaces some uses of llvm::find with llvm::is_contained, which is clearer.

Differential Revision: https://reviews.llvm.org/D127077
2022-06-06 03:01:14 +00:00
Christian Sigg 400fef081a Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
This change rolls bcfc0a9051 forward (i.e., reverting 369ce54bb3) with fixed CMakeLists.txt.
2022-06-05 09:11:43 +02:00
Mehdi Amini 369ce54bb3 Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
This reverts commit bcfc0a9051.

The build is broken with shared libraries enabled.
2022-06-04 08:35:45 +00:00
Christian Sigg bcfc0a9051 [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.
This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:

- it performs less Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D126158
2022-06-04 08:03:29 +02:00
wren romano 3cf03f1c56 [mlir][sparse] Adding IsSparseTensorPred and updating ops to use it
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126994
2022-06-03 17:15:31 -07:00
Diego Caballero 9a79b1b04c [mlir] Add peeling xform to Codegen Strategy
This patch adds the knobs to use peeling in the codegen strategy
infrastructure.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D126842
2022-06-03 21:31:43 +00:00
Krzysztof Drewniak 95aff23e29 Re-land "[mlir] Add integer range inference analysis"
This reverts commit 4e5ce2056e.

This relands commit 1350c9887d.

Reinstates the range analysis with the build issue fixed.

Differential Revision: https://reviews.llvm.org/D126926
2022-06-03 17:13:48 +00:00
Nicolas Vasilache 72de7588cc [mlir][SCF] Add bufferization hook for scf.foreach_thread and terminator.
`scf.foreach_thread` results alias with the underlying `scf.foreach_thread.parallel_insert_slice` destination operands
and they bufferize to equivalent buffers in the absence of other conflicts.
`scf.foreach_thread.parallel_insert_slice` conflict detection is similar to `tensor.insert_slice` conflict detection.
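For illustration, a rough sketch of the structure involved (a parallel copy; the exact assembly syntax may differ from the op definitions):

```
func.func @parallel_copy(%in: tensor<128xf32>, %out: tensor<128xf32>) -> tensor<128xf32> {
  %c4 = arith.constant 4 : index
  %result = scf.foreach_thread (%tid) in (%c4) -> (tensor<128xf32>) {
    %offset = affine.apply affine_map<(d0) -> (d0 * 32)>(%tid)
    %chunk = tensor.extract_slice %in[%offset] [32] [1] : tensor<128xf32> to tensor<32xf32>
    scf.foreach_thread.perform_concurrently {
      // The scf.foreach_thread result aliases the destination %out of this
      // insert and bufferizes to the same buffer when there is no conflict.
      scf.foreach_thread.parallel_insert_slice %chunk into %out[%offset] [32] [1]
        : tensor<32xf32> into tensor<128xf32>
    }
  }
  return %result : tensor<128xf32>
}
```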

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D126769
2022-06-03 07:14:05 +00:00
Thomas Raoux 205c08b54d [mlir][scf] Add option to loop pipelining to not peel the epilogue
Add an option to predicate the epilogue within the kernel instead of
peeling the epilogue. This is useful to avoid generating a large
amount of code for deep pipelines. This currently requires a user
lambda to implement operation predication.

Differential Revision: https://reviews.llvm.org/D126753
2022-06-03 04:20:20 +00:00
River Riddle ee1cf1f645 [mlir][NFC] Simplify the various `parseSourceFile<T>` overloads
These effectively all share the same implementation, i.e. forward
to the non-templated overload and then construct the container op.
2022-06-02 19:18:55 -07:00
River Riddle bf352e0b2e [mlir:PDLL] Add better support for providing Constraint/Pattern/Rewrite documentation
This commit enables providing long-form documentation more seamlessly to the LSP
by revamping decl documentation. For ODS imported constructs, we now also import
descriptions and attach them to decls when possible. For PDLL constructs, the LSP will
now try to provide documentation by parsing the comments directly above the decl's
location within the source file. This commit also adds a new parser flag
`enableDocumentation` that gates the import and attachment of ODS documentation,
which is unnecessary in the normal build process (i.e. it should only be used/consumed
by tools).

Differential Revision: https://reviews.llvm.org/D124881
2022-06-02 16:31:07 -07:00
Arjun P 8bc2cff95a [MLIR][Presburger] Simplex: remove redundant member vars nRow, nCol
Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D126790
2022-06-03 00:30:48 +01:00
Mehdi Amini 4e5ce2056e Revert "[mlir] Add integer range inference analysis"
This reverts commit 1350c9887d.

The shared library build is broken with undefined references.
2022-06-02 21:24:06 +00:00
Krzysztof Drewniak 1350c9887d [mlir] Add integer range inference analysis
This commit defines a dataflow analysis for integer ranges, which
uses a newly-added InferIntRangeInterface to compute the lower and
upper bounds on the results of an operation from the bounds on the
arguments. The range inference is a flow-insensitive dataflow analysis
that can be used to simplify code, such as by statically identifying
bounds checks that cannot fail in order to eliminate them.

The InferIntRangeInterface has one method, inferResultRanges(), which
takes a vector of inferred ranges for each argument to an op
implementing the interface and a callback allowing the implementation
to define the ranges for each result. These ranges are stored as
ConstantIntRanges, which hold the lower and upper bounds for a
value. Bounds are tracked separately for the signed and unsigned
interpretations of a value, which ensures that the impact of
arithmetic overflows is correctly tracked during the analysis.

The commit also adds a -test-int-range-inference pass to test the
analysis until it is integrated into SCCP or otherwise exposed.

Finally, this commit fixes some bugs relating to the handling of
region iteration arguments and terminators in the data flow analysis
framework.
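As a hedged illustration of the kind of simplification such an analysis enables (this exact example is not taken from the patch):

```
func.func @bound_check(%i: index) -> i1 {
  %c16 = arith.constant 16 : index
  // The unsigned range of %clamped is inferred to be [0, 16], so the
  // comparison below can be statically replaced by a true constant.
  %clamped = arith.minui %i, %c16 : index
  %in_bounds = arith.cmpi ule, %clamped, %c16 : index
  return %in_bounds : i1
}
```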

Depends on D124020

Depends on D124021

Reviewed By: rriddle, Mogball

Differential Revision: https://reviews.llvm.org/D124023
2022-06-02 20:24:11 +00:00
Aart Bik bf7dbc2a30 [mlir][sparse][bufferization] fix doc on new init operation
The example was still using the now-removed sparse_tensor.init_tensor operation.
Also, I made the input operands of the matrix multiplication sparse too
(since it looks a bit strange to multiply two dense matrices into a sparse one).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D126897
2022-06-02 12:04:36 -07:00
Alex Zinenko ce2e198bc2 [mlir] add decompose and generalize to structured transform ops
These ops complement the tiling/padding transformations by transforming
higher-level named structured operations such as depthwise convolutions into
lower-level and/or generic equivalents that are better handled by some
downstream transformations.

Differential Revision: https://reviews.llvm.org/D126698
2022-06-02 15:25:18 +02:00
Nicolas Vasilache 311967701a [mlir][SCF] Add scf.foreach_thread.parallel_insert_slice canonicalization.
Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D126761
2022-06-02 11:53:25 +00:00
jacquesguan 19e285477e [mlir][Arithmetic] Add constant folder for RemF.
This patch adds the constant folder for RemF.
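A minimal sketch of the kind of IR that can now fold (values chosen for illustration):

```
// Before folding:
%lhs = arith.constant 7.5 : f32
%rhs = arith.constant 2.0 : f32
%rem = arith.remf %lhs, %rhs : f32

// After folding, %rem is replaced by a constant: 7.5 - 3 * 2.0 = 1.5.
%rem_folded = arith.constant 1.5 : f32
```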

Differential Revision: https://reviews.llvm.org/D126045
2022-06-02 06:24:37 +00:00
Matthias Springer 6232a8f3d6 [mlir][sparse][NFC] Switch InitOp to bufferization::AllocTensorOp
Now that we have an AllocTensorOp (previously InitTensorOp) in the bufferization dialect, the InitOp in the sparse dialect is no longer needed.
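For illustration, allocating a sparse tensor now looks roughly like this (the encoding attribute and shape are placeholders, and assembly details may differ):

```
#CSR = #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>

// Previously expressed with the sparse dialect's init op; now the
// bufferization dialect's alloc_tensor op is used instead.
%t = bufferization.alloc_tensor() : tensor<64x64xf64, #CSR>
```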

Differential Revision: https://reviews.llvm.org/D126180
2022-06-02 00:03:52 +02:00
wren romano b364c76683 [mlir][sparse] Using non-empty function name suffix for OverheadType::kIndex
The trick of using an empty token in the `FOREVERY_O` x-macro relies on preprocessor behavior which is only standard since C99 6.10.3/4 and C++11 N3290 16.3/4 (whereas it was undefined behavior up through C++03 16.3/10).  Since the `ExecutionEngine/SparseTensorUtils.cpp` file is required to be compile-able under C++98 compatibility mode (unlike the C++11 used elsewhere in MLIR), we shouldn't rely on that behavior.

Also, using a non-empty suffix helps improve uniformity of the API, since all other primary/overhead suffixes are also non-empty.  I'm using the suffix `0` since that's the value used by the `SparseTensorEncoding` attribute for indicating the index overhead-type.

Depends On D126720

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126724
2022-06-01 14:18:42 -07:00
Rob Suderman f3bdb56d61 [mlir][math] Add math.ctlz expansion to control flow + arith operations
Ctlz is an intrinsic in LLVM but does not have equivalent operations in SPIR-V.
Including a decomposition gives an alternative path for these platforms.
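A short sketch of the op being expanded (the function wrapper is illustrative):

```
func.func @count_leading_zeros(%x: i32) -> i32 {
  // With the expansion pattern applied, this op is rewritten into a small
  // sequence of arith/control-flow ops that counts the leading zero bits.
  %lz = math.ctlz %x : i32
  return %lz : i32
}
```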

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D126261
2022-06-01 11:45:04 -07:00
Stella Laurenzo 3bb7999339 [mlir] Add global_load and global_store ops to ml_program.
* Adds simple, non-atomic, non-volatile, non-synchronized direct load/store ops.
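A rough usage sketch (the assembly shown here is an assumption and may differ from the actual op definitions):

```
ml_program.global private mutable @weights(dense<0.0> : tensor<4xf32>) : tensor<4xf32>

func.func @update_weights() {
  // Direct, non-atomic load and store of the module-level global.
  %w = ml_program.global_load @weights : tensor<4xf32>
  ml_program.global_store @weights = %w : tensor<4xf32>
  return
}
```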

Differential Revision: https://reviews.llvm.org/D126230
2022-06-01 11:32:15 -07:00
Arjun P ec145ba2a3 [MLIR][Presburger] Matrix: inline trivial accessors
This resolves a comment from https://reviews.llvm.org/D126708
that was previously missed.
2022-06-01 16:56:46 +01:00
Arjun P d5e31cf38a [MLIR][Presburger] Move Matrix accessors inline
This gives a 1.5x speedup on the Presburger unittests.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D126708
2022-06-01 16:51:42 +01:00
PeixinQiao fe2cc16035 [NFC][MLIR] Fix -Wtype-limits warning
Fix the warning: comparison of unsigned expression in ‘>= 0’ is always
true.

Reviewed By: kiranchandramohan, shraiysh

Differential Revision: https://reviews.llvm.org/D126784
2022-06-01 23:42:07 +08:00
Nicolas Vasilache 59b273a166 [mlir][SCF] Add parallel abstraction on tensors.
This revision adds `scf.foreach_thread` and other supporting abstractions
that allow connecting parallel abstractions and tensors.

Discussion is available [here](https://discourse.llvm.org/t/rfc-parallel-abstraction-for-tensors-and-buffers/62607).

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D126555
2022-06-01 09:16:01 +00:00
Nicolas Vasilache beab8e871e Revert "[mlir][SCF] Add parallel abstraction on tensors."
This reverts commit 9b7193f852.

This is an older branch that was committed by mistake and does not include addressed review comments, an updated version will come next.
2022-06-01 09:04:20 +00:00
Nicolas Vasilache 9b7193f852 [mlir][SCF] Add parallel abstraction on tensors.
This revision adds `scf.foreach_thread` and other supporting abstractions
that allow connecting parallel abstractions and tensors.

Discussion is available [here](https://discourse.llvm.org/t/rfc-parallel-abstraction-for-tensors-and-buffers/62607).
2022-06-01 09:02:16 +00:00
lewuathe 6d75c89783 [mlir][complex] Add tan op for complex dialect
Add the tangent operation for the complex dialect. This is a follow-up to https://reviews.llvm.org/D126521.
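A minimal usage sketch (the wrapper and element type are illustrative):

```
func.func @tangent(%z: complex<f32>) -> complex<f32> {
  // Computes tan(z) = sin(z) / cos(z) for the complex operand.
  %t = complex.tan %z : complex<f32>
  return %t : complex<f32>
}
```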

Differential Revision: https://reviews.llvm.org/D126685
2022-06-01 09:20:42 +02:00
wren romano a4c53f8cd6 [mlir][sparse] Factoring out SparseTensorFile class for readSparseTensorShape
The primary goal of this change is to define readSparseTensorShape. The SparseTensorFile class is merely introduced along the way as a way to reduce code duplication.

Depends On D126106

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126233
2022-05-31 13:24:28 -07:00
lorenzo chelini 850dbff708 [MLIR][Math] Improve docs (NFC)
Remove boilerplate examples and add text at the dialect level describing
what kinds of operands the operations accept (i.e., scalar, tensor or vector).
A shorter sentence describing the input operands is left for each operation,
as this redundancy is convenient when browsing the documentation on the
website.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D126648
2022-05-31 18:16:59 +02:00
jacquesguan 42c17073fc [mlir] Support importing LLVM intrinsics.
This patch adds support for converting LLVM intrinsics to the corresponding MLIR ops. Some intrinsics are still left to be handled specially.
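As a hedged illustration, a call to an intrinsic such as llvm.fabs.f32 in the input LLVM IR would now be imported roughly as follows (generic op form shown; the exact importer output may differ):

```
llvm.func @use_fabs(%x: f32) -> f32 {
  // Imported from a `call float @llvm.fabs.f32(float %x)` in LLVM IR.
  %0 = "llvm.intr.fabs"(%x) : (f32) -> f32
  llvm.return %0 : f32
}
```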

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D126639
2022-05-31 11:08:23 +00:00
River Riddle 1c2edb026e [mlir:PDLL] Rework the C++ generation of native Constraint/Rewrite arguments and results
The current translation uses the old "ugly"/"raw" form which used PDLValue for the arguments
and results. This commit updates the C++ generation to use the recently added sugar that
allows for directly using the desired types for the arguments and result of PDL functions.
In addition, this commit also properly imports the C++ class for ODS operations, constraints,
and interfaces. This allows for a much more convenient C++ API than the one
previously provided by the raw/low-level types.

Differential Revision: https://reviews.llvm.org/D124817
2022-05-30 17:35:34 -07:00
River Riddle 91b8d96fd1 [mlir:PDLL] Add proper support for operation result type inference
This allows for the results of operations to be inferred in certain contexts,
and matches the support in PDL for result type inference. The two main
initial circumstances are when used as a replacement of another operation,
or when the operation being created implements InferTypeOpInterface.

Differential Revision: https://reviews.llvm.org/D124782
2022-05-30 17:35:33 -07:00
Mehdi Amini 940e290860 Apply clang-tidy fixes for performance-unnecessary-value-param in OneShotModuleBufferize.cpp (NFC) 2022-05-30 18:44:28 +00:00
Alex Zinenko cc6c159203 [mlir] add VectorizeOp to structured transform ops
Vectorization is a key transformation to achieve high performance on most
architectures. In the transform dialect, vectorization is implemented as a
parameterizable transform op. It currently applies to a scope of payload IR
delimited by some isolated-from-above op, mainly because several enabling
transformations (such as affine simplification) are needed to perform
vectorization and these transformations would apply to ops other than the "main"
computational payload op. A separate "navigation" transform op that obtains the
isolated-from-above ancestor of an op is introduced in the core transform
dialect. Even though it is currently only useful for vectorization,
isolated-from-above ops are a common anchor for transformations (usually
implemented as passes), so this navigation op is likely to be reused in the future.

Depends On D126374

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D126542
2022-05-30 17:37:50 +02:00
Alex Zinenko 5cde5a5739 [mlir] add interchange, pad and scalarize to structured transform dialect
Add ops to the structured transform extension of the transform dialect that
perform interchange, padding and scalarization on structured ops. Along with
tiling that is already defined, this provides a minimal set of transformations
necessary to build vectorizable code for a single structured op.

Define two helper traits: one that implements TransformOpInterface by applying
a function to each payload op independently and another that provides a simple
"functional-style" producer/consumer list of memory effects for the transform
ops.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D126374
2022-05-30 11:42:40 +02:00
Christian Sigg bcf3d52486 [MLIR][GPU] Expose GpuParallelLoopMapping as non-test pass.
Reviewed By: bondhugula, herhut

Differential Revision: https://reviews.llvm.org/D126199
2022-05-30 09:20:48 +02:00
Groverkss dac27da7b9 [MLIR][Presburger] Add applyDomain/Range to IntegerRelation
This patch adds support for applying a relation to the domain or range of a relation.

Reviewed By: arjunp, ftynse

Differential Revision: https://reviews.llvm.org/D126339
2022-05-29 02:06:11 +05:30
Matthias Springer 2f0a634c5e [mlir][bufferization] Add extra filter mechanism to bufferizeOp
Differential Revision: https://reviews.llvm.org/D126569
2022-05-28 04:49:23 +02:00
Matthias Springer f470f8cbce [mlir][bufferize][NFC] Split analysis+bufferization of ModuleBufferization
Analysis and bufferization can now be run separately.

Differential Revision: https://reviews.llvm.org/D126572
2022-05-28 04:43:50 +02:00
Matthias Springer 3490aadf56 [mlir][bufferization][NFC] Remove post-analysis step infrastructure
Now that analysis and bufferization are better separated, post-analysis steps are no longer needed. Users can directly interleave analysis and bufferization as needed.

Differential Revision: https://reviews.llvm.org/D126571
2022-05-28 04:37:13 +02:00