Adds vp2intersect to the AVX512 dialect and defines a lowering to the
LLVM dialect.
Author: Matthias Springer <springerm@google.com>
Differential Revision: https://reviews.llvm.org/D95301
[NFC] No new functionality, mostly a cleanup and one more abstraction level between Async and LLVM IR.
Instead of lowering from Async to LLVM coroutines and Async Runtime API in one shot, do it progressively via async.coro and async.runtime operations.
1. Lower from async to async.runtime/coro (e.g. async.execute to function with coro setup and runtime calls)
2. Lower from async.runtime/coro to LLVM intrinsics and runtime API calls
Intermediate coro/runtime operations will allow to run transformations on a higher level IR and do not try to match IR based on the LLVM::CallOp properties.
Although async.coro is very close to LLVM coroutines, it is not exactly the same API, instead it is optimized for usability in async lowering, and misses a lot of details that are present in @llvm.coro intrinsic.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94923
This revision starts evolving the APIs to manipulate ops with offsets, sizes and operands towards a ValueOrAttr abstraction that is already used in folding under the name OpFoldResult.
The objective, in the future, is to allow such manipulations all the way to the level of ODS to avoid all the genuflexions involved in distinguishing between values and attributes for generic constant foldings.
Once this evolution is accepted, the next step will be a mechanical OpFoldResult -> ValueOrAttr.
Differential Revision: https://reviews.llvm.org/D95310
This revision addresses a remaining comment that was overlooked in https://reviews.llvm.org/D95243:
the pad hoisting transformation is made to additionally bail out on side effecting ops other than LoopLikeOps.
This transformation anchors on a padding op whose result is only used as an input
to a Linalg op and pulls it out of a given number of loops.
The result is a packing of padded tailes of ops that is amortized just before
the outermost loop from which the pad operation is hoisted.
Differential revision: https://reviews.llvm.org/D95243
This revision allows the base Linalg tiling pattern to optionally require padding to
a constant bounding shape.
When requested, a simple analysis is performed, similar to buffer promotion.
A temporary `linalg.simple_pad` op is added to model padding for the purpose of
connecting the dots. This will be replaced by a more fleshed out `linalg.pad_tensor`
op when it is available.
In the meantime, this temporary op serves the purpose of exhibiting the necessary
properties required from a more fleshed out pad op, to compose with transformations
properly.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95149
Fusion of generic/indexed_generic operations with tensor_reshape by
expansion when the latter just adds/removes unit-dimensions is
disabled since it just adds unit-trip count loops.
Differential Revision: https://reviews.llvm.org/D94626
representing dependence from producer result to consumer.
With Linalg on tensors the dependence between operations can be from
the result of the producer to the consumer. This change just does a
NFC refactoring of the LinalgDependenceGraphElem to allow representing
both OpResult and OpOperand*.
Differential Revision: https://reviews.llvm.org/D95208
spv.Ordered/spv.Unordered are meant for OpenCL Kernel capability.
For Vulkan Shader capability, we should use spv.IsNan to check
whether a number is NaN.
Add a new pattern for converting `std.cmpf ord|uno` to spv.IsNan
and bumped the pattern converting to spv.Ordered/spv.Unordered
to a higher benefit. The SPIR-V target environment will properly
select between these two patterns.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D95237
An `unrealized_conversion_cast` operation represents an unrealized conversion
from one set of types to another, that is used to enable the inter-mixing of
different type systems. This operation should not be attributed any special
representational or execution semantics, and is generally only intended to be
used to satisfy the temporary intermixing of type systems during the conversion
of one type system to another.
This operation was discussed in the following RFC(and ODM):
https://llvm.discourse.group/t/open-meeting-1-14-dialect-conversion-and-type-conversion-the-question-of-cast-operations/
Differential Revision: https://reviews.llvm.org/D94832
Use cases with 16- or even 8-bit pointer/index structures have been identified.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D95015
In prehistorical times, AffineApplyOp was allowed to produce multiple values.
This allowed the creation of intricate SSA use-def chains.
AffineApplyNormalizer was originally introduced as a means of reusing the AffineMap::compose method to write SSA use-def chains.
Unfortunately, symbols that were produced by an AffineApplyOp needed to be promoted to dims and reordered for the mathematical composition to be valid.
Since then, single result AffineApplyOp became the law of the land but the original assumptions were not revisited.
This revision revisits these assumptions and retires AffineApplyNormalizer.
Differential Revision: https://reviews.llvm.org/D94920
The operantion is an identity if the values yielded by the operation
is the argument of the basic block of that operation. Add this missing check.
Differential Revision: https://reviews.llvm.org/D94819
This is a very minor improvement during iteration graph construction.
If the first attempt considering the dimension order of all tensors fails,
a second attempt is made using the constraints of sparse tensors only.
Dense tensors prefer dimension order (locality) but provide random access
if needed, enabling the compilation of more sparse kernels.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D94709
With the recent changes to linalg on tensor semantics, the tiling
operations works out-of-the-box for generic operations. Add a test to
verify that and some minor refactoring.
Differential Revision: https://reviews.llvm.org/D93077
Add canonicalization to replace use of the result of a linalg
operation on tensors in a dim operation, to use one of the operands of
the linalg operations instead. This allows the linalg op itself to be
deleted when all its non-dim uses are removed (say through tiling, etc.)
Differential Revision: https://reviews.llvm.org/D93076
linalg.generic/indexed_generic operations on tensors whose body is
just yielding the (non-induction variable) arguments of the operation
can be canonicalized by replacing uses of the result with the
corresponding arguments.
Differential Revision: https://reviews.llvm.org/D94581
The standard and gpu dialect both have `alloc` operations which use the
memory effect `MemAlloc`. In both cases, it is specified on both the
operation itself and on the result. This results in two memory effects
being created for these operations. When `MemAlloc` is defined on an
operation, it represents some background effect which the compiler
cannot reason about, and inhibits the ability of the compiler to
remove dead `std.alloc` operations. This change removes the uneeded
`MemAlloc` effect from these operations and leaves the effect on the
result, which allows dead allocs to be erased.
There is the same problem, but to a lesser extent, with MemFree, MemRead
and MemWrite. Over-specifying these traits is not currently inhibiting
any optimization.
Differential Revision: https://reviews.llvm.org/D94662
In the overwhelmingly common case, enum attribute case strings represent valid identifiers in MLIR syntax. This revision updates the format generator to format as a keyword in these cases, removing the need to wrap values in a string. The parser still retains the ability to parse the string form, but the printer will use the keyword form when applicable.
Differential Revision: https://reviews.llvm.org/D94575
Similar to the parallelization strategies, the vectorization strategies
provide control on what loops should be vectorize. Unlike the parallel
strategies, only innermost loops are considered, but including reductions,
with the control of vectorizing dense loops only or dense and sparse loops.
The vectorized loops are always controlled by a vector mask to avoid
overrunning the iterations, but subsequent vector operation folding removes
redundant masks and replaces the operations with more efficient counterparts.
Similarly, we will rely on subsequent loop optimizations to further optimize
masking, e.g. using an unconditional full vector loop and scalar cleanup loop.
The current strategy already demonstrates a nice interaction between the
sparse compiler and all prior optimizations that went into the vector dialect.
Ongoing discussion at:
https://llvm.discourse.group/t/mlir-support-for-sparse-tensors/2020/10
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D94551
This commit moves dangling ops in the main ops.td file to the proper
file matching their categories. This makes ops.td as purely including
all category files.
Differential Revision: https://reviews.llvm.org/D94413
Continue the convergence between LLVM dialect and built-in types by using the
built-in vector type whenever possible, that is for fixed vectors of built-in
integers and built-in floats. LLVM dialect vector type is still in use for
pointers, less frequent floating point types that do not have a built-in
equivalent, and scalable vectors. However, the top-level `LLVMVectorType` class
has been removed in favor of free functions capable of inspecting both built-in
and LLVM dialect vector types: `LLVM::getVectorElementType`,
`LLVM::getNumVectorElements` and `LLVM::getFixedVectorType`. Additional work is
necessary to design an implemented the extensions to built-in types so as to
remove the `LLVMFixedVectorType` entirely.
Note that the default output format for the built-in vectors does not have
whitespace around the `x` separator, e.g., `vector<4xf32>` as opposed to the
LLVM dialect vector type format that does, e.g., `!llvm.vec<4 x fp128>`. This
required changing the FileCheck patterns in several tests.
Reviewed By: mehdi_amini, silvas
Differential Revision: https://reviews.llvm.org/D94405
This ensures the memref base + indices expression is well-formed
Reviewed By: ThomasRaoux, ftynse
Differential Revision: https://reviews.llvm.org/D94441
When fusing tensor_reshape ops with generic/indexed_Generic op, new
linalg.init_tensor operations were created for the `outs` of the fused
op. While correct (technically) it is better to just reshape the
original `outs` operands and rely on canonicalization of init_tensor
-> tensor_reshape to achieve the same effect.
Differential Revision: https://reviews.llvm.org/D93774
This allow more accurate modeling of the side effects and allow dead code
elimination to remove dead transfer ops.
Differential Revision: https://reviews.llvm.org/D94318
Linalg ops are perfect loop nests. When materializing the concrete
loop nest, the default order specified by the Linalg op's iterators
may not be the best for further CodeGen: targets frequently need
to plan the loop order in order to gain better data access. And
different targets can have different preferences. So there should
exist a way to control the order.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91795
This commit adds support for (de-)serializing SpecConstantOpeation. One
thing worth noting is that during deserialization, we assign a fake ID to
enclosed ops inside SpecConstantOpeation. We need to do this in order
for deserialization logic to properly update ID to value map and to
later reference the created value from the sibling 'spv::YieldOp'.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D93591
This change makes the scatter/gather syntax more consistent with
the syntax of all the other memory operations in the Vector dialect
(order of types, use of [] for index, etc.). This will make the MLIR
code easier to read. In addition, the pass_thru parameter of the
gather has been made mandatory (there is very little benefit in
using the implicit "undefined" values).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94352
Adding the ability to index the base address brings these operations closer
to the transfer read and write semantics (with lowering advantages), ensures
more consistent use in vector MLIR code (easier to read), and reduces the
amount of code duplication to lower memrefs into base addresses considerably
(making codegen less error-prone).
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D94278
The existing verification of reshape ops in linalg (linalg.reshape and
linalg.tensor_reshape) allows specification of illegal ops, where
- A dynamic dimension is expanded into multiple dynamic
dimensions. This is ill-specified.
- A static dimension is expanded into dynamic dimension or viceversa,
- The product of extents of the static dimensions in the expanded type
doesnt match the static dimension of the collapsed type.
Making all of these illegal. This also implies that some pessimization
in canonicalization due to incomplete semantics of the operation can
be dropped.
Differential Revision: https://reviews.llvm.org/D93724
Continue the convergence between LLVM dialect and built-in types by replacing
the bfloat, half, float and double LLVM dialect types with their built-in
counterparts. At the API level, this is a direct replacement. At the syntax
level, we change the keywords to `bf16`, `f16`, `f32` and `f64`, respectively,
to be compatible with the built-in type syntax. The old keywords can still be
parsed but produce a deprecation warning and will be eventually removed.
Depends On D94178
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94179
to the conversion of LLVM IR dialect. These attributes are used in FIR to
support the lowering of Fortran using target-specific calling conventions.
Add roundtrip tests.
Add changes per review comments/concerns.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94052
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical with the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.
Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94178
the conversion of LLVM IR dialect. These attributes are used in FIR to
support the lowering of Fortran using target-specific calling
conventions.
Add roundtrip tests. Add changes per review comments/concerns.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94052
Add same hoisting transformation existing for transfer ops on buffers for
transfer_ops on tensor. The logic is significantly different so this is done as
a separate transformation and it is expect that user would know which
transformation to use based on the flow.
Differential Revision: https://reviews.llvm.org/D94115