vector.transfer_read and vector.transfer_write operations are converted
to LLVM intrinsics with specific alignment information; however, there
doesn't seem to be a way in LLVM to take information from llvm.assume
intrinsics and change this alignment information. In any
event, due to the structure of the llvm.assume intrinsic, applying
this information at the LLVM level is more cumbersome. Instead, let's
generate the masked vector load and store intrinsics with the right
alignment information from MLIR in the first place. Since
we're bothering to do this, let's just emit the proper alignment for
loads, stores, scatter, and gather ops too.
Differential Revision: https://reviews.llvm.org/D100444
The initial version of pooling assumed normalization was across all elements
equally. TOSA actually requires that the normalization be performed by how
many elements were summed (edges are not artificially dimmer). Updated
the lowering to reflect this change with corresponding tests.
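
As a rough illustration of the normalization described above, the following
Python sketch divides each window sum by the number of in-bounds elements
rather than by the full kernel area. It is not the lowering itself; the
function name, the list-of-rows layout, and the symmetric padding are
assumptions made for the example, and padding is assumed to be smaller than
the kernel so that every window contains at least one element.

```python
def avg_pool2d(image, kernel, stride, pad):
    """Average pooling that divides by the count of in-bounds elements.

    image: list of rows of floats. kernel, stride, pad: (height, width)
    pairs; pad is applied symmetrically. All names are illustrative.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = kernel
    s_h, s_w = stride
    p_h, p_w = pad
    out_h = (in_h + 2 * p_h - k_h) // s_h + 1
    out_w = (in_w + 2 * p_w - k_w) // s_w + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            total, count = 0.0, 0
            for ky in range(k_h):
                for kx in range(k_w):
                    y = oy * s_h - p_h + ky
                    x = ox * s_w - p_w + kx
                    # Padded positions contribute neither to the sum
                    # nor to the divisor.
                    if 0 <= y < in_h and 0 <= x < in_w:
                        total += image[y][x]
                        count += 1
            row.append(total / count)
        out.append(row)
    return out
```

With an all-ones input, every output stays at 1.0, including the corners;
dividing by the full kernel area instead would make the corners 0.25.
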
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D102540
Lowers the div elementwise op to the linalg dialect. Since TOSA only supports integer division, that is the only version that is currently implemented.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D102430
Create a copy of vector-to-loops.mlir and adapt the test for
ProgressiveVectorToSCF. Fix a small bug in getExtractOp() triggered by
this test.
Differential Revision: https://reviews.llvm.org/D102388
Rounding to integers requires rounding (for floating points) and clipping
to the min/max values of the destination range. Added this behavior and
updated tests appropriately.
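
A minimal sketch of the round-then-clip behavior, in plain Python rather than
in the lowering. The function name and parameters are hypothetical, inputs
are assumed to be finite, and the rounding mode shown (Python's built-in
round, which rounds ties to even) is an assumption; the commit does not state
which tie-breaking rule is used.

```python
def float_to_int(value, bits=8, signed=True):
    """Round a finite float, then clip it to the destination integer range."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    rounded = round(value)  # assumption: ties round to even
    return max(lo, min(hi, rounded))
```

For example, float_to_int(300.2) saturates at 127 for a signed 8-bit
destination instead of wrapping around.
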
Reviewed By: sjarus, silvas
Differential Revision: https://reviews.llvm.org/D102375
Instead of an SCF for loop, these patterns generate fully unrolled loops with no temporary buffer allocations.
Differential Revision: https://reviews.llvm.org/D101981
Add a conversion pass to convert higher-level types before translation.
This conversion extracts meaningful information and packs it into a struct that
the translation (D101504) will be able to understand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D102170
Updated tests to include broadcast of left and right. Includes
bypass if in-type and out-type match shape (no broadcasting).
Differential Revision: https://reviews.llvm.org/D102276
In the buffer deallocation pass, unranked memref types are not properly supported.
After investigating this issue, it turns out that the Clone and Dealloc operations
do not support unranked memref types in the current implementation.
This patch adds the missing feature and enables the transformation of any memref
type.
This patch solves this bug: https://bugs.llvm.org/show_bug.cgi?id=48385
Differential Revision: https://reviews.llvm.org/D101760
Implements support for undilated depthwise convolution using the existing
depthwise convolution operation. Once convolutions migrate to YAML-defined
versions we can rewrite this for a cleaner implementation.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D101579
All linalg.init operations must be fed into a linalg operation before
subtensor. The inserted linalg.fill guarantees it executes correctly.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D101848
Lowers equal and arithmetic_right_shift for elementwise ops to the linalg dialect using linalg.generic.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D101804
Constant-0 dim expr values should be avoided for linalg as it can prevent
fusion. This includes adding support for rank-0 reshapes.
Differential Revision: https://reviews.llvm.org/D101418
This makes it possible to express more complex parallel loops in the affine framework,
for example, in cases of tiling by sizes not dividing loop trip counts perfectly
or inner wavefront parallelism, among others. One can't use affine.max/min
and supply values to the nested loop bounds since the results of such
affine.max/min operations aren't valid symbols. Making them valid symbols
isn't an option since they would introduce selection trees into memref
subscript arithmetic as an unintended and undesired consequence. Also
add support for converting such loops to SCF. Drop some API that isn't used in
the core repo from AffineParallelOp since its semantics become ambiguous in
the presence of max/min bounds. Loop normalization is currently unavailable for
such loops.
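
To show why a min is needed in the inner bound, here is a plain Python sketch
(not affine IR) of tiling a loop whose trip count is not a multiple of the
tile size. The function names are made up for the illustration.

```python
def tile_bounds(n, tile):
    """Return (start, stop) pairs covering range(n) in chunks of `tile`."""
    bounds = []
    for start in range(0, n, tile):
        # The last tile is partial when `tile` does not divide `n`,
        # so the upper bound is the min of two expressions.
        stop = min(start + tile, n)
        bounds.append((start, stop))
    return bounds


def tiled_iteration_space(n, tile):
    """Visit every index of range(n) exactly once via the tiled loops."""
    visited = []
    for start, stop in tile_bounds(n, tile):
        for i in range(start, stop):
            visited.append(i)
    return visited
```

For n = 10 and tile = 4 the bounds are (0, 4), (4, 8), (8, 10); without the
min, the last tile would run past the end of the iteration space.
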
Depends On D101171
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101172
MatMul and FullyConnected have transposed dimensions for the weights.
Also, removed unneeded tensor reshape for bias.
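
The difference in weight layout can be sketched in plain Python. This assumes
that the matmul right-hand side is laid out as [K][N] while the
fully-connected weights are laid out as [OC][IC], which is how we read the
description above; the helper names are illustrative only.

```python
def matmul(a, b):
    """a: [M][K], b: [K][N] -> [M][N]."""
    return [
        [sum(a[m][k] * b[k][n] for k in range(len(b))) for n in range(len(b[0]))]
        for m in range(len(a))
    ]


def transpose(w):
    """Swap the two dimensions of a list-of-rows matrix."""
    return [list(col) for col in zip(*w)]


def fully_connected(x, weights, bias):
    """x: [N][IC], weights: [OC][IC] (assumed layout), bias: [OC]."""
    product = matmul(x, transpose(weights))
    return [[v + b for v, b in zip(row, bias)] for row in product]
```

Here the bias is added per output channel directly, without reshaping it
first.
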
Differential Revision: https://reviews.llvm.org/D101220
Quantized negation can be performed using operations of a higher bit width.
The minimal bit width is picked to perform the operation.
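
One way to picture this, as a sketch only: assume the negation is computed as
in_zp + out_zp - x, which can exceed the storage width, and is then clipped
back to the storage range. That formula, the helper names, and the way the
width is chosen below are assumptions for illustration, not taken from the
patch.

```python
def min_signed_bits(lo, hi):
    """Smallest signed bit width whose range contains [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def quantized_negate(x, in_zp, out_zp, bits=8):
    """Negate a quantized value using a wide intermediate, then clip.

    Assumed formula: result = in_zp + out_zp - x. Python integers are
    unbounded, so they stand in for the wider intermediate type.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    wide = in_zp + out_zp - x
    return max(lo, min(hi, wide))
```

Under these assumptions, 8-bit values and zero points give an intermediate
range of [-383, 382], which needs 10 signed bits.
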
Differential Revision: https://reviews.llvm.org/D101225
Includes tests and implementation for both integer and floating point values.
Both nearest neighbor and bilinear interpolation are included.
Differential Revision: https://reviews.llvm.org/D101009
std.xor ops on bool are lowered to spv.LogicalNotEqual. For Boolean values, xor
and not-equal are the same thing.
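
The equivalence is easy to check exhaustively. This Python snippet is only a
truth-table check of the claim, not part of the conversion; the function name
is made up.

```python
from itertools import product


def xor_matches_not_equal():
    """Check a ^ b == (a != b) for every pair of Boolean values."""
    return all((a ^ b) == (a != b) for a, b in product([False, True], repeat=2))
```
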
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D100817
Some Math operations do not have an equivalent in LLVM. In these cases,
allow a low-priority fallback of calling the libm functions. This provides
functionality and is not a performant option.
Differential Revision: https://reviews.llvm.org/D100367
ArmSVE dialect is behind the recent changes in how the Vector dialect
interacts with backend vector dialects and the MLIR -> LLVM IR
translation module. This patch cleans up ArmSVE initialization within
Vector and removes the need for an LLVMArmSVE dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D100171
Handles lowering conv2d to linalg's convolution operator. This implementation
only supports floating point values but handles all strides, dilations, and
padding values.
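
For reference, the usual relation between input size, kernel size, stride,
dilation, and padding along one spatial dimension can be sketched as follows.
This is the standard convolution output-size formula, not code from the
patch; the function and parameter names are hypothetical.

```python
def conv_output_size(in_size, kernel, stride=1, dilation=1,
                     pad_before=0, pad_after=0):
    """Output extent of a convolution along one spatial dimension."""
    # A dilated kernel spans dilation * (kernel - 1) + 1 input elements.
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + pad_before + pad_after - effective_kernel) // stride + 1
```

For example, a 3-wide kernel with dilation 2 covers 5 input elements, so a
7-wide input yields 3 outputs at stride 1.
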
Differential Revision: https://reviews.llvm.org/D100061
Per the SPIR-V spec "2.16.2. Validation Rules for Shader Capabilities":
Composite objects in the StorageBuffer, PhysicalStorageBuffer,
Uniform, and PushConstant Storage Classes must be explicitly
laid out.
For other cases we don't need to attach the struct offsets.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D100386
The stride should be calculated with the converted array element
type, not the original input type.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D100337
These patterns have been used as a prerequisite step for lowering
to SPIR-V. But they don't involve SPIR-V dialect ops; they are
pure memref/vector op transformations. Given that we now have a dedicated
MemRef dialect, move them to Memref/Transforms/, which is a more
suitable place to host them, so that others can use them.
This commit just moves code around and renames patterns/passes
accordingly. The CMakeLists.txt files for existing MemRef libraries are
also improved along the way.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D100326
Lowers tosa.max_pool2d to equivalent linalg operations. Includes
adding max pooling operations for linalg, with corresponding tests.
Differential Revision: https://reviews.llvm.org/D99824
This patch unconditionally converts i1 types to i8 types on memrefs. If the
extensions or capabilities are not met, they will be converted to i32. Hence the
logic in IntLoadPattern and IntStorePattern is also updated.
Also added the implementation of SPIRVTypeConverter::getOptions().
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D99724
Non-32-bit scalar types require special hardware support that may not
exist on all GPUs. This is reflected in SPIR-V in that non-32-bit scalar
types require special capabilities or extensions.
Previously, when there was a non-32-bit type and no native support, we
unconditionally emulated it with 32-bit ones. This isn't good given that
it can have implications for ABI and data layout consistency.
This commit introduces an option to control whether to use 32-bit
types to emulate.
Differential Revision: https://reviews.llvm.org/D100059
The patch enables the use of index type in vectors. It is a prerequisite to support vectorization for indexed Linalg operations. This refactoring became possible due to the newly introduced data layout infrastructure. The data layout of a module defines the bitwidth of the index type needed to verify bitcasts and similar vector operations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D99948