The computed number of hardware threads can change over the life of the process based on affinity changes. Since we need a data structure that is at least as large as the maximum parallelism, it is important to use the value that was actually latched for the thread pool we will be dispatching work to.
Also adds an assert for the case where the values don't line up (I was getting a crash on an index into the vector).
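A minimal sketch of the sizing rule this describes, using llvm::ThreadPool directly (the function and the per-thread state are illustrative stand-ins, not the code touched by this patch):
```cpp
#include "llvm/Support/ThreadPool.h"
#include <cassert>
#include <vector>

// Size per-thread scratch storage from the count the pool itself latched,
// instead of re-querying hardware concurrency (which can change with
// affinity), and assert that any dispatched thread index fits.
void initPerThreadState(llvm::ThreadPool &pool, std::vector<int> &state,
                        unsigned threadIndex) {
  state.resize(pool.getThreadCount());
  assert(threadIndex < state.size() && "thread index exceeds latched pool size");
  state[threadIndex] = 0;
}
```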
Differential Revision: https://reviews.llvm.org/D116257
This reverts commit 313de31fbb.
There is a missing CMake dependency, building with shared libraries is
broken:
55.509 [45/4/3061] Linking CXX shared library lib/libMLIRTosaToLinalg.so.14git
FAILED: lib/libMLIRTosaToLinalg.so.14git
...
TosaToLinalgPass.cpp: undefined reference to `mlir::createCanonicalizerPass()'
The lowering to Linalg named ops is moved to a separate pass. This allows the
TOSA canonicalizers to run between the named-op lowerings and the general TOSA
lowerings.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D116057
Previously, we defined a struct named `RootOrderingCost`, which stored the cost (a pair consisting of the depth of the connector and a tie breaking ID), as well as the connector itself. This created some confusion, because we would sometimes write, e.g., `cost.cost.first` (the first `cost` referring to the struct, the second one referring to the `cost` field, and `first` referring to the depth). In order to address this confusion, here we rename `RootOrderingCost` to `RootOrderingEntry` (keeping the fields and their names as-is).
This clarification exposed non-determinism in the optimal branching algorithm. When choosing the best local parent, we were previously only considering its depth (`cost.first`) and not the tie-breaking ID (`cost.second`). This led to a non-deterministic choice of the parent when multiple potential parents had the same depth. The solution is to compare both the depth and the tie-breaking ID.
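A self-contained sketch of the deterministic comparison (illustrative types, not the actual PDL-to-PDLInterp code):
```cpp
#include <utility>

// (depth of the connector, tie-breaking ID): std::pair compares
// lexicographically, so both components participate in parent selection.
using Cost = std::pair<unsigned, unsigned>;

static bool isBetterParent(const Cost &lhs, const Cost &rhs) {
  return lhs < rhs; // shallower depth wins; equal depths fall back to the ID
}
```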
Testing: Rely on existing unit tests. Non-determinism is hard to unit-test.
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D116079
There is no way to programmatically configure the list of disabled and enabled patterns in the canonicalizer pass, other than to duplicate the whole pass. This patch exposes the `disabledPatterns` and `enabledPatterns` options.
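A hedged sketch of the programmatic usage, assuming the exposed factory overload takes a GreedyRewriteConfig plus the two lists of pattern debug names (the name below is a placeholder):
```cpp
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include "mlir/Transforms/Passes.h"

void addFilteredCanonicalizer(mlir::PassManager &pm) {
  mlir::GreedyRewriteConfig config;
  // Run canonicalization but skip one pattern identified by its debug name;
  // the name is an illustrative placeholder.
  pm.addPass(mlir::createCanonicalizerPass(
      config,
      /*disabledPatterns=*/{"SomeProblematicPattern"},
      /*enabledPatterns=*/{}));
}
```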
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116055
* There is no reason to forbid that case
* Also, the user will get a very unfriendly error like `expected result type with offset = -9223372036854775808 instead of 1`
Differential Revision: https://reviews.llvm.org/D114678
These conversions are better suited to be applied at the whole-tensor
level. Applying them as canonicalizations ends up triggering such
canonicalizations at all levels of the stack, which might be
undesirable. For example, some of the resulting code patterns won't
bufferize in-place and need additional stack buffers. It is best to be
more deliberate about when these canonicalizations apply.
Differential Revision: https://reviews.llvm.org/D115912
This commit rewrites most existing unittests involving FlatAffineConstraints
to use the parsing utility. This helps to make the tests more understandable.
This relands commit b0e8667b1d, which was
reverted in 6963be1276, with a fix to a unittest
which was incorrectly rewritten before.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D115920
This reverts commit b0e8667b1d.
ASAN/UBSAN bot is broken with this trace:
[ RUN ] FlatAffineConstraintsTest.FindSampleTest
llvm-project/mlir/include/mlir/Support/MathExtras.h:27:15: runtime error: signed integer overflow: 1229996100002 * 809999700000 cannot be represented in type 'long'
#0 0x7f63ace960e4 in mlir::ceilDiv(long, long) llvm-project/mlir/include/mlir/Support/MathExtras.h:27:15
#1 0x7f63ace8587e in ceil llvm-project/mlir/include/mlir/Analysis/Presburger/Fraction.h:57:42
#2 0x7f63ace8587e in operator* llvm-project/llvm/include/llvm/ADT/STLExtras.h:347:42
#3 0x7f63ace8587e in uninitialized_copy<llvm::mapped_iterator<mlir::Fraction *, long (*)(mlir::Fraction), long>, long *> include/c++/v1/__memory/uninitialized_algorithms.h:36:62
#4 0x7f63ace8587e in uninitialized_copy<llvm::mapped_iterator<mlir::Fraction *, long (*)(mlir::Fraction), long>, long *> llvm-project/llvm/include/llvm/ADT/SmallVector.h:490:5
#5 0x7f63ace8587e in append<llvm::mapped_iterator<mlir::Fraction *, long (*)(mlir::Fraction), long>, void> llvm-project/llvm/include/llvm/ADT/SmallVector.h:662:5
#6 0x7f63ace8587e in SmallVector<llvm::mapped_iterator<mlir::Fraction *, long (*)(mlir::Fraction), long> > llvm-project/llvm/include/llvm/ADT/SmallVector.h:1204:11
#7 0x7f63ace8587e in mlir::FlatAffineConstraints::findIntegerSample() const llvm-project/mlir/lib/Analysis/AffineStructures.cpp:1171:27
#8 0x7f63ae95a84d in mlir::checkSample(bool, mlir::FlatAffineConstraints const&, mlir::TestFunction) llvm-project/mlir/unittests/Analysis/AffineStructuresTest.cpp:37:23
#9 0x7f63ae957545 in mlir::FlatAffineConstraintsTest_FindSampleTest_Test::TestBody() llvm-project/mlir/unittests/Analysis/AffineStructuresTest.cpp:222:3
This method is more suitable as an opinterface: it seems intrinsic to
individual instances of the operation instead of the dialect.
Also remove the restriction on the interface being applicable to the entry block only.
Differential Revision: https://reviews.llvm.org/D116018
This is a purely mechanical patch moving some functionality out from the
`Simplex` class into a `SimplexBase` class. This paves the way for
a future patch adding support for lexicographic optimization with a class
`LexSimplex`, which will inherit from `SimplexBase`. Inheriting directly
from `Simplex` would bring many additional functions that would not work in
`LexSimplex` because it operates slightly differently from `Simplex`. So we
split out only the basic functionality it needs to inherit into `SimplexBase`.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D115831
This commit rewrites most existing unittests involving FlatAffineConstraints to use the parsing utility. This helps to make the tests more understandable.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D115920
TOSA's canonicalizers that change dense operations should be moved to a
separate optimization pass to avoid canonicalizing to operations not supported
for relevant backends.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D115890
`EnumAttr` is a pure TableGen implementation of enum attributes using `AttrDef`. This is meant as a drop-in replacement for `StrEnumAttr`, which is soon to be deprecated. `StrEnumAttr` is often used over `IntEnumAttr` because it's more readable in MLIR assembly formats. However, storing and manipulating strings is not efficient. Defining `StrEnumAttr` can also be awkward, relies on a lot of special logic in `EnumsGen`, and has some hidden sharp edges.
Also, `EnumAttr` stores the enum directly, removing the need to convert to/from integers when calling attribute getters on ops.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D115181
It is possible for the shift value to exceed the number of bits. In these
cases we can just multiply by zero. This is a relatively rare occurrence but
should be handled.
Reviewed By: not-jenni
Differential Revision: https://reviews.llvm.org/D115779
When the input and output of a pool2d op are both 1x1, it can be canonicalized to a no-op
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D115908
Slight rename and better variable type usage in the tosa.conv2d to
tosa.fully_connected lowering. Included disabling the lowering for padded
convolutions.
Reviewed By: not-jenni
Differential Revision: https://reviews.llvm.org/D115776
The generated parser for ops with type inference calls `inferReturnTypes` before region resolution and segment attribute resolution, i.e. regions and the segment attributes are not passed to the `inferReturnTypes` even though it may need that information.
In particular, an op that has sized operand segments which queries those operands in its `inferReturnTypes` function will crash because the segment attributes hadn't been added yet.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D115782
When a dialect is loaded with `getOrLoadDialect`, its constructor may recurse and call `getOrLoadDialect` on a dependent dialect, which may result in an insertion in the dialect map, invalidating the reference to the (previously null) dialect pointer.
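A simplified, self-contained illustration of the hazard (llvm::DenseMap stands in for the real dialect registry; this is not the actual MLIRContext code):
```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringRef.h"

struct Dialect {};

// Holding a reference to a DenseMap slot across an insertion is unsafe: if
// constructing the dialect recursively loads a dependent dialect, the nested
// insertion may grow the map and invalidate the reference.
Dialect *loadDialect(llvm::StringRef name,
                     llvm::DenseMap<llvm::StringRef, Dialect *> &loaded) {
  Dialect *&slot = loaded[name];
  if (slot)
    return slot;
  // ... dialect construction may re-enter loadDialect() here ...
  slot = new Dialect(); // may write through an invalidated reference
  return slot;
}
```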
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D115846
This allows the pass to participate in progressive lowering
and it also allows us to write tests better.
Along the way, cleaned up the tests.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D115756
This patch extends the GPU kernel outlining pass so that it can take in
an optional data layout specification that will be attached to the GPU
module operation generated. If the data layout specification is not provided
the default data layout is used instead.
Reviewed By: herhut, mehdi_amini
Differential Revision: https://reviews.llvm.org/D115722
Having a default value for the lowering strategy of the multi-reduction op has proven
to be unexpected by users. This patch is dropping the default value so that users have
to explicitly choose the lowering strategy to be applied.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115805
This allows op interface implementations to make decisions based on dialect-specific bufferization state.
This is in preparation of fixing conflict detection of CallOps in ModuleBufferization.
Differential Revision: https://reviews.llvm.org/D115705
The `rewrite` statement allows for rewriting a given root
operation with a block of nested rewriters. The root operation is
not implicitly erased or replaced, and any transformations to it
must be expressed within the nested rewrite block. The inner body
may contain any number of other rewrite statements, variables, or
expressions.
Differential Revision: https://reviews.llvm.org/D115299
This statement acts as a companion to the existing `erase`
statement, and is the corresponding PDLL construct for the
`PatternRewriter::replaceOp` C++ API. This statement replaces a
given operation with a set of values.
Differential Revision: https://reviews.llvm.org/D115298
Tuples are used to group multiple elements into a single
compound value. The values in a tuple can be of any type, and
do not need to be of the same type. There is also no limit to
the number of elements held by a tuple.
Tuples will be used to support multiple results from
Constraints and Rewrites (added in a followup), and will also
make it easier to support more complex primitives (such as
range based maps that can operate on multiple values).
Differential Revision: https://reviews.llvm.org/D115297
An operation expression in PDLL represents an MLIR operation. In
the match section of a pattern, this expression models one of
the input operations to the pattern. In the rewrite section of
a pattern, this expression models one of the operations to
create. The general structure of the operation expression is very
similar to that of the "generic form" of textual MLIR assembly:
```
let root = op<my_dialect.foo>(operands: ValueRange) {attr = attr: Attr} -> (resultTypes: TypeRange);
```
For now we only model the components that are within PDL; as PDL
gains support for blocks and regions, so will this expression.
Differential Revision: https://reviews.llvm.org/D115296
This allows for using literal attributes and types within PDLL,
which simplifies building both constraints and rewriters. For
example, checking if an attribute is true is as simple as
`attr<"true">`.
Differential Revision: https://reviews.llvm.org/D115295
This allows for overriding the metadata of a pattern and
providing information such as the benefit, bounded recursion,
and more in the future.
Differential Revision: https://reviews.llvm.org/D115294
This is a new pattern rewrite frontend designed from the ground
up to support MLIR constructs, and to target PDL. This frontend
language was proposed in https://llvm.discourse.group/t/rfc-pdll-a-new-declarative-rewrite-frontend-for-mlir/4798
This commit starts sketching out the base structure of the
frontend, and is intended to be a minimal starting point for
building up the language. It essentially contains support for
defining a pattern, variables, and erasing an operation. The
features mentioned in the proposal RFC (including IDE support)
will be added incrementally in followup commits.
I intend to upstream the documentation for the language in a
followup when a bit more of the pieces have been landed.
Differential Revision: https://reviews.llvm.org/D115093
Previously, the LogicalResult return value of restoreRow was being ignored in
places where it was expected to always succeed. Instead, check the result
and go to an `llvm_unreachable` if it turns out to be a failure.
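A minimal sketch of that check (the declaration is a stand-in for the real Simplex member function):
```cpp
#include "mlir/Support/LogicalResult.h"
#include "llvm/Support/ErrorHandling.h"

mlir::LogicalResult restoreRow(unsigned row); // stand-in declaration

void restoreOrAbort(unsigned row) {
  // The result used to be silently ignored; an unexpected failure now aborts.
  if (mlir::failed(restoreRow(row)))
    llvm_unreachable("restoring the row is not expected to fail here");
}
```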
If all the dims are reduction dims, it is already in inner-most/outer-most
reduction form.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D115820
Implements the RegionBranchOpInterface method getNumRegionInvocations to `scf::IfOp` so that, when the condition is constant, the number of region executions can be analyzed by `NumberOfExecutions`.
Reviewed By: jpienaar, ftynse
Differential Revision: https://reviews.llvm.org/D115087
* Call `replaceOp` instead of `mapBuffer`.
* Remove bvm and all helper functions around bvm.
* Simplify FuncOp bufferization and rely on existing functionality to generate ToMemrefOps for function BlockArguments.
Differential Revision: https://reviews.llvm.org/D115515
Ops for the signed counterparts "llvm.smin" and "llvm.smax" already exist. This patch adds the unsigned versions as well.
Differential Revision: https://reviews.llvm.org/D115796
After removing the range type, Linalg does not define any type. The revision thus consolidates the LinalgOps.h and LinalgTypes.h into a single Linalg.h header. Additionally, LinalgTypes.cpp is renamed to LinalgDialect.cpp to follow the convention adopted by other dialects such as the tensor dialect.
Depends On D115727
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115728
This patch adds lowering from omp.sections and omp.section (simple lowering along with the nowait clause) to LLVM IR.
Tests for the same are also added.
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D115030
Instead of modifying the existing linalg.tiled_loop op, create a new op with memref input/outputs and delete the old op.
Differential Revision: https://reviews.llvm.org/D115493
Instead of modifying the existing scf.if op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the op. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
Differential Revision: https://reviews.llvm.org/D115491
With VectorType supporting scalable dimensions, we don't need many of
the operations currently present in ArmSVE, like mask generation and
basic arithmetic instructions. Therefore, this patch also gets
rid of those.
Having built-in scalable vector support also simplifies the lowering of
scalable vector dialects down to LLVMIR.
Scalable dimensions are indicated by wrapping them
between square brackets:
vector<[4]xf32>
is a scalable vector of 4 single-precision floating-point elements.
More generally, a VectorType can have a set of fixed-length dimensions
followed by a set of scalable dimensions:
vector<2x[4x4]xf32>
is a vector with 2 scalable 4x4 vectors of single-precision floating-point
elements.
The scale of the scalable dimensions can be obtained with the Vector
operation:
%vs = vector.vscale
This change is being discussed in the discourse RFC:
https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484
Differential Revision: https://reviews.llvm.org/D111819
Instead of modifying the existing scf.for op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the loop. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
This change also introduces `replaceOp`, which will be utilized by all other `bufferize` implementations in future commits. Bufferization will then no longer rely on old (pre-bufferize) ops to DCE away. Instead old ops are deleted on the spot. This improves debuggability because there won't be any duplicate ops anymore (bufferized + not-yet-bufferized) when dumping IR during bufferization. It is also less fragile because unbufferized IR can no longer silently "hang around" due to an implementation bug.
Differential Revision: https://reviews.llvm.org/D114926
Added documentation to clarify the purpose of the bufferization to memref pass
and added some remarks.
Differential Revision: https://reviews.llvm.org/D115326
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115727
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.
Differential Revision: https://reviews.llvm.org/D115754
While the default value for the amdgpu-flat-work-group-size attribute,
"1, 256", matches the defaults from Clang, some users of the ROCDL dialect,
namely Tensorflow, use larger workgroups, such as 1024. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default value.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D115741
Data point using the 3-dim tensor nell-2.tns:
MLIR:
READ FILE INTO COO: 24424.369294 ms ---> improves to ----> 9638.501044 ms
SORT COO BEFORE PACK: 762.834831 ms
PACK COO TO TENSOR: 1243.376245 ms
TACO:
b file read: 13270.9 ms
b pack: 7137.74 ms
b size: (12092 x 9184 x 28818), 925300328 bytes
https://github.com/llvm/llvm-project/issues/52679
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115696
Make the reduction handling in OpenMPIRBuilder compatible with
opaque pointers by explicitly storing the element type in ReductionInfo,
and also passing it to the atomic reduction callback, as at least
the ones in the test need the type there.
This doesn't make things fully compatible yet, there are other
uses of element types in this class. I also left one
getPointerElementType() call in mlir, because I'm not familiar
with that area.
Differential Revision: https://reviews.llvm.org/D115638
Instead of printing analysis debug information to stderr, annotate the IR. This makes it easier to understand decisions made by the analysis, especially in larger input IR.
Differential Revision: https://reviews.llvm.org/D115575
Implementation of the interface allows querying the size and alignments of an LLVMArrayType, as well as querying the size and alignment of a struct containing an LLVMArrayType.
The implementation should yield the same results as llvm::DataLayout, including support for over-aligned element types.
There is no customization point for adjusting an array's alignment; it is simply taken from the element type.
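A hedged sketch of the queries this enables, assuming the generic mlir::DataLayout helper API:
```cpp
#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
#include "mlir/Interfaces/DataLayoutInterfaces.h"

// With this patch, size and alignment queries work on LLVMArrayType (and on
// structs containing one) just like on other types.
void queryArrayLayout(mlir::DataLayout &layout,
                      mlir::LLVM::LLVMArrayType arrayTy) {
  auto size = layout.getTypeSize(arrayTy);             // in bytes
  auto abiAlign = layout.getTypeABIAlignment(arrayTy); // in bytes
  auto prefAlign = layout.getTypePreferredAlignment(arrayTy);
  (void)size;
  (void)abiAlign;
  (void)prefAlign;
}
```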
Differential Revision: https://reviews.llvm.org/D115704
This is the second part of https://reviews.llvm.org/D114993 after slicing
into 2 independent commits.
This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous row-major.
For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.
The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114993
This is the first part of https://reviews.llvm.org/D114993 which has been
split into small independent commits.
This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous row-major.
For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.
The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.
Reviewed By: nicolasvasilache
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
* Since these passes are now generic, they do not automatically nest with pass managers set up for implicit nesting.
* The Detensorize pass must run on a FunctionLike, and this requires explicit nesting.
* Addressed comments missed from the original review and, per suggestion, removed the assert on FunctionLike in ElementwiseToLinalg and DropUnitDims.cpp, which is also what was causing the integration test to fail.
This reverts commit aa8815e42e.
Differential Revision: https://reviews.llvm.org/D115671
Explores various sparsity combinations of
the SDMM kernel and verifies that the computed
result is the same for all cases.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115476
Add convertFromMLIRSparseTensor to the supporting C shared library to convert
SparseTensorStorage to COO-flavor format.
Add Python routine sparse_tensor_to_coo_tensor to convert sparse tensor storage
pointer to numpy values for COO-flavor format tensor.
Add a Python test for sparse tensor output.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115557
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
* Since these passes are now generic, they do not automatically nest with pass managers set up for that.
* If running them over nested functions, you must nest explicitly. Upstream has adopted this style but *-opt still has some uses of implicit pipelines via args. See tests for argument changes needed.
Differential Revision: https://reviews.llvm.org/D115645
Adapt the LinalgStrategyVectorizationPattern pass to apply the vectorization patterns in two stages. The change ensures the generic pad tensor op vectorization pattern does not run too early. Additionally, the revision adds the transfer op canonicalization patterns to the set of applied patterns, since they are needed to enable efficient vectorization for rank-reduced convolutions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115627
This gives us a better debugging print as it supports indentation
levels and other nice features.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115583
The previous "optimization" that tries to reuse existing block for
selection header block can be problematic for deserialization
because it effectively pulls in previous ops in the selection op's
enclosing block into the selection op's header. When deserializing,
those ops will be placed in the selection op's region. If any of
the previous ops has a use after the selection op, it will break. That
is, the following IR cannot round trip:
```mlir
^bb:
%def = ...
spv.mlir.selection { ... }
%use = spv.SomeOp %def
```
This commit removes the "optimization" to always create new blocks
for the selection header.
Along the way, also made error reporting better in deserialization
by turning asserts into proper errors and adding a check for uses outside
of sunk structured control flow region blocks.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115582
Despite handling regions and inferred return types, the builder was never generated for ops with both InferReturnTypeOpInterface and regions.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D115525
Use the current instead of the new source type to compute the rank-reduction map in getCanonicalSubViewResultType. Otherwise, the computation of the rank-reduction map fails when folding a cast into a subview since the strides of the new source type cannot be related to the strides of the current result type.
Depends On D115428
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115446
Using this implementation of the interface it is possible to query the size, ABI alignment, as well as the preferred alignment of a struct. It should yield the same results as LLVM's `llvm::DataLayout` on an equivalent `llvm::StructType`, including for packed structs.
Additionally, it is also possible to increase the ABI and preferred alignment using a data layout entry with the type `llvm.struct<()>`, which serves the same purpose as the `a:` component in LLVM's data layout string.
Differential Revision: https://reviews.llvm.org/D115600
Do not compose pad tensor operations if the extract slice of the outer pad tensor operation is rank reducing. The inner extract slice op cannot be rank-reducing since its source type must match the desired type of the padding.
Depends On D115359
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115428
Tighten the matcher of the PadTensorOpVectorizationWithInsertSlicePattern pattern. Only match if the PadOp result is used by the InsertSliceOp source. Fail if the result is used by the InsertSliceOp dest.
Depends On D115336
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115359
Adapt the computation of a static bounding box to take rank-reducing slice operations into account by filtering out reduced size one dimensions. The revision is needed to make padding work for decomposed convolution operations. The decomposition introduces rank reducing extract slice operations that previously let padding fail.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115336
We currently restrict parsing of location to not allow nameloc being
nested inside nameloc. This restriction may be historical as there
doesn't seem to be a reason for it anymore (locations like this can be
constructed in C++ and they print fine). Relax this restriction in the
parser to allow this nesting.
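A hedged sketch of constructing such a location from C++ (assuming the NameLoc::get overloads that take a StringAttr and an optional child location):
```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Location.h"
#include "mlir/IR/MLIRContext.h"

// A NameLoc nested directly inside another NameLoc; this has always been
// constructible from C++, and the parser now accepts the printed form too.
mlir::Location makeNestedNameLoc(mlir::MLIRContext &ctx) {
  mlir::Location inner =
      mlir::NameLoc::get(mlir::StringAttr::get(&ctx, "inner"));
  return mlir::NameLoc::get(mlir::StringAttr::get(&ctx, "outer"), inner);
}
```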
Differential Revision: https://reviews.llvm.org/D115581
Flags some potential cases where splitting isn't happening and so could result
in confusing results. Also update some test files where there were near misses
in splitting that seemed unintentional.
Differential Revision: https://reviews.llvm.org/D109636
The 0-D case gets lowered in almost the same way that the 1-D case does
in VectorCreateMaskOpConversion. I also had to slightly update the
verifier for the op to always require exactly 1 operand in the 0-D case.
Depends On D115220
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115221
Following the example of `VectorOfAnyRankOf`, I've done a few changes in the
`.td` files to help with adding the support for the 0-D case gradually.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115220
When subtracting `b \ c`, when there are divisions in `c`, these division
constraints get added to `b`. `b` must be restored to its original state
when returning, but these added division constraints were not removed in
one of the return paths. This patch fixes this and deduplicates the
restoration logic by encapsulating it in a lambda `restoreState`. The patch
also includes a regression test for the bug fix.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D115577
If we have a `spv.mlir.selection` op nested in a `spv.mlir.loop`
op, when serializing the loop's block, we might need to jump
from the selection op's merge block, which might be different
than the immediate MLIR IR predecessor block. But we still need
to get the block argument from the MLIR IR predecessor block.
Also, if the `spv.mlir.selection` is in the `spv.mlir.loop`'s
header block, we need to make sure `OpLoopMerge` is emitted
in the current block before we start processing the nested selection
op. Otherwise we'll see the LoopMerge in the wrong SPIR-V
basic block.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115560
This patch adds support for extracting divisions when the set contains bounds
which are tighter than the division bounds. For example:
```
3q - i + 2 >= 0 <-- Lower bound for 'q'
-3q + i - 1 >= 0 <-- Tighter upper bound for 'q'
```
Here, the actual upper bound for the division for `q` would be `-3q + i >= 0`, but
since this actual upper bound is implied by the tighter upper bound, we can still
extract the division.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D115096
`(void)` casts were added when LogicalResult was marked as
no-discard. This commit cleans them up to properly propagate
failures.
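A minimal sketch of the cleanup pattern (the called function is a stand-in for a real call site):
```cpp
#include "mlir/Support/LogicalResult.h"

mlir::LogicalResult doStep(); // stand-in declaration

mlir::LogicalResult run() {
  // Before: (void)doStep();   // failure silently dropped
  if (mlir::failed(doStep()))  // after: propagate the failure
    return mlir::failure();
  return mlir::success();
}
```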
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115541
This should really come from a matching target environment. But
as a default, it can be handy (to avoid always listing the full
resource limits attribute in IR, etc.). It's common to see 32
so use that as the subgroup size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115534
It's legal per the Vulkan / SPIR-V spec; still it's better to avoid
such duplication to have a cleaner blob and reduce the binary size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115532
In SPIR-V, symbol names are encoded as `OpName` instructions.
They are not semantic impacting and can be omitted, which can
reduce the binary size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115531
The method that was previously used for computing dual variables was incorrect.
This was used in the integer emptiness check algorithm, where this bug could lead to much longer running times. (Due to the way it is used, this never results in an incorrect emptiness check result.)
This patch fixes the dual computation and adds some additional asserts that catch this bug, along with regression test cases that trigger the asserts when the incorrect dual computation is used.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D113803
Introduce a function `getNumIdKind` that returns the number of ids of the
specified kind. Remove the function `assertAtMostNumIdKind` and instead just
directly assert the inequality with a call to `getNumIdKind`.
NFC. Move out and expose the affine scalar replacement utility through
affine utils. Rename the misleadingly named forwardStoreToLoad to
affineScalarReplace. Update a stale doc comment.
Differential Revision: https://reviews.llvm.org/D115495
InsertSliceOp may have subprefix semantics where missing trailing dimensions
are automatically inferred directly from the operand shape.
This revision fixes an overflow that occurs in such cases when the implementation is based on the op rank.
Differential Revision: https://reviews.llvm.org/D115549
* Constraints/Rewrites registered before a pattern was added were dropped
* Constraints/Rewrites may be registered multiple times (if different pattern sets depend on them)
* ModuleOp no longer has a terminator, so we shouldn't be removing the terminator from it
Differential Revision: https://reviews.llvm.org/D114816
Switch the attribute creation operations to use attr-dict-with-
keyword to avoid conflicts (in the case of pdl.attribute) and
confusion (in the case of pdl_interp.create_attribute) with
having a DictionaryAttr as a value and specifying the
attributes of the operation itself (as a dictionary).
Differential Revision: https://reviews.llvm.org/D114815
The results of a rewrite are optional, but we currently require
them to be present in the assembly format. This commit
makes the results component in the format optional.
Differential Revision: https://reviews.llvm.org/D114814
Custom ops that have no parser or printer should fall back to the dialect's parser and/or printer hooks. This avoids the need to define parsers and printers that simply dispatch to the dialect hook.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D115481
The wrong type was used for the result type in the tosa.conv_2d canonicalization.
The element type should be taken from the result type,
not from the input element type.
Differential Revision: https://reviews.llvm.org/D115463
This pattern tries to convert an inner (outer) dim reduction to an
outer (inner) dim reduction. Doing this on a 1D or 0D vector results
in an infinite loop since the converted op is the same as the original
operation. Just returning failure when the source rank is <= 1 fixes the
issue.
Differential Revision: https://reviews.llvm.org/D115426
The sparse tensor code generator allocates memory for the output tensor. As
such, we only need to allocate a MemRefDescriptor to receive the output tensor
and do not need to allocate and initialize the storage for the tensor.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115292
This patch adds the documentation for the operations `omp.atomic.read`,
`omp.atomic.write` and `omp.atomic.update`.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D115445
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.printf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.
And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.
In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110448
Depends On D115263
By aligning the block size in parallel_compute_fn to the inner loop iterations, LLVM can later unroll and vectorize some of the inner loops with small trip counts. Up to 2x speedup in multiple benchmarks.
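A self-contained illustration of the rounding rule assumed from this description (the real change lives in the async parallel-for lowering; this only shows the arithmetic):
```cpp
#include <cstdint>

// Round the parallel block size up to a multiple of the innermost loop's trip
// count so each block covers whole inner iterations, leaving LLVM inner loops
// with small constant trip counts that it can unroll and vectorize.
int64_t alignBlockSize(int64_t blockSize, int64_t innerTripCount) {
  return ((blockSize + innerTripCount - 1) / innerTripCount) * innerTripCount;
}
```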
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D115436
With the complex recursive structure of the async dispatch function, LLVM can't always propagate constants to the parallel_compute_fn, and this often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D115263
LinalgOp results usually bufferize in-place with output args. With this change, they may bufferize in-place with input args if the value of the output arg is not used in the computation.
Differential Revision: https://reviews.llvm.org/D115022
This patch factors out math functionality that is a subset of Presburger arithmetic and moves it from FlatAffineConstraints to Presburger/IntegerPolyhedron. This patch only moves some parts of the functionality planned to be moved, with subsequent patches moving more functionality. There are three main reasons for this:
1. This split makes the Presburger Library easier and more flexible to use
across MLIR, by not depending on IR.
2. This split allows the Presburger library to be developed independently from
Affine Analysis, with Affine Analysis using this library.
3. With more functionality being upstreamed to the Presburger Library, the
mlir/Analysis directory will be cluttered with Presburger library components
since they depend on math functionality from FlatAffineConstraints. Moving this
functionality to the Presburger directory allows keeping the new functionality
in the Presburger directory.
This patch is part of an ongoing effort to make the Presburger Library easier to use. The motivation for this effort is the feedback received at the LLVM conference from Mehdi and others.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D114674
This patch provides functionality for simplifying `PresburgerSet`s by checking if any `FlatAffineConstraints` in the set is contained in another, and removing such redundant FACs.
This is part of a series of patches to provide functionality for [integer set coalescing](http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf) in MLIR.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D110617
This patch supports the atomic construct (update) following section 2.17.7 of OpenMP 5.0 standard. Also added tests and verifier for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D112982
The region of `linalg.generic` might contain `tensor` operations. For
example, current lowering of `gather` uses a `tensor.extract` in the
body of the `LinalgOp`. Bufferize the ops within a `LinalgOp` region
as well to catch such cases.
Differential Revision: https://reviews.llvm.org/D115322
Count leading/trailing zeros are existing LLVM intrinsics. Added LLVM
support for the intrinsics, with lowerings from the math dialect to the LLVM
dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115206
This adds a new option `dialectFilter` to BufferizationOptions. Only ops from dialects that are allow-listed in the filter are bufferized. Other ops are left unbufferized. Note: This option requires `allowUnknownOps = true`.
To make use of `dialectFilter`, BufferizationOptions or BufferizationState must be passed to various helper functions.
The purpose of this change is to provide a better infrastructure for partial bufferization, which will be fully activated in a subsequent change.
Differential Revision: https://reviews.llvm.org/D114691
This is a defensive action to catch, at build time on Linux, failures that
would otherwise only show up on Windows.
Differential Revision: https://reviews.llvm.org/D115316
The new form of printing attributes in the declarative assembly elides the `#dialect.mnemonic` prefix and only keeps the `<....>` part.
Differential Revision: https://reviews.llvm.org/D113873
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reductions loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115091
The quantized case needs to include zero-point corrections before the tosa.mul.
The transformation is therefore disabled for the quantized use case.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D115264
These functions are generic utility functions that operate on
affine ops within SCF regions. Moving them to their own files
for better code structure, instead of mixing them with loop
specialization logic.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115245
This change mainly changes the API. There is no mentioning of FuncOps in ComprehensiveBufferize anymore.
Also, bufferize methods of the op interface are called for ops without tensor operands/results if they have a region.
Differential Revision: https://reviews.llvm.org/D115212
The new `getLocalReprs` function also outputs `dividends` and
`denominators`, and hence the CheckDivisionRepresentation function is
modified to take the newer getLocalReprs function into account.
Signed-off-by: Prashant Kumar <pk5561@gmail.com>
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D115146
This patch adds lowering from omp.atomic.read to LLVM IR along with the
memory ordering clause. Tests for the same are also added.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115134
Depends On D115004
Cleans up code legibility by requiring the `emitCInterface` parameter to be explicit at all call-sites, and defining boolean aliases for that parameter.
Reviewed By: aartbik, rriddle
Differential Revision: https://reviews.llvm.org/D115005
For a 1x1 weight and stride of 1, the input/weight can be reshaped and
multiplied elementwise, and then reshaped back.
Reviewed By: rsuderman, KoolJBlack
Differential Revision: https://reviews.llvm.org/D115207
Make fields private and clean up the interface. In particular, BufferizableOpInterface::bufferize no longer has access to `aliasInfo`. This was potentially dangerous because some of the ops registered in BufferizationAliasInfo may have been deleted.
Differential Revision: https://reviews.llvm.org/D114931
Fixed the tosa.conv2d to tosa.fully_connected canonicalization for incorrect
output channels. Included updates to the tests to include checks for the result
shapes during canonicalization.
This allows conv2d to transform to the simpler fully_connected operation.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115170
Do load and store to verify that we process each element of the iteration space once.
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D115152
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behvaior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing than since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D115037
This is a cleanup of ModuleBufferization. Instead of storing information about writable function arguments in BufferizationAliasInfo, we can use isWritable and make the decision there, based on dialect-specific bufferization state.
Differential Revision: https://reviews.llvm.org/D114930
Remove all function calls related to buffer equivalence from bufferize implementations.
Add a new PostAnalysisStep for scf.for that ensures that yielded values are equivalent to the corresponding BBArgs. (This was previously checked in `bufferize`.) This will be relaxed in a subsequent commit.
Note: This commit changes two test cases. These were broken by design
and should not have passed. With the new scf.for PostAnalysisStep, this
bug was fixed.
Differential Revision: https://reviews.llvm.org/D114927
Adding the default implementation of `getLoopIteratorTypes` and
`getLoopBounds` allows ExternalModels to override these methods.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115101
Collect equivalent BBArgs right after the equivalence analysis of the FuncOp and before bufferizing. This is in preparation of decoupling bufferization from aliasInfo.
Also gather equivalence info for CallOps, which was missing in the
previous commit.
Differential Revision: https://reviews.llvm.org/D114847
To support creating both a mask with just a single `true` and `false` values,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:
- `vector.constant_mask [0] : vector<i1>` which gets lowered to
`arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
`arith.constant dense<true> : vector<i1>`
(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115023
This revision makes the padding pattern independent of the application order. It addresses the concern that we cannot rely on the execution order of the greedy rewriter (https://reviews.llvm.org/D114689). Instead, the pattern is updated to apply repeatedly till all operations are padded.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D114851
When using -test-loop-permutation="permutation-map=...", apply the
permutation map to each affine loop nest in the function (and not only the
first one). If the size of the permutation map and the size of a nest
are not consistent, do nothing on this particular nest (instead of
making MLIR crash).
Differential Revision: https://reviews.llvm.org/D112947
Let users register their own handler to process the match
failure information.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D110896
Instead of checking buffer equivalence during bufferization, gather buffer equivalence information right after the analysis. This is in preparation of decoupling bufferization from BufferizationAliasInfo.
This change also fixes equivalence analysis for scf.if op results, which was not fully implemented. scf.if op results are equivalent to their corresponding yield values if both yield values are equivalent.
Differential Revision: https://reviews.llvm.org/D114774
Fix affine.for unroll for multi-result upper bound maps: these can't be
unrolled/unroll-and-jammed in cases where the trip count isn't known to
be a multiple of the unroll factor.
Fix and clean up repeated/unnecessary checks/comments at helper callees.
Also, fix clang-tidy variable naming warnings and redundant includes.
Differential Revision: https://reviews.llvm.org/D114662
Internally we use int64_t to hold shapes, but for some
reason the parser was limiting shapes to unsigned. This
change updates the parser to properly handle int64_t shape
dimensions.
Differential Revision: https://reviews.llvm.org/D115086
Test case files at most places in MLIR uses hyphens and not underscores.
A counter-pattern was somehow started to use underscores in some places.
Rename test cases in test/mlir-cpu-runner to use hyphens so that it's
consistent at least within its directory.
Differential Revision: https://reviews.llvm.org/D114672
Also set insertion point right before calling `bufferize`. No need to put an InsertionGuard anymore.
Differential Revision: https://reviews.llvm.org/D114928
This reverts commit 13bdb7ab4a. The commit introduced/uncovered an unintended bug in models containing Conv2D.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115079
BufferizationState had map/lookup overloads for non-tensor values. This was necessary for IREE. There is now a better way to do this, so these overloads can be removed.
Differential Revision: https://reviews.llvm.org/D114929
A previous commit added support for converting elemental types contained in
LLVM dialect types in case they were not compatible with the LLVM dialect. It
was missing support for named structs as they could be recursive, which was not
supported by the conversion infra. Now that it is, add support for converting
such named structs.
Depends On D113579
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D113580
Allow ops that are not bufferizable in the input IR. (Deactivated by default.)
bufferization::ToMemrefOp and bufferization::ToTensorOp are generated at the bufferization boundaries.
Differential Revision: https://reviews.llvm.org/D114669
Also store a reference to BufferizationOptions in BufferizationState. This is in preparation of adding support for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114661
The implementation only allows to bit-cast between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114854
This change provides `BufferizableOpInterface` implementations for ops from the Bufferization dialects. These ops are needed at the bufferization boundaries for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114618
Affine maps and integer sets previously relied on a single lock for creating unique instances. In a multi-threaded setting, this lock becomes a contention point. This commit updates AffineMap and IntegerSet to use StorageUniquer instead. StorageUniquer internally uses sharded locks and thread-local caches to reduce contention. It is already used for affine expressions, types and attributes. On my local machine, this gives me a 5X speedup for an application that manipulates a lot of affine maps and integer sets.
This commit also removes the integer set uniquer threshold. The threshold was used to avoid adding integer sets with a lot of constraints to the hash_map containing unique instances, but the constraints and the integer set were still allocated in the same allocator and never freed, thus not saving any space except for the hash-map entry.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D114942
This patch implements detecting duplicate local identifiers by extracting their
division representation while merging local identifiers.
For example, given the FACs A, B:
```
A: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 2)
B: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 5)
```
The intersection of A and B without this patch would lead to the following FAC:
```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4], d2 = [x / 4], d3 = [x / 4]: d0 <= s0, d1 <= s0, d2 <= s0, d3 <= s0, x + y >= 2, x + y >= 5)
```
after this patch, merging of local ids will detect that `d0 = d2` and `d1 = d3`,
and the intersection of these two FACs will be (after removing duplicate constraints):
```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4] : d0 <= s0, d1 <= s0, x + y >= 2, x + y >= 5)
```
This reduces the number of constraints by 2 (constraints) + 4 (2 constraints for each extra division) for this case.
This is used to reduce the output size representation of operations like
PresburgerSet::subtract, PresburgerSet::intersect which require merging local
variables.
Reviewed By: arjunp, bondhugula
Differential Revision: https://reviews.llvm.org/D112867
Revert changes that were meant to be sent as a single commit with
summary for the differential review, but were accidentally sent directly.
This reverts commit 3bc5353fc6.
When an attribute is optional and is given an additional constraint in a
rewrite pattern, this could lead to dereferencing a null Attribute. Avoid
cases where the constraint checks the attribute but has no null check.
This should be improved to be more uniformly guarded.
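A hedged sketch of the guarded access pattern (op and attribute names are illustrative):
```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Operation.h"

// An optional attribute may simply be absent; check for null before applying
// any further constraint on its value.
bool constraintHolds(mlir::Operation *op) {
  auto attr = op->getAttrOfType<mlir::IntegerAttr>("some_optional_attr");
  return attr && attr.getInt() == 42;
}
```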
This is a lightweight operation, useful for writing unit tests. It will be utilized for testing in subsequent commits.
Differential Revision: https://reviews.llvm.org/D114693
This patch fixes the build by removing
extractVectorTypeFromShapedValue. The last use was removed on Dec 1,
2021.
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.
Reviewed By: ThomasRaoux, springerm
Differential Revision: https://reviews.llvm.org/D114803
RootOrderingTest is a low-level unit test that creates values and uses them as vertices in a directed graph. These vertices were created using `builder.create`, but never freed, due to my insufficient understanding of the MLIR infrastructure.
Reviewed By: mehdi_amini, bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D114745
However, since CallOps have no aliasing OpResults, their OpOperands always bufferize out-of-place.
This change removes `bufferizesToMemoryWrite` from `CallOpInterface`. This method was called, but its return value did not matter.
Differential Revision: https://reviews.llvm.org/D114616
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.
This solves a regression in IREE due to e4e4da86af
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D114838
This reverts commit 9a844c2a9b.
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.
This solves a regression in IREE due to e4e4da86af
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D114838
This reverts commit 29a50c5864.
After LLVM lowering, the original patch incorrectly moved alignment
information across an unconstrained GEP operation. This is only correct
for some index offsets in the GEP. It seems that the best approach is,
in fact, to rely on LLVM to propagate information from the llvm.assume()
to users.
Thanks to Thomas Raoux for catching this.
The proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since that
may include other conditions that cause cancellation).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114810
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.
As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.
Also, extract isRankReducedType from TensorOps for better reuse
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.
Differential Revision: https://reviews.llvm.org/D114715
The canonical type of the result of the `memref.subview` needs to make
sure that the previously dropped unit-dimensions are the ones dropped
for the canonicalized type as well. This means the generic
`inferRankReducedResultType` cannot be used. Instead, the currently
dropped dimensions need to be queried and the same ones need to be dropped.
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D114751
For a 1x1 weight and stride of 1, the input/weight can be reshaped and passed into a fully connected op then reshaped back
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D114757
Add the decompose patterns that lower higher dimensional convolutions to lower dimensional ones to CodegenStrategy and use CodegenStrategy to test the decompose patterns. Additionally, remove the assertion that checks the anchor op name is set in the CodegenStrategyTest pass. Removing the assertion allows us to simplify the pipelines used in the interchange and decompose tests.
Depends On D114797
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114798
The revision updates the convolution decomposition patterns to take a linalg transformation filter. The transformation filter allows a later revision to use the patterns from CodegenStrategy.
Depends On D114690
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114797
Add support for an empty anchor op string in vectorization. An empty anchor op string is useful after fusion when there are multiple different operations to vectorize.
Depends On D114689
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114690
Pad the operation using a top down traversal. The top down traversal unlocks folding opportunities and dim op canonicalizations due to the introduced extract slice operation after the padded operation.
Depends On D114585
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114689
Iterating over backwardSlice and removing elements at the same time can fail on Windows for specific build configurations (the code was introduced in https://reviews.llvm.org/D114420). This revision introduces a second vector to collect all operations and removes them after finishing the reverse iteration.
Reviewed By: hpmorgan
Differential Revision: https://reviews.llvm.org/D114775
Add CSE after every transformation. Transformations such as tiling introduce redundant computation, for example, one AffineMinOp for every operand dimension pair. Follow up transformations such as Padding and Hoisting benefit from CSE since comparing slice sizes simplifies to comparing SSA values instead of analyzing affine expressions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114585