This patch extends affine data copy optimization utility with an
optional memref filter argument. When the memref filter is used, data
copy optimization will only generate copies for such a memref.
Note: this patch is just porting the memref filter feature from Uday's
'hop' branch: https://github.com/bondhugula/llvm-project/tree/hop.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D74342
Summary:
Adds affine loop fusion transformation function to LoopFusionUtils.
Updates TestLoopFusion utility to run loop fusion transformation until a fixed point is reached.
Adds unit tests to test the transformation.
Includes ASAN bug fix for D73190.
Reviewers: bondhugula, dcaballe
Reviewed By: bondhugula, dcaballe
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74330
Summary:
This revision adds a utility to generate debug locations from the IR during compilation, by snapshotting to a output stream and using the locations that operations were dumped in that stream. The new locations may either;
* Replace the original location of the operation.
old:
loc("original_source.cpp":1:1)
new:
loc("snapshot_source.mlir":10:10)
* Fuse with the original locations as NamedLocs with a specific tag.
old:
loc("original_source.cpp":1:1)
new:
loc(fused["original_source.cpp":1:1, "snapshot"("snapshot_source.mlir":10:10)])
This feature may be used by a debugger to display the code at various different levels of the IR. It would also be able to show the different levels of IR attached to a specific source line in the original source file.
This feature may also be used to generate locations for operations generated during compilation, that don't necessarily have a user source location to attach to.
This requires changes in the printer to track the locations of operations emitted in the stream. Moving forward we need to properly(and efficiently) track the number of newlines emitted to the stream during printing.
Differential Revision: https://reviews.llvm.org/D74019
This reverts commit 64871f778d.
ASAN indicates a use-after-free in in mlir::canFuseLoops(mlir::AffineForOp, mlir::AffineForOp, unsigned int, mlir::ComputationSliceState*) lib/Transforms/Utils/LoopFusionUtils.cpp:202:41
Summary:
Adds affine loop fusion transformation function to LoopFusionUtils.
Updates TestLoopFusion utility to run loop fusion transformation until a fixed point is reached.
Adds unit tests to test the transformation.
Reviewers: bondhugula, dcaballe, nicolasvasilache
Reviewed By: bondhugula, dcaballe
Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73190
Summary: This allows for providing a default "catchall" legality check that is not dependent on specific operations or dialects. For example, this can be useful to check legality based on the specific types of operation operands or results.
Differential Revision: https://reviews.llvm.org/D73379
Summary:
Affine minimum computation will be used in tiling transformation. The
implementation is mostly boilerplate as we already lower the minimum in the
upper bound of an affine loop.
Differential Revision: https://reviews.llvm.org/D73488
Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflict with function types, making it currently impossible to round-trip function types(and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D72429
Rename the 'shlis' operation in the standard dialect to 'shift_left'. Add tests
for this operation (these have been missing so far) and add a lowering to the
'shl' operation in the LLVM dialect.
Add also 'shift_right_signed' (lowered to LLVM's 'ashr') and 'shift_right_unsigned'
(lowered to 'lshr').
The original plan was to name these operations 'shift.left', 'shift.right.signed'
and 'shift.right.unsigned'. This works if the operations are prefixed with 'std.'
in MLIR assembly. Unfortunately during import the short form is ambigous with
operations from a hypothetical 'shift' dialect. The best solution seems to omit
dots in standard operations for now.
Closestensorflow/mlir#226
PiperOrigin-RevId: 286803388
- a block argument associated with an arbitrary op can't be a valid
dimensional identifier; it has to be the block argument of either
a function op or an affine.for.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#331
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/331 from bondhugula:valid_dim 3273b4fcbaa31fb7b6671d93c9e42a6b2a6a4e4c
PiperOrigin-RevId: 286593693
Introduce affine.prefetch: op to prefetch using a multi-dimensional
subscript on a memref; similar to affine.load but has no effect on
semantics, but only on performance.
Provide lowering through std.prefetch, llvm.prefetch and map to llvm's
prefetch instrinsic. All attributes reflected through the lowering -
locality hint, rw, and instr/data cache.
affine.prefetch %0[%i, %j + 5], false, 3, true : memref<400x400xi32>
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#225
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/225 from bondhugula:prefetch 4c3b4e93bc64d9a5719504e6d6e1657818a2ead0
PiperOrigin-RevId: 286212997
This change allows for DialectConversion to attempt folding as a mechanism to legalize illegal operations. This also expands folding support in OpBuilder::createOrFold to generate new constants when folding, and also enables it to work in the context of a PatternRewriter.
PiperOrigin-RevId: 285448440
Add one more simplification for floordiv and mod affine expressions.
Examples:
(2*d0 + 1) floordiv 2 is simplified to d0
(8*d0 + 4*d1 + d2) floordiv 4 simplified to 4*d0 + d1 + d2 floordiv 4.
etc.
Similarly, (4*d1 + 1) mod 2 is simplified to 1,
(2*d0 + 8*d1) mod 8 simplified to 2*d0 mod 8.
Change getLargestKnownDivisor to return int64_t to be consistent and
to avoid casting at call sites (since the return value is used in expressions
of int64_t/index type).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#202
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/202 from bondhugula:affine b13fcb2f1c00a39ca5434613a02408e085a80e77
PiperOrigin-RevId: 284866710
Now that we have unrolling as a declarative pattern, we can drop a full pass that has gone stale. In the future we may want to add specific unrolling patterns for VectorTransferReadOp.
PiperOrigin-RevId: 283806880
In the replaceAllUsesExcept utility function called from loop coalescing the
iteration over the use-chain is incorrect. The use list nodes (IROperands) have
next/prev links, and bluntly resetting the use would make the loop to continue
on uses of the value that was replaced instead of the original one. As a
result, it could miss the existing uses and update the wrong ones. Make sure we
increment the iterator before updating the use in the loop body.
Reported-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#291.
PiperOrigin-RevId: 283754195
tensorflow/mlir#162 introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.
Closestensorflow/mlir#259
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/259 from dcaballe:dcaballe/fix_multi_out_edge_producer_fusion 578d5661705fd5c56c555832d5e0528df88c5282
PiperOrigin-RevId: 283531105
Depending on which of the offsets, sizes, or strides are constant, the
subview op can be canonicalized in different ways. Add such
canonicalizations, which generalize the existing approach of
canonicalizing subview op only if all of offsets, sizes and shapes are
constants.
PiperOrigin-RevId: 282010703
This moves the different canonicalizations of regions into one place and invokes them in the fixed-point iteration of the canonicalizer.
PiperOrigin-RevId: 281617072
This is a simple multi-level DCE pass that operates pretty generically on
the IR. Its key feature compared to the existing peephole dead op folding
that happens during canonicalization is being able to delete recursively
dead cycles of the use-def graph, including block arguments.
PiperOrigin-RevId: 281568202
This method is needed for N->1 conversion patterns to retrieve remapped
Values used in the original N operations.
Closestensorflow/mlir#237
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/237 from dcaballe:dcaballe/getRemappedValue 1f64fadcf2b203f7b336ff0c5838b116ae3625db
PiperOrigin-RevId: 281321881
This CL utilizies the more robust fusion feasibility analysis being built out in LoopFusionUtils, which will eventually be used to replace the current affine loop fusion pass.
PiperOrigin-RevId: 281112340
In essence, std.subview is just an abstract indexing transformation (somewhat
akin to a gep in llvm) and by itself has no effect. From a practical perspective
this helps, as it allows to remove dead subview operations.
PiperOrigin-RevId: 280630046
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
This refactors the implementation of block signature(type) conversion to not insert fake cast operations to perform the type conversion, but to instead create a new block containing the proper signature. This has the benefit of enabling the use of pre-computed analyses that rely on mapping values. It also leads to a much cleaner implementation overall. The major user facing change is that applySignatureConversion will now replace the entry block of the region, meaning that blocks generally shouldn't be cached over calls to applySignatureConversion.
PiperOrigin-RevId: 280226936
A pattern rewriter hook, mergeBlock, is added that allows for merging the operations of one block into the end of another. This is used to support a canonicalization pattern for branch operations that folds the branch when the successor has a single predecessor(the branch block).
Example:
^bb0:
%c0_i32 = constant 0 : i32
br ^bb1(%c0_i32 : i32)
^bb1(%x : i32):
return %x : i32
becomes:
^bb0:
%c0_i32 = constant 0 : i32
return %c0_i32 : i32
PiperOrigin-RevId: 278677825
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.
PiperOrigin-RevId: 277958868
Rewrite patterns may make modifications to the CFG, including dropping edges between blocks. This change adds a simple unreachable block elimination run at the end of each iteration to ensure that the CFG remains valid.
PiperOrigin-RevId: 277545805