This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
ModulePass doesn't provide any special utilities and thus doesn't give enough benefit to warrant a special pass class. This revision replaces all usages with the more general OperationPass.
Differential Revision: https://reviews.llvm.org/D77339
Fix point-wise copy generation to work with bounds that have max/min.
Change structure of copy loop nest to use absolute loop indices and
subtracting base from the indexes of the fast buffers. Update supporting
utilities: Fix FlatAffineConstraints::getLowerAndUpperBound to look at
equalities as well and for a missing division. Update unionBoundingBox
to not discard common constraints (leads to a tighter system). Update
MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension
constraints. Run removeTrivialRedundancy at the end of
MemRefRegion::compute. Run single iteration loop promotion and
load/store canonicalization after affine data copy (in its test pass as
well).
Differential Revision: https://reviews.llvm.org/D77320
Add a pattern rewriter utility to erase blocks (while notifying the
pattern rewriting driver of the erased ops). Use this to remove trivial
else blocks in affine.if ops.
Differential Revision: https://reviews.llvm.org/D77083
Removing dead ops should make the outer loop of the pattern rewriting
driver run again. Although its operands are added to the worklist, if no
changes happenned to them or remaining ops in the worklist, the driver
wouldn't run once again - but it should be.
Differential Revision: https://reviews.llvm.org/D77483
PatternRewriter and derived classes provide a set of virtual methods to
manipulate blocks, which ConversionPatternRewriter overrides to keep track of
the manipulations and undo them in case the conversion fails. However, one can
currently create a block only by splitting another block into two. This not
only makes the API inconsistent (`splitBlock` is allowed in conversion
patterns, but `createBlock` is not), but it also make it impossible for one to
create blocks with argument lists different from those of already existing
blocks since in-place block updates are not supported either. Such
functionality precludes dialect conversion infrastructure from being used more
extensively on region-containing ops, for example, for value-returning "if"
operations. At the same time, ConversionPatternRewriter already allows one to
undo block creation as block creation is one of the primitive operations in
already supported region inlining.
Support block creation in conversion patterns by hooking `createBlock` on the
block action undo mechanism. This requires to make `Builder::createBlock`
virtual, similarly to Op insertion. This is a minimal change to the Builder
infrastructure that will later help support additional use cases such as block
signature changes. `createBlock` now additionally takes the types of the block
arguments that are added immediately so as to avoid in-place argument list
manipulation that would be illegal in conversion patterns.
Add a method that given an affine map returns another with just its unique
results. Use this to drop redundant bounds in max/min for affine.for. Update
affine.for's canonicalization pattern and createCanonicalizedForOp to use
this.
Differential Revision: https://reviews.llvm.org/D77237
Modernize/cleanup code in loop transforms utils - a lot of this code was
written prior to the currently available IR support / code style. This
patch also does some variable renames including inst -> op, comment
updates, turns getCleanupLoopLowerBound into a local function.
Differential Revision: https://reviews.llvm.org/D77175
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).
Differential Revision: https://reviews.llvm.org/D76659
This will greatly simplify a number of things related to passes:
* Enables generation of pass registration
* Enables generation of boiler plate pass utilities
* Enables generation of pass documentation
This revision focuses on adding the basic structure and adds support for generating the registration for passes in the Transforms/ directory. Future revisions will add more support and move more passes over.
Differential Revision: https://reviews.llvm.org/D76656
Rewrite mlir::permuteLoops (affine loop permutation utility) to fix
incorrect approach. Avoiding using sinkLoops entirely - use single move
approach. Add test pass.
This fixes https://bugs.llvm.org/show_bug.cgi?id=45328
Depends on D77003.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77004
Add missing assert checks for input to mlir::interchangeLoops utility.
Rename interchangeLoops -> permuteLoops; update doc comments to clarify
inputs / return val. Other than the assert checks, this is NFC.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77003
This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests where trip counts are unknown or
where tile sizes don't divide trip counts. A conditional guard is
generated to separate out the full tile (with constant trip count loops)
into the then block of an 'affine.if' and the partial tile to the else
block. The separation allows the 'then' block (which has constant trip
count loops) to be optimized better subsequently: for eg. for
unroll-and-jam, register tiling, vectorization without leading to
cleanup code, or to offload to accelerators. Among techniques from the
literature, the if/else based separation leads to the most compact
cleanup code for multi-dimensional cases (because a single version is
used to model all partial tiles).
INPUT
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
"foo"() : () -> ()
}
}
OUTPUT AFTER TILING W/O SEPARATION
map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
affine.for %arg2 = 0 to %M step 32 {
affine.for %arg3 = 0 to %N step 32 {
affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
"foo"() : () -> ()
}
}
}
}
OUTPUT AFTER TILING WITH SEPARATION
map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0) -> (d0 + 32)>
map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
#set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>
affine.for %arg2 = 0 to %M step 32 {
affine.for %arg3 = 0 to %N step 32 {
affine.if #set0(%arg2, %arg3)[%M, %N] {
// Full tile.
affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
"foo"() : () -> ()
}
}
} else {
// Partial tile.
affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
"foo"() : () -> ()
}
}
}
}
}
The separation is tested via a cmd line flag on the loop tiling pass.
The utility itself allows one to pass in any band of contiguously nested
loops, and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76700
This allows conversion of a ParallelLoop from N induction variables to
some nuber of induction variables less than N.
The first intended use of this is for the GPUDialect to convert
ParallelLoops to iterate over 3 dimensions so they can be launched as
GPU Kernels.
To implement this:
- Normalize each iteration space of the ParallelLoop
- Use the same induction variable in a new ParallelLoop for multiple
original iterations.
- Split the new induction variable back into the original set of values
inside the body of the ParallelLoop.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76363
The declarations for these were already part of transforms utils, but
the definitions were left in affine transforms. Move definitions to loop
transforms utils.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76633
When trying to fold an operation during operation creation check that
the operation folding succeeds before inserting the op.
Differential Revision: https://reviews.llvm.org/D76415
Move some of the affine transforms and their test cases to their
respective dialect directory. This patch does not complete the move, but
takes care of a good part.
Renames: prefix 'affine' to affine loop tiling cl options,
vectorize -> super-vectorize
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76565
Summary:
Change AffineOps Dialect structure to better group both IR and Tranforms. This included extracting transforms directly related to AffineOps. Also move AffineOps to Affine.
Differential Revision: https://reviews.llvm.org/D76161
OperationFolder::tryToFold was running the pre-replacement
action even when there was no constant folding, i.e., when the operation
was just being updated in place but was not going to be replaced. This
led to nested ops being unnecessarily removed from the worklist and only
being processed in the next outer iteration of the greedy pattern
rewriter, which is also why this didn't affect the final output IR but
only the convergence rate. It also led to an op's results' users to be
unnecessarily added to the worklist.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76268
Summary: This is somewhat complex(annoying) as it involves directly tracking the uses within each of the callgraph nodes, and updating them as needed during inlining. The benefit of this is that we can have a more exact cost model, enable inlining some otherwise non-inlinable cases, and also ensure that newly dead callables are properly disposed of.
Differential Revision: https://reviews.llvm.org/D75476
Summary:
This revision adds a new hook, `notifyMatchFailure`, that allows for notifying the rewriter that a match failure is coming with the provided reason. This hook takes as a parameter a callback that fills a `Diagnostic` instance with the reason why the match failed. This allows for the rewriter to decide how this information can be displayed to the end-user, and may completely ignore it if desired(opt mode). For now, DialectConversion is updated to include this information in the debug output.
Differential Revision: https://reviews.llvm.org/D76203
Summary: PatternState was a mechanism to pass state between the match and rewrite calls of a RewritePattern. With the rise of matchAndRewrite, this class is unused and unnecessary. This revision removes PatternState and simplifies PatternMatchResult to just be a LogicalResult. A future revision will replace all usages of PatternMatchResult/matchSuccess/matchFailure with LogicalResult equivalents.
Differential Revision: https://reviews.llvm.org/D76202
- rename vars that had inst suffixes (due to ops earlier being
known as insts); other renames for better readability
- drop unnecessary matches in test cases
- iterate without block terminator
- comment/doc updates
- instBodySkew -> affineForOpBodySkew
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76214
Summary:
- remove stale declarations on flat affine constraints
- avoid allocating small vectors where possible
- clean up code comments, rename some variables
Differential Revision: https://reviews.llvm.org/D76117
Summary: A number of transform import StandardOps despite not being dependent on it. Cleaned it up to better understand what dialects each of these transforms depend on.
Differential Revision: https://reviews.llvm.org/D76112
HasNoSideEffect can now be implemented using the MemoryEffectInterface, removing the need to check multiple things for the same information. This also removes an easy foot-gun for users as 'Operation::hasNoSideEffect' would ignore operations that dynamically, or recursively, have no side effects. This also leads to an immediate improvement in some of the existing users, such as DCE, now that they have access to more information.
Differential Revision: https://reviews.llvm.org/D76036
The current mechanism for identifying is a bit hacky and extremely adhoc, i.e. we explicit check 1-result, 0-operand, no side-effect, and always foldable and then assume that this is a constant. Adding a trait adds structure to this, and makes checking for a constant much more efficient as we can guarantee that all of these things have already been verified.
Differential Revision: https://reviews.llvm.org/D76020
These terminator operations don't really have any side effects, and this allows for more accurate side-effect analysis for region operations. For example, currently we can't detect like a loop.for or affine.for are dead because the affine.terminator is "side effecting".
Note: Marking as NoSideEffect doesn't mean that these operations can be opaquely erased.
Differential Revision: https://reviews.llvm.org/D75888
Summary:
affineDataCopyGenerate is a monolithinc function that
combines several steps for good reasons, but it makes customizing
the behaivor even harder. The major two steps by affineDataCopyGenerate are:
a) Identify interesting memrefs and collect their uses.
b) Create new buffers to forward these uses.
Step (a) actually has requires tremendous customization options. One could see
that from the recently added filterMemRef parameter.
This patch adds a function that only does (b), in the hope that (a)
can be directly implemented by the callers. In fact, (a) is quite
simple if the caller has only one buffer to consider, or even one use.
Differential Revision: https://reviews.llvm.org/D75965
Summary:
Interfaces/ is the designated directory for these types of interfaces, and also removes the need for including them directly in IR/.
Differential Revision: https://reviews.llvm.org/D75886
The interfaces themselves aren't really analyses, they may be used by analyses though. Having them in Analysis can also create cyclic dependencies if an analysis depends on a specific dialect, that also provides one of the interfaces.
Differential Revision: https://reviews.llvm.org/D75867
Summary:
The old interface was a temporary stopgap to allow for implementing simple LICM that took side effects of region operations into account. Now that MLIR has proper support for specifying memory effects, this interface can be deleted.
Differential Revision: https://reviews.llvm.org/D74441
add convenience method for affine data copy generation for a loop body
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D75822
Putting this up mainly for discussion on
how this should be done. I am interested in MLIR from
the Julia side and we currently have a strong preference
to dynamically linking against the LLVM shared library,
and would like to have a MLIR shared library.
This patch adds a new cmake function add_mlir_library()
which accumulates a list of targets to be compiled into
libMLIR.so. Note that not all libraries make sense to
be compiled into libMLIR.so. In particular, we want
to avoid libraries which primarily exist to support
certain tools (such as mlir-opt and mlir-cpu-runner).
Note that the resulting libMLIR.so depends on LLVM, but
does not contain any LLVM components. As a result, it
is necessary to link with libLLVM.so to avoid linkage
errors. So, libMLIR.so requires LLVM_BUILD_LLVM_DYLIB=on
FYI, Currently it appears that LLVM_LINK_LLVM_DYLIB is broken
because mlir-tblgen is linked against libLLVM.so and
and independent LLVM components.
Previous version of this patch broke depencies on TableGen
targets. This appears to be because it compiled all
libraries to OBJECT libraries (probably because cmake
is generating different target names). Avoiding object
libraries results in correct dependencies.
(updated by Stephen Neuendorffer)
Differential Revision: https://reviews.llvm.org/D73130
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
CMake allows calling target_link_libraries() without a keyword,
but this usage is not preferred when also called with a keyword,
and has surprising behavior. This patch explicitly specifies a
keyword when using target_link_libraries().
Differential Revision: https://reviews.llvm.org/D75725
Summary:
This revision removes all of the functionality related to successor operands on the core Operation class. This greatly simplifies a lot of handling of operands, as well as successors. For example, DialectConversion no longer needs a special "matchAndRewrite" for branching terminator operations.(Note, the existing method was also broken for operations with variadic successors!!)
This also enables terminator operations to define their own relationships with successor arguments, instead of the hardcoded "pass-through" behavior that exists today.
Differential Revision: https://reviews.llvm.org/D75318
The existing API for successor operands on operations is in the process of being removed. This revision simplifies a later one that completely removes the existing API.
Differential Revision: https://reviews.llvm.org/D75316
Summary:
Make computeConversionSet bubble up errors from nested regions. Note
that this doesn't change top-level behavior - since the nested region
calls emitError, the error was visible before, just not surfaced as
quickly.
Differential Revision: https://reviews.llvm.org/D75369
Summary: For example, DenseElementsAttr currently does not properly round-trip unsigned integer values.
Differential Revision: https://reviews.llvm.org/D75374
Putting this up mainly for discussion on
how this should be done. I am interested in MLIR from
the Julia side and we currently have a strong preference
to dynamically linking against the LLVM shared library,
and would like to have a MLIR shared library.
This patch adds a new cmake function add_mlir_library()
which accumulates a list of targets to be compiled into
libMLIR.so. Note that not all libraries make sense to
be compiled into libMLIR.so. In particular, we want
to avoid libraries which primarily exist to support
certain tools (such as mlir-opt and mlir-cpu-runner).
Note that the resulting libMLIR.so depends on LLVM, but
does not contain any LLVM components. As a result, it
is necessary to link with libLLVM.so to avoid linkage
errors. So, libMLIR.so requires LLVM_BUILD_LLVM_DYLIB=on
FYI, Currently it appears that LLVM_LINK_LLVM_DYLIB is broken
because mlir-tblgen is linked against libLLVM.so and
and independent LLVM components.
Previous version of this patch broke depencies on TableGen
targets. This appears to be because it compiled all
libraries to OBJECT libraries (probably because cmake
is generating different target names). Avoiding object
libraries results in correct dependencies.
(updated by Stephen Neuendorffer)
Differential Revision: https://reviews.llvm.org/D73130
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
Putting this up mainly for discussion on
how this should be done. I am interested in MLIR from
the Julia side and we currently have a strong preference
to dynamically linking against the LLVM shared library,
and would like to have a MLIR shared library.
This patch adds a new cmake function add_mlir_library()
which accumulates a list of targets to be compiled into
libMLIR.so. Note that not all libraries make sense to
be compiled into libMLIR.so. In particular, we want
to avoid libraries which primarily exist to support
certain tools (such as mlir-opt and mlir-cpu-runner).
Note that the resulting libMLIR.so depends on LLVM, but
does not contain any LLVM components. As a result, it
is necessary to link with libLLVM.so to avoid linkage
errors. So, libMLIR.so requires LLVM_BUILD_LLVM_DYLIB=on
FYI, Currently it appears that LLVM_LINK_LLVM_DYLIB is broken
because mlir-tblgen is linked against libLLVM.so and
and independent LLVM components
(updated by Stephen Neuendorffer)
Differential Revision: https://reviews.llvm.org/D73130
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
Summary:
NFC - Moved StandardOps/Ops.h to a StandardOps/IR dir to better match surrounding
directories. This is to match other dialects, and prepare for moving StandardOps
related transforms in out for Transforms and into StandardOps/Transforms.
Differential Revision: https://reviews.llvm.org/D74940
Thus far IntegerType has been signless: a value of IntegerType does
not have a sign intrinsically and it's up to the specific operation
to decide how to interpret those bits. For example, std.addi does
two's complement arithmetic, and std.divis/std.diviu treats the first
bit as a sign.
This design choice was made some time ago when we did't have lots
of dialects and dialects were more rigid. Today we have much more
extensible infrastructure and different dialect may want different
modelling over integer signedness. So while we can say we want
signless integers in the standard dialect, we cannot dictate for
others. Requiring each dialect to model the signedness semantics
with another set of custom types is duplicating the functionality
everywhere, considering the fundamental role integer types play.
This CL extends the IntegerType with a signedness semantics bit.
This gives each dialect an option to opt in signedness semantics
if that's what they want and helps code sharing. The parser is
modified to recognize `si[1-9][0-9]*` and `ui[1-9][0-9]*` as
signed and unsigned integer types, respectively, leaving the
original `i[1-9][0-9]*` to continue to mean no indication over
signedness semantics. All existing dialects are not affected (yet)
as this is a feature to opt in.
More discussions can be found at:
https://groups.google.com/a/tensorflow.org/d/msg/mlir/XmkV8HOPWpo/7O4X0Nb_AQAJ
Differential Revision: https://reviews.llvm.org/D72533
It replaces DenseMap output with a SmallVector and it
removes empty loop levels from the output.
Reviewed By: andydavis1, mehdi_amini
Differential Revision: https://reviews.llvm.org/D74658
Summary:
This revision refactors the TypeConverter class to not use inheritance to add type conversions. It instead moves to a registration based system, where conversion callbacks are added to the converter with `addConversion`. This method takes a conversion callback, which must be convertible to any of the following forms(where `T` is a class derived from `Type`:
* Optional<Type> (T type)
- This form represents a 1-1 type conversion. It should return nullptr
or `llvm::None` to signify failure. If `llvm::None` is returned, the
converter is allowed to try another conversion function to perform
the conversion.
* Optional<LogicalResult>(T type, SmallVectorImpl<Type> &results)
- This form represents a 1-N type conversion. It should return
`failure` or `llvm::None` to signify a failed conversion. If the new
set of types is empty, the type is removed and any usages of the
existing value are expected to be removed during conversion. If
`llvm::None` is returned, the converter is allowed to try another
conversion function to perform the conversion.
When attempting to convert a type, the TypeConverter walks each of the registered converters starting with the one registered most recently.
Differential Revision: https://reviews.llvm.org/D74584
This patch extends affine data copy optimization utility with an
optional memref filter argument. When the memref filter is used, data
copy optimization will only generate copies for such a memref.
Note: this patch is just porting the memref filter feature from Uday's
'hop' branch: https://github.com/bondhugula/llvm-project/tree/hop.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D74342
In the previous state, we were relying on forcing the linker to include
all libraries in the final binary and the global initializer to self-register
every piece of the system. This change help moving away from this model, and
allow users to compose pieces more freely. The current change is only "fixing"
the dialect registration and avoiding relying on "whole link" for the passes.
The translation is still relying on the global registry, and some refactoring
is needed to make this all more convenient.
Differential Revision: https://reviews.llvm.org/D74461
Summary:
Adds affine loop fusion transformation function to LoopFusionUtils.
Updates TestLoopFusion utility to run loop fusion transformation until a fixed point is reached.
Adds unit tests to test the transformation.
Includes ASAN bug fix for D73190.
Reviewers: bondhugula, dcaballe
Reviewed By: bondhugula, dcaballe
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74330
The existing (default) calling convention for memrefs in standard-to-LLVM
conversion was motivated by interfacing with LLVM IR produced from C sources.
In particular, it passes a pointer to the memref descriptor structure when
calling the function. Therefore, the descriptor is allocated on stack before
the call. This convention leads to several problems. PR44644 indicates a
problem with stack exhaustion when calling functions with memref-typed
arguments in a loop. Allocating outside of the loop may lead to concurrent
access problems in case the loop is parallel. When targeting GPUs, the contents
of the stack-allocated memory for the descriptor (passed by pointer) needs to
be explicitly copied to the device. Using an aggregate type makes it impossible
to attach pointer-specific argument attributes pertaining to alignment and
aliasing in the LLVM dialect.
Change the default calling convention for memrefs in standard-to-LLVM
conversion to transform a memref into a list of arguments, each of primitive
type, that are comprised in the memref descriptor. This avoids stack allocation
for ranked memrefs (and thus stack exhaustion and potential concurrent access
problems) and simplifies the device function invocation on GPUs.
Provide an option in the standard-to-LLVM conversion to generate auxiliary
wrapper function with the same interface as the previous calling convention,
compatible with LLVM IR porduced from C sources. These auxiliary functions
pack the individual values into a descriptor structure or unpack it. They also
handle descriptor stack allocation if necessary, serving as an allocation
scope: the memory reserved by `alloca` will be freed on exiting the auxiliary
function.
The effect of this change on MLIR-generated only LLVM IR is minimal. When
interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration
only needs to require auxiliary functions and change the function name to call
the wrapper function instead of the original function.
This also opens the door to forwarding aliasing and alignment information from
memrefs to LLVM IR pointers in the standrd-to-LLVM conversion.
Summary:
This revision adds a utility to generate debug locations from the IR during compilation, by snapshotting to a output stream and using the locations that operations were dumped in that stream. The new locations may either;
* Replace the original location of the operation.
old:
loc("original_source.cpp":1:1)
new:
loc("snapshot_source.mlir":10:10)
* Fuse with the original locations as NamedLocs with a specific tag.
old:
loc("original_source.cpp":1:1)
new:
loc(fused["original_source.cpp":1:1, "snapshot"("snapshot_source.mlir":10:10)])
This feature may be used by a debugger to display the code at various different levels of the IR. It would also be able to show the different levels of IR attached to a specific source line in the original source file.
This feature may also be used to generate locations for operations generated during compilation, that don't necessarily have a user source location to attach to.
This requires changes in the printer to track the locations of operations emitted in the stream. Moving forward we need to properly(and efficiently) track the number of newlines emitted to the stream during printing.
Differential Revision: https://reviews.llvm.org/D74019
Summary: This is the most common operation performed on a CallOpInterface. This just moves the existing functionality from the CallGraph so that other users can access it.
Differential Revision: https://reviews.llvm.org/D74250
This reverts commit 64871f778d.
ASAN indicates a use-after-free in in mlir::canFuseLoops(mlir::AffineForOp, mlir::AffineForOp, unsigned int, mlir::ComputationSliceState*) lib/Transforms/Utils/LoopFusionUtils.cpp:202:41
Summary:
This revision adds basic support for emitting line table information when exporting to LLVMIR. We don't yet have a story for supporting all of the LLVM debug metadata, so this revision stubs some features(like subprograms) to enable emitting line tables.
Differential Revision: https://reviews.llvm.org/D73934
Summary:
Adds affine loop fusion transformation function to LoopFusionUtils.
Updates TestLoopFusion utility to run loop fusion transformation until a fixed point is reached.
Adds unit tests to test the transformation.
Reviewers: bondhugula, dcaballe, nicolasvasilache
Reviewed By: bondhugula, dcaballe
Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73190
Summary:
MLIRAnalysis depended on MLIRVectorOps
MLIRVectorOps depended on MLIRAnalysis for Loop information.
Both of these can be solved by factoring out libraries related to loop
analysis into their own library. The new MLIRLoopAnalysis might be
better off with the Loop Dialect in the future.
Reviewers: nicolasvasilache, rriddle!, mehdi_amini
Reviewed By: mehdi_amini
Subscribers: Joonsoo, vchuravy, merge_guards_bot, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73655
Summary:
This breaks a cyclic library dependency where MLIRPass used the verifier
in MLIRAnalysis, but MLIRAnalysis also contained passes used for testing.
The presence of the test passes here is archaeology, predating
test/lib/Transform.
Reviewers: rriddle
Reviewed By: rriddle
Subscribers: merge_guards_bot, mgorny, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74067
This is fixing a build error:
error: non-constant-expression cannot be narrowed from type 'unsigned int' to 'Region::iterator::difference_type' (aka 'int') in initializer list
Fix pr44767
Seen on gcc 8, in release mode & assertions off warnings about logger,
made all statements referencing logger inside LLVM_DEBUG blocks and
ifdef a few variables only used in debug.
This is mechanical fix to get CI green.
Summary:
This revision beefs up the debug logging within dialect conversion. Given the nature of multi-level legalization, and legalization in general, it is one of the harder pieces of infrastructure to debug. This revision adds nice formatting to make the output log easier to parse:
```
Legalizing operation : 'std.constant'(0x608000002420) {
* Fold {
} -> FAILURE : unable to fold
* Pattern : 'std.constant -> ()' {
} -> FAILURE : pattern failed to match
* Pattern : 'std.constant -> ()' {
} -> FAILURE : pattern failed to match
* Pattern : 'std.constant -> (spv.constant)' {
** Insert : 'spv.constant'(0x608000002c20)
** Replace : 'std.constant'(0x608000002420)
//===-------------------------------------------===//
Legalizing operation : 'spv.constant'(0x608000002c20) {
} -> SUCCESS : operation marked legal by the target
//===-------------------------------------------===//
} -> SUCCESS : pattern applied successfully
} -> SUCCESS
```
Differential Revision: https://reviews.llvm.org/D73747
The refactored MemRefType::get() calls all intend to clone from another
memref type, with some modifications. In fact, some calls dropped memory space
during the cloning. Migrate them to the cloning API so that nothing gets
dropped if they are not explicitly listed.
It's close to NFC but not quite, as it helps with propagating memory spaces in
some places.
Differential Revision: https://reviews.llvm.org/D73296
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Summary: This pass deletes all symbols that are found to be unreachable. This is done by computing the set of operations that are known to be live, propagating that liveness to other symbols, and then deleting all symbols that are not within this live set.
Differential Revision: https://reviews.llvm.org/D72482
Summary: The new internal representation of operation results now allows for accessing the result types to be more efficient. Changing the API to ArrayRef is more efficient and removes the need to explicitly materialize vectors in several places.
Differential Revision: https://reviews.llvm.org/D73429
Summary: This allows for providing a default "catchall" legality check that is not dependent on specific operations or dialects. For example, this can be useful to check legality based on the specific types of operation operands or results.
Differential Revision: https://reviews.llvm.org/D73379
Summary:
Remove 'valuesToRemoveIfDead' from PatternRewriter API. The removal
functionality wasn't implemented and we decided [1] not to implement it in
favor of having more powerful DCE approaches.
[1] https://github.com/tensorflow/mlir/pull/212
Reviewers: rriddle, bondhugula
Reviewed By: rriddle
Subscribers: liufengdb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72545
Summary:
First step towards the consolidation
of a lot of vector related utilities
that are now all over the place
(or even duplicated).
Reviewers: nicolasvasilache, andydavis1
Reviewed By: nicolasvasilache, andydavis1
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72955
Summary:
This enables tracking calls that cross symbol table boundaries. It also simplifies some of the implementation details of CallableOpInterface, i.e. there can only be one region within the callable operation.
Depends On D72042
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D72043
Summary: A new class is added, IRMultiObjectWithUseList, that allows for representing an IR use list that holds multiple sub values(used in this case for OpResults). This class provides all of the same functionality as the base IRObjectWithUseList, but for specific sub-values. This saves a word per operation result and is a necessary step in optimizing the layout of operation results. For now the use list is placed on the operation itself, so zero-result operations grow by a word. When the work for optimizing layout is finished, this can be moved back to being a trailing object based on memory/runtime benchmarking.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D71955
This means that in-place, or root, updates need to use explicit calls to `startRootUpdate`, `finalizeRootUpdate`, and `cancelRootUpdate`. The major benefit of this change is that it enables in-place updates in DialectConversion, which simplifies the FuncOp pattern for example. The major downside to this is that the cases that *may* modify an operation in-place will need an explicit cancel on the failure branches(assuming that they started an update before attempting the transformation).
PiperOrigin-RevId: 286933674
This is an initial step to refactoring the representation of OpResult as proposed in: https://groups.google.com/a/tensorflow.org/g/mlir/c/XXzzKhqqF_0/m/v6bKb08WCgAJ
This change will make it much simpler to incrementally transition all of the existing code to use value-typed semantics.
PiperOrigin-RevId: 286844725
Rename the 'shlis' operation in the standard dialect to 'shift_left'. Add tests
for this operation (these have been missing so far) and add a lowering to the
'shl' operation in the LLVM dialect.
Add also 'shift_right_signed' (lowered to LLVM's 'ashr') and 'shift_right_unsigned'
(lowered to 'lshr').
The original plan was to name these operations 'shift.left', 'shift.right.signed'
and 'shift.right.unsigned'. This works if the operations are prefixed with 'std.'
in MLIR assembly. Unfortunately during import the short form is ambigous with
operations from a hypothetical 'shift' dialect. The best solution seems to omit
dots in standard operations for now.
Closestensorflow/mlir#226
PiperOrigin-RevId: 286803388
* Fixes use of anonymous namespace for static methods.
* Uses explicit qualifiers(mlir::) instead of wrapping the definition with the namespace.
PiperOrigin-RevId: 286222654
Introduce affine.prefetch: op to prefetch using a multi-dimensional
subscript on a memref; similar to affine.load but has no effect on
semantics, but only on performance.
Provide lowering through std.prefetch, llvm.prefetch and map to llvm's
prefetch instrinsic. All attributes reflected through the lowering -
locality hint, rw, and instr/data cache.
affine.prefetch %0[%i, %j + 5], false, 3, true : memref<400x400xi32>
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#225
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/225 from bondhugula:prefetch 4c3b4e93bc64d9a5719504e6d6e1657818a2ead0
PiperOrigin-RevId: 286212997
This keeps the IR valid and consistent as it is expected that each block should have a valid parent region/operation. Previously, converted blocks were kept floating without a valid parent region.
PiperOrigin-RevId: 285821687
This change allows for DialectConversion to attempt folding as a mechanism to legalize illegal operations. This also expands folding support in OpBuilder::createOrFold to generate new constants when folding, and also enables it to work in the context of a PatternRewriter.
PiperOrigin-RevId: 285448440
It is sometimes useful to create operations separately from the builder before insertion as it may be easier to erase them in isolation if necessary. One example use case for this is folding, as we will only want to insert newly generated constant operations on success. This has the added benefit of fixing some silent PatternRewriter failures related to cloning, as the OpBuilder 'clone' methods don't call createOperation.
PiperOrigin-RevId: 285086242
This allows for users to provide operand_range and result_range in builder.create<> calls, instead of requiring an explicit copy into a separate data structure like SmallVector/std::vector.
PiperOrigin-RevId: 284360710
This class represents a generic abstraction over the different ways to represent a range of Values: ArrayRef<Value *>, operand_range, result_range. This class will allow for removing the many instances of explicit SmallVector<Value *, N> construction. It has the same memory cost as ArrayRef, and only suffers cost from indexing(if+elsing the different underlying representations).
This change only updates a few of the existing usages, with more to be changed in followups; e.g. 'build' API.
PiperOrigin-RevId: 284307996
Statistics are a way to keep track of what the compiler is doing and how effective various optimizations are. It is useful to see what optimizations are contributing to making a particular program run faster. Pass-instance specific statistics take this even further as you can see the effect of placing a particular pass at specific places within the pass pipeline, e.g. they could help answer questions like "what happens if I run CSE again here".
Statistics can be added to a pass by simply adding members of type 'Pass::Statistics'. This class takes as a constructor arguments: the parent pass pointer, a name, and a description. Statistics can be dumped by the pass manager in a similar manner to how pass timing information is dumped, i.e. via PassManager::enableStatistics programmatically; or -pass-statistics and -pass-statistics-display via the command line pass manager options.
Below is an example:
struct MyPass : public OperationPass<MyPass> {
Statistic testStat{this, "testStat", "A test statistic"};
void runOnOperation() {
...
++testStat;
...
}
};
$ mlir-opt -pass-pipeline='func(my-pass,my-pass)' foo.mlir -pass-statistics
Pipeline Display:
===-------------------------------------------------------------------------===
... Pass statistics report ...
===-------------------------------------------------------------------------===
'func' Pipeline
MyPass
(S) 15 testStat - A test statistic
MyPass
(S) 6 testStat - A test statistic
List Display:
===-------------------------------------------------------------------------===
... Pass statistics report ...
===-------------------------------------------------------------------------===
MyPass
(S) 21 testStat - A test statistic
PiperOrigin-RevId: 284022014
Now that we have unrolling as a declarative pattern, we can drop a full pass that has gone stale. In the future we may want to add specific unrolling patterns for VectorTransferReadOp.
PiperOrigin-RevId: 283806880
In the replaceAllUsesExcept utility function called from loop coalescing the
iteration over the use-chain is incorrect. The use list nodes (IROperands) have
next/prev links, and bluntly resetting the use would make the loop to continue
on uses of the value that was replaced instead of the original one. As a
result, it could miss the existing uses and update the wrong ones. Make sure we
increment the iterator before updating the use in the loop body.
Reported-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#291.
PiperOrigin-RevId: 283754195
This CL refactors some of the MLIR vector dependencies to allow decoupling VectorOps, vector analysis, vector transformations and vector conversions from each other.
This makes the system more modular and allows extracting VectorToVector into VectorTransforms that do not depend on vector conversions.
This refactoring exhibited a bunch of cyclic library dependencies that have been cleaned up.
PiperOrigin-RevId: 283660308
tensorflow/mlir#162 introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.
Closestensorflow/mlir#259
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/259 from dcaballe:dcaballe/fix_multi_out_edge_producer_fusion 578d5661705fd5c56c555832d5e0528df88c5282
PiperOrigin-RevId: 283531105
To simplify the lowering into SPIR-V, while still respecting the ABI
requirements of SPIR-V/Vulkan, split the process into two
1) While lowering a function to SPIR-V (when the function is an entry
point function), allow specifying attributes on arguments and
function itself that describe the ABI of the function.
2) Add a pass that materializes the ABI described in the function.
Two attributes are needed.
1) Attribute on arguments of the entry point function that describe
the descriptor_set, binding, storage class, etc, of the
spv.globalVariable this argument will be replaced by
2) Attribute on function that specifies workgroup size, etc. (for now
only workgroup size).
Add the pass -spirv-lower-abi-attrs to materialize the ABI described
by the attributes.
This change makes the SPIRVBasicTypeConverter class unnecessary and is
removed, further simplifying the SPIR-V lowering path.
PiperOrigin-RevId: 282387587
Change vector op names from VectorFooOp to Vector_FooOp and from
vector::VectorFooOp to vector::FooOp.
Closestensorflow/mlir#257
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/257 from Kayjukh:master dfc3a0e04114885aaec8740d5951d6984d6e1577
PiperOrigin-RevId: 281967461
This moves the different canonicalizations of regions into one place and invokes them in the fixed-point iteration of the canonicalizer.
PiperOrigin-RevId: 281617072
This is a simple multi-level DCE pass that operates pretty generically on
the IR. Its key feature compared to the existing peephole dead op folding
that happens during canonicalization is being able to delete recursively
dead cycles of the use-def graph, including block arguments.
PiperOrigin-RevId: 281568202
This CL uses the pattern rewrite infrastructure to implement a simple VectorOps -> VectorOps legalization strategy to unroll coarse-grained vector operations into finer grained ones.
The transformation is written using local pattern rewrites to allow composition with other rewrites. It proceeds by iteratively introducing fake cast ops and cleaning canonicalizing or lowering them away where appropriate.
This is an example of writing transformations as compositions of local pattern rewrites that should enable us to make them significantly more declarative.
PiperOrigin-RevId: 281555100
This method is needed for N->1 conversion patterns to retrieve remapped
Values used in the original N operations.
Closestensorflow/mlir#237
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/237 from dcaballe:dcaballe/getRemappedValue 1f64fadcf2b203f7b336ff0c5838b116ae3625db
PiperOrigin-RevId: 281321881
This CL utilizies the more robust fusion feasibility analysis being built out in LoopFusionUtils, which will eventually be used to replace the current affine loop fusion pass.
PiperOrigin-RevId: 281112340
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
This refactors the implementation of block signature(type) conversion to not insert fake cast operations to perform the type conversion, but to instead create a new block containing the proper signature. This has the benefit of enabling the use of pre-computed analyses that rely on mapping values. It also leads to a much cleaner implementation overall. The major user facing change is that applySignatureConversion will now replace the entry block of the region, meaning that blocks generally shouldn't be cached over calls to applySignatureConversion.
PiperOrigin-RevId: 280226936
This also previously triggered the warning:
warning: missing field 'isRecursivelyLegal' initializer [-Wmissing-field-initializers]
legalOperations[op] = {action};
^
PiperOrigin-RevId: 279399175
A pattern rewriter hook, mergeBlock, is added that allows for merging the operations of one block into the end of another. This is used to support a canonicalization pattern for branch operations that folds the branch when the successor has a single predecessor(the branch block).
Example:
^bb0:
%c0_i32 = constant 0 : i32
br ^bb1(%c0_i32 : i32)
^bb1(%x : i32):
return %x : i32
becomes:
^bb0:
%c0_i32 = constant 0 : i32
return %c0_i32 : i32
PiperOrigin-RevId: 278677825
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.
PiperOrigin-RevId: 277958868
Rewrite patterns may make modifications to the CFG, including dropping edges between blocks. This change adds a simple unreachable block elimination run at the end of each iteration to ensure that the CFG remains valid.
PiperOrigin-RevId: 277545805
When we removed a pattern, we removed it from worklist but not from
worklistMap. Then, when we tried to add a new pattern on the same Operation
again, the pattern wasn't added since it already existed in the
worklistMap (but not in the worklist).
Closestensorflow/mlir#211
PiperOrigin-RevId: 277319669
In some cases, it may be desirable to mark entire regions of operations as legal. This provides an additional granularity of context to the concept of "legal". The `ConversionTarget` supports marking operations, that were previously added as `Legal` or `Dynamic`, as `recursively` legal. Recursive legality means that if an operation instance is legal, either statically or dynamically, all of the operations nested within are also considered legal. An operation can be marked via `markOpRecursivelyLegal<>`:
```c++
ConversionTarget &target = ...;
/// The operation must first be marked as `Legal` or `Dynamic`.
target.addLegalOp<MyOp>(...);
target.addDynamicallyLegalOp<MySecondOp>(...);
/// Mark the operation as always recursively legal.
target.markOpRecursivelyLegal<MyOp>();
/// Mark optionally with a callback to allow selective marking.
target.markOpRecursivelyLegal<MyOp, MySecondOp>([](Operation *op) { ... });
/// Mark optionally with a callback to allow selective marking.
target.markOpRecursivelyLegal<MyOp>([](MyOp op) { ... });
```
PiperOrigin-RevId: 277086382
This allows for them to be used on other non-function, or even other function-like, operations. The algorithms are already generic, so this is simply changing the derived pass type. The majority of this change is just ensuring that the nesting of these passes remains the same, as the pass manager won't auto-nest them anymore.
PiperOrigin-RevId: 276573038
This allows mixing linalg operations with vector transfer operations (with additional modifications to affine ops) and is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275543361
This Chapter now introduces and makes use of the Interface concept
in MLIR to demonstrate ShapeInference.
END_PUBLIC
Closestensorflow/mlir#191
PiperOrigin-RevId: 275085151
The current SignatureConversion framework (part of DialectConversion)
allows remapping input arguments to a function from 1->0, 1->1 or
1->many arguments during conversion. Another case is where the
argument itself is dropped, but it's use are remapped to another
Value*.
An example of this is: The Vulkan/SPIR-V spec requires entry functions
to be of type void(void). The GPU -> SPIR-V conversion implemented
this without having the DialectConversion framework track the
remapping that lead to some undefined behavior. The changes here
addresses that.
PiperOrigin-RevId: 275059656
b843cc5d5a introduced a new op LICM transformation and a LoopLike interface,
but missed the CMake aspects of it. This should fix the build.
PiperOrigin-RevId: 275038533
When dealing with regions, or other patterns that need to generate temporary operations, it is useful to be able to replace other operations than the root op being matched. Before this PR, these operations would still be considered for legalization meaning that the conversion would either fail, erroneously need to mark these ops as legal, or add unnecessary patterns.
PiperOrigin-RevId: 274598513
This will allow for inlining newly devirtualized calls, as well as give a more accurate cost model(when we have one). Currently canonicalization will only run for nodes that have no child edges, as the child nodes may be erased during canonicalization. We can support this in the future, but it requires more intricate deletion tracking.
PiperOrigin-RevId: 274011386
When an operation with regions gets replaced, we currently require that all of the remaining nested operations are still converted even though they are going to be replaced when the rewrite is finished. This cl adds a tracking for a minimal set of operations that are known to be "dead". This allows for ignoring the legalization of operations that are won't survive after conversion.
PiperOrigin-RevId: 274009003
This PR is a stepping stone towards supporting generic multi-store
source loop nests in affine loop fusion. It extends the algorithm to
support fusion of multi-store loop nests that:
1. have only one store that writes to a function-local live out, and
2. the remaining stores are involved in loop nest self dependences
or no dependences within the function.
Closestensorflow/mlir#162
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/162 from dcaballe:dcaballe/multi-output-fusion 7fb7dec6fe8b45f5ce176f018bfe37b256420c45
PiperOrigin-RevId: 273773907
This is similar to the `inlineRegionBefore` hook, except the original blocks are unchanged. The region to be cloned *must* not have been modified during the conversion process at the point of cloning, i.e. it must belong an operation that has yet to be converted, or the operation that is currently being converted.
PiperOrigin-RevId: 273622533
- bodies would earlier appear in the order (i, i+3, i+2, i+1) instead of
(i, i+1, i+2, i+3) for example for factor 4.
- clean up hardcoded test cases
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#170
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/170 from bondhugula:ujam b66b405b2b1894a03b376952e32a9d0292042665
PiperOrigin-RevId: 273613131
Some dialects have implicit conversions inherent in their modeling, meaning that a call may have a different type that the type that the callable expects. To support this, a hook is added to the dialect interface that allows for materializing conversion operations during inlining when there is a mismatch. A hook is also added to the callable interface to allow for introspecting the expected result types.
PiperOrigin-RevId: 272814379
This allows for the inliner to work on arbitrary call operations. The updated inliner will also work bottom-up through the callgraph enabling support for multiple levels of inlining.
PiperOrigin-RevId: 272813876
The generated build methods have result type before the arguments (operands and attributes, which are also now adjacent in the explicit create method). This also results in changing the create method's ordering to match most build method's ordering.
PiperOrigin-RevId: 271755054
- also remove stale terminology/references in docs
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#148
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/148 from bondhugula:cleanup e846b641a3c2936e874138aff480a23cdbf66591
PiperOrigin-RevId: 271618279
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem generally is a very complex problem and most likely involves taking
a dependence on clang that we do not want atm.
A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implement.
PiperOrigin-RevId: 271591708
- fix store to load forwarding for a certain set of cases (where
forwarding shouldn't have happened); use AffineValueMap difference
based MemRefAccess equality checking; utility logic is also greatly
simplified
- add missing equality/inequality operators for AffineExpr ==/!= ints
- add == != operators on MemRefAccess
Closestensorflow/mlir#136
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/136 from bondhugula:store-load-forwarding d79fd1add8bcfbd9fa71d841a6a9905340dcd792
PiperOrigin-RevId: 270457011
computeDepth calls itself recursively, which may insert into minPatternDepth. minPatternDepth is a DenseMap, which invalidates iterators on insertion, so this may lead to asan failures.
PiperOrigin-RevId: 270374203
- allow symbols in index remapping provided for memref replacement
- fix memref normalize crash on cases with layout maps with symbols
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Reported by: Alex Zinenko
Closestensorflow/mlir#139
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/139 from bondhugula:memref-rep-symbols 2f48c1fdb5d4c58915bbddbd9f07b18541819233
PiperOrigin-RevId: 269851182
- add canonicalization pattern to compose maps into affine loads/stores;
templatize the pattern and reuse it for affine.apply as well
- rename getIndices -> getMapOperands() (getIndices is confusing since
these are no longer the indices themselves but operands to the map
whose results are the indices). This also makes the accessor uniform
across affine.apply/load/store. Change arg names on the affine
load/store builder to avoid confusion. Drop an unused confusing build
method on AffineStoreOp.
- update incomplete doc comment for canonicalizeMapAndOperands (this was
missed from a previous update).
Addresses issue tensorflow/mlir#121
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#122
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/122 from bondhugula:compose-load-store e71de1771e56a85c4282c10cb43f30cef0701c4f
PiperOrigin-RevId: 269619540
When performing A->B->C conversion, an operation may still refer to an operand of A. This makes it necessary to unmap through multiple levels of replacement for a specific value.
PiperOrigin-RevId: 269367859
- turn copy/dma generation method into a utility in LoopUtils, allowing
it to be reused elsewhere.
- no functional/logic change to the pass/utility
- trim down header includes in files affected
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#124
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/124 from bondhugula:datacopy 9f346e62e5bd9dd1986720a30a35f302eb4d3252
PiperOrigin-RevId: 269106088
- take care of symbolic operands with alloc
- add missing check for compose map failure and a test case
- add test cases on strides
- drop incorrect check for one-to-one'ness
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#132
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/132 from bondhugula:normalize-memrefs 8aebf285fb0d7c19269d85255aed644657e327b7
PiperOrigin-RevId: 269105947
* Add GraphTraits that treat a block as a graph, Operation* as node and use-relationship for edges;
- Just basic graph output;
* Add use iterator to iterate over all uses of an Operation;
* Add testing pass to generate op graph;
This does not support arbitrary operations other than function nor nested regions yet.
PiperOrigin-RevId: 268121782
This will allow clients to implement a different collection strategy on these
values, including collecting each uses within the region for example.
PiperOrigin-RevId: 267803978
This defines a set of initial utilities for inlining a region(or a FuncOp), and defines a simple inliner pass for testing purposes.
A new dialect interface is defined, DialectInlinerInterface, that allows for dialects to override hooks controlling inlining legality. The interface currently provides the following hooks, but these are just premilinary and should be changed/added to/modified as necessary:
* isLegalToInline
- Determine if a region can be inlined into one of this dialect, *or* if an operation of this dialect can be inlined into a given region.
* shouldAnalyzeRecursively
- Determine if an operation with regions should be analyzed recursively for legality. This allows for child operations to be closed off from the legality checks for operations like lambdas.
* handleTerminator
- Process a terminator that has been inlined.
This cl adds support for inlining StandardOps, but other dialects will be added in followups as necessary.
PiperOrigin-RevId: 267426759
- address remaining comments from PR tensorflow/mlir#87 for better test coverage for
pipeline-data-transfer/replaceAllMemRefUsesWith
- remove dead tag allocs the same way they are removed for the replaced buffers
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#106
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/106 from bondhugula:followup 9e868666d047e8d43e5f82f43e4093b838c710fa
PiperOrigin-RevId: 267144774
- introduce utility to convert memrefs with non-identity layout maps to
ones with identity layout maps: convert the type and rewrite/remap all
its uses
- add this utility to -simplify-affine-structures pass for testing
purposes
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#104
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/104 from bondhugula:memref-normalize f2c914aa1890e8860326c9e33f9aa160b3d65e6d
PiperOrigin-RevId: 266985317
- the [begin, end) range identified for copying could end in between the
block, which makes hoisting invalid in some cases. Change the range
identification to always end with end of block.
- add test case to exercise these (with fast mem capacity set to minimal so
that single element memref buffers are generated at the innermost loop)
- the location of begin/end of the block range for data copying was
being confused with the insert points for copy in and copy out code.
In cases, where we choose to hoist transfers, these are separate.
- when copy loops are single iteration ones, promote their bodies at
the end of the pass.
- change default fast mem space to 1 (setting it to zero made it
generate DMA op's that won't verify in the default case - since the
DMA ops have a check for src/dest memref spaces being different).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Co-Authored-By: Mehdi Amini <joker.eph@gmail.com>
Closestensorflow/mlir#88
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/88 from bondhugula:datacopy 88697267c45e850c3ced87671e16e4a930c02a42
PiperOrigin-RevId: 266980911
This interface will allow for providing hooks to interrop with operation folding. The first hook, 'shouldMaterializeInto', will allow for controlling which region to insert materialized constants into. The folder will generally materialize constants into the top-level isolated region, this allows for materializing into a lower level ancestor region if it is more profitable/correct.
PiperOrigin-RevId: 266702972
This is done by providing a walk callback that returns a WalkResult. This result is either `advance` or `interrupt`. `advance` means that the walk should continue, whereas `interrupt` signals that the walk should stop immediately. An example is shown below:
auto result = op->walk([](Operation *op) {
if (some_invariant)
return WalkResult::interrupt();
return WalkResult::advance();
});
if (result.wasInterrupted())
...;
PiperOrigin-RevId: 266436700
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring is that the explicit template parameter for the operation type is no longer needed for the explicit op walks. For example:
op->walk<AffineForOp>([](AffineForOp op) { ... });
is now accomplished via:
op->walk([](AffineForOp op) { ... });
PiperOrigin-RevId: 266209552
Refactor replaceAllMemRefUsesWith to split it into two methods: the new
method does the replacement on a single op, and is used by the existing
one.
- make the methods return LogicalResult instead of bool
- Earlier, when replacement failed (due to non-deferencing uses of the
memref), the set of ops that had already been processed would have
been replaced leaving the IR in an inconsistent state. Now, a
pass is made over all ops to first check for non-deferencing
uses, and then replacement is performed. No test cases were affected
because all clients of this method were first checking for
non-deferencing uses before calling this method (for other reasons).
This isn't true for a use case in another upcoming PR (scalar
replacement); clients can now bail out with consistent IR on failure
of replaceAllMemRefUsesWith. Add test case.
- multiple deferencing uses of the same memref in a single op is
possible (we have no such use cases/scenarios), and this has always
remained unsupported. Add an assertion for this.
- minor fix to another test pipeline-data-transfer case.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#87
PiperOrigin-RevId: 265808183
Switch to C++14 standard method as llvm::make_unique has been removed (
https://reviews.llvm.org/D66259). Also mark some targets as c++14 to ease next
integrates.
PiperOrigin-RevId: 263953918
Often we want to ensure that block arguments are converted before operations that use them. This refactors the current implementation to be cleaner/less frequent by triggering conversion when a set of blocks are moved/inlined; or when legalization is successful.
PiperOrigin-RevId: 263795005
Since raw pointers are always passed around for IR construct without
implying any ownership transfer, it can be error prone to have implicit
ownership transferred the same way.
For example this code can seem harmless:
Pass *pass = ....
pm.addPass(pass);
pm.addPass(pass);
pm.run(module);
PiperOrigin-RevId: 263053082
There are currently several different terms used to refer to a parent IR unit in 'get' methods: getParent/getEnclosing/getContaining. This cl standardizes all of these methods to use 'getParent*'.
PiperOrigin-RevId: 262680287
This will allow for reusing the same pattern list, which may be costly to continually reconstruct, on multiple invocations.
PiperOrigin-RevId: 262664599
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.extractelement operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262545089
These methods will allow replacing the uses of results with an existing operation, with the same number of results, or a range of values. This removes a number of hand-rolled result replacement loops and simplifies replacement for operations with multiple results.
PiperOrigin-RevId: 262206600
This CL modifies the LowerLinalgToLoopsPass to use RewritePattern.
This will make it easier to inline Linalg generic functions and regions when emitting to loops in a subsequent CL.
PiperOrigin-RevId: 261894120
This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'.
PiperOrigin-RevId: 261816316
When inlining the declaration for llvm::DenseSet/DenseMap in the mlir
namespace from a forward declaration, clang does not take the default
for the template parameters if their are declared later.
namespace llvm {
template<typename Foo>
class DenseMap;
}
namespace mlir {
using llvm::DenseMap;
}
namespace llvm {
template<typename Foo = int>
class DenseMap {};
}
namespace mlir {
DenseMap<> map;
}
PiperOrigin-RevId: 261495612
AffineDataCopyGeneration pass relied on command line flags for internal logic
in several places, which makes it unusable in a library context (i.e. outside a
standalone mlir-opt binary that does the command line parsing). Define
configuration flags in the constructor instead, and set them up to command
line-based defaults to maintain the original behavior.
PiperOrigin-RevId: 261322364
Clipping creates non-affine memory accesses, use std_load and std_store instead of affine_load and affine_store.
In the future we may also want a fill with the neutral element rather than clip, this would make the accesses affine if we wanted more analyses and transformations to happen post lowering to pointwise copies.
PiperOrigin-RevId: 260110503
This mode analyzes which operations are legalizable to the given target if a conversion were to be applied, i.e. no rewrites are ever performed even on success. This mode is useful for device partitioning or other utilities that may want to analyze the effect of conversion to different targets before performing it.
The analysis method currently just fills a provided set with the operations that were found to be legalizable. This can be extended in the future to capture more information as necessary.
PiperOrigin-RevId: 259987105
This cl enforces that the conversion of the type signatures for regions, and thus their entry blocks, is handled via ConversionPatterns. A new hook 'applySignatureConversion' is added to the ConversionPatternRewriter to perform the desired conversion on a region. This also means that the handling of rewriting the signature of a FuncOp is moved to a pattern. A default implementation is provided via 'mlir::populateFuncOpTypeConversionPattern'. This removes the hacky implicit 'dynamically legal' status of FuncOp that was present previously, and leaves it up to the user to decide when/how to convert the signature of a function.
PiperOrigin-RevId: 259161999
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values.
A corresponding unit test is added.
For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
loop.for %i = %lb to %ub step %step {
...
}
```
is rewritten into a version resembling the following pseudo-IR:
```
loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
step %gridDim.x * blockDim.x {
...
}
```
PiperOrigin-RevId: 258945942
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.
Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where ForOp transitively depend on enclosing induction variables. In such cases, the API provides a LogicalResult return but the SimpleParametricLoopTilingPass does not currently use this result.
A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).
PiperOrigin-RevId: 258928309
This allows for providing specific handling for dynamically legal operations/dialects without overriding the general 'isDynamicallyLegal' hook. This also means that a derived ConversionTarget class need not always be defined when some operations are dynamically legal.
Example usage:
ConversionTarget target(...);
target.addDynamicallyLegalOp<ReturnOp>([](ReturnOp op) {
return ...
};
target.addDynamicallyLegalDialect<StandardOpsDialect>([](Operation *op) {
return ...
};
PiperOrigin-RevId: 258884753
This specific PatternRewriter will allow for exposing hooks in the future that are only useful for the conversion framework, e.g. type conversions.
PiperOrigin-RevId: 258818122
This cl begins a large refactoring over how signature types are converted in the DialectConversion infrastructure. The signatures of blocks are now converted on-demand when an operation held by that block is being converted. This allows for handling the case where a region is created as part of a pattern, something that wasn't possible previously.
This cl also generalizes the region signature conversion used by FuncOp to work on any region of any operation. This generalization allows for removing the 'apply*Conversion' functions that were specific to FuncOp/ModuleOp. The implementation currently uses a new hook on TypeConverter, 'convertRegionSignature', but this should ideally be removed in favor of using Patterns. That depends on adding support to the PatternRewriter used by ConversionPattern to allow applying signature conversions to regions, which should be coming in a followup.
PiperOrigin-RevId: 258645733
This explicit tag is useful is several ways:
*) This simplifies how to mark sub sections of a dialect as explicitly unsupported, e.g. my target supports all operations in the foo dialect except for these select few. This is useful for partial lowerings between dialects.
*) Partial conversions will now verify that operations that were explicitly marked as illegal must be converted. This provides some guarantee that the operations that need to be lowered by a specific pass will be.
PiperOrigin-RevId: 258582879
Users generally want several different modes of conversion. This cl refactors DialectConversion to provide two:
* Partial (applyPartialConversion)
- This mode allows for illegal operations to exist in the IR, and does not fail if an operation fails to be legalized.
* Full (applyFullConversion)
- This mode fails if any operation is not properly legalized to the conversion target. This allows for ensuring that the IR after a conversion only contains operations legal for the target.
PiperOrigin-RevId: 258412243
These methods don't compose well with the rest of conversion framework, and create artificial breaks in conversion. Replace these methods with two(populateAffineToStdConversionPatterns and populateLoopToStdConversionPatterns respectively) that populate a list of patterns to perform the same behavior.
PiperOrigin-RevId: 258219277
This cl changes the way that operations/blocks to convert are collected/traversed so that parent region operations can be legalized before their bodies. Most RewritePatterns for region operations assume that the entry arguments to each region are yet to be converted. Given that the bodies are currently converted first, this makes it difficult to fit these patterns into the same run as one converting types.
The operations/blocks to convert are now collected before any legalization has run, which simplifies the conversion logic itself, as legalization may insert new operations, move blocks, etc.
PiperOrigin-RevId: 258170158
ConversionPattern should ideally only be used when the types of the operands are changing, which in this case they aren't. Using OpRewritePattern also lends to much simpler code.
PiperOrigin-RevId: 258158474
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and than subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation. However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.
PiperOrigin-RevId: 258151016
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.
PiperOrigin-RevId: 258123853
When using a RewritePattern and replacing an operation with an existing value, that value may have already been replaced by something else. This cl ensures that only the final value is used when applying rewrites.
PiperOrigin-RevId: 258058488
This field wasn't updated as the insertion point changed, making it potentially dangerous given the multi-level of MLIR(e.g. 'createBlock' would always insert the new block in 'region'). This also allows for building an OpBuilder with just a context.
PiperOrigin-RevId: 257829135
The GreedyPatternRewriteDriver currently does not notify the OperationFolder when constants are removed as part of a pattern match. This materializes in a nasty bug where a different operation may be allocated to the same address. This causes an assertion in the OperationFolder when it gets notified of the new operations removal.
PiperOrigin-RevId: 257817627