This change makes the ModuleTranslation threadsafe by locking on the
LLVMContext. Furthermore, we now clone the llvm module into a new
context when compiling to PTX similar to what the OrcJit does.
Differential Revision: https://reviews.llvm.org/D78207
Summary:
OpBase.td defined attributes kind for all integer types expect index. This
commit fixes that by adding an IndexAttr attribute kind.
Differential Revision: https://reviews.llvm.org/D78195
Summary:
Modified AffineMap::get to remove support for the overload which allowed
an ArrayRef of AffineExpr but no context (and gathered the context from a
presumed first entry, resulting in bugs when there were 0 results).
Instead, we support only a ArrayRef and a context, and a version which
takes a single AffineExpr.
Additionally, removed some now needless case logic which previously
special cased which call to AffineMap::get to use.
Reviewers: flaub, bondhugula, rriddle!, nicolasvasilache, ftynse, ulysseB, mravishankar, antiagainst, aartbik
Subscribers: mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78226
This revision introduces a utility to unswitch affine.for/parallel loops
by hoisting affine.if operations past surrounding affine.for/parallel.
The hoisting works for both perfect/imperfect nests and in the presence
of else blocks. The hoisting is currently to as outermost a level as
possible. Uses a test pass to test the utility.
Add convenience method Operation::getParentWithTrait<Trait>.
Depends on D77487.
Differential Revision: https://reviews.llvm.org/D77870
Introduce mlir::applyOpPatternsAndFold which applies patterns as well as
any folding only on a specified op (in contrast to
applyPatternsAndFoldGreedily which applies patterns only on the regions
of an op isolated from above). The caller is made aware of the op being
folded away or erased.
Depends on D77485.
Differential Revision: https://reviews.llvm.org/D77487
Summary:
This revision adds two utilities currently present in MLIR to LLVM StringExtras:
* convertToSnakeFromCamelCase
Convert a string from a camel case naming scheme, to a snake case scheme
* convertToCamelFromSnakeCase
Convert a string from a snake case naming scheme, to a camel case scheme
Differential Revision: https://reviews.llvm.org/D78167
This class implements a switch-like dispatch statement for a value of 'T' using dyn_cast functionality. Each `Case<T>` takes a callable to be invoked if the root value isa<T>, the callable is invoked with the result of dyn_cast<T>() as a parameter.
Differential Revision: https://reviews.llvm.org/D78070
These have proved incredibly useful for interleaving values between a range w.r.t to streams. After this revision, the mlir/Support/STLExtras.h is empty. A followup revision will remove it from the tree.
Differential Revision: https://reviews.llvm.org/D78067
This revision moves the various range utilities present in MLIR to LLVM to enable greater reuse. This revision moves the following utilities:
* indexed_accessor_*
This is set of utility iterator/range base classes that allow for building a range class where the iterators are represented by an object+index pair.
* make_second_range
Given a range of pairs, returns a range iterating over the `second` elements.
* hasSingleElement
Returns if the given range has 1 element. size() == 1 checks end up being very common, but size() is not always O(1) (e.g., ilist). This method provides O(1) checks for those cases.
Differential Revision: https://reviews.llvm.org/D78064
This revision moves several type_trait utilities from MLIR into LLVM. Namely, this revision adds:
is_detected - This matches the experimental std::is_detected
is_invocable - This matches the c++17 std::is_invocable
function_traits - A utility traits class for getting the argument and result types of a callable type
Differential Revision: https://reviews.llvm.org/D78059
This revision adds a DenseMapInfo overload for std::tuples whose elements all have a DenseMapInfo. The implementation is similar to that of std::pair, and has been used within MLIR for over a year.
Differential Revision: https://reviews.llvm.org/D78057
Summary:
Also,
- add IndexTensor to OpBase.td
- fix typo in the op name. It was mistakenly `to_tensor` instead of
`to_extent_tensor`.
Differential Revision: https://reviews.llvm.org/D78149
The inversePermutation method returns a null map on failure. Update
uses of this method within Linalg to handle this. In LinalgToLoops the
null return value was used to emit scalar code. Modify that to return
failure, and emit scalar implementation when affine map is "empty",
i.e. 1 dims, 0 symbols and no result exprs.
Differential Revision: https://reviews.llvm.org/D77964
Summary: Functional.h contains many different methods that have a direct, and more efficient, equivalent in LLVM. This revision replaces all usages with the LLVM equivalent, and removes the header. This is part of larger cleanup, pr45513, merging MLIR support facilities into LLVM.
Differential Revision: https://reviews.llvm.org/D78053
This change is NFC since the facility to tile and generate
loop.parallel loops already exists in Linalg.
Differential Revision: https://reviews.llvm.org/D77965
The outer parallel loops of a linalg operation is lowered to
loop.parallel, with the other loops lowered to loop.for. This gets the
lowering to loop.parallel on par with the loop.for lowering. In future
the reduction loop could also be lowered to loop.parallel.
Also add a utility function that returns the loops that are
created.
Differential Revision: https://reviews.llvm.org/D77678
Summary: This revision adds blurbs of documentation to various different passes, namely: Canonicalizer, CSE, LocationSnapshot, StripDebugInfo, and SymbolDCE.
Differential Revision: https://reviews.llvm.org/D78007
This commit added stride support in runtime array types. It also
adjusted the assembly form for the stride from `[N]` to `stride=N`.
This makes the IR more readable, especially for the cases where
one mix array types and struct types.
Differential Revision: https://reviews.llvm.org/D78034
Summary:
This revision adds generation of two utility methods during EnumGen:
```
llvm::Optional<EnumType> symbolizeEnum<EnumType>(llvm::StringRef)
<stringifyResult> stringifyEnum(EnumType);
```
This provides a generic interface for stringifying/symbolizing any enum that can be used in a template environment.
Differential Revision: https://reviews.llvm.org/D77937
Summary: This revision makes the registration of command line options for these two files manual with `registerMLIRContextCLOptions` and `registerAsmPrinterCLOptions` methods. This removes the last remaining static constructors within lib/.
Differential Revision: https://reviews.llvm.org/D77960
Summary: std::function has a notoriously large amount of malloc traffic, whereas function_ref is a cheaper and more efficient alternative.
Differential Revision: https://reviews.llvm.org/D77959
Summary:
Identifier doesn't maintain a length, so every time strref() is called,
it does a strlen. In the case of comparisons, this isn't necessary:
there is no need to scan a string to get its length, then rescan it to
do the comparison. Just done one comparison.
This also moves some assertions in Identifier::get as another
microoptimization for 'assertions enabled' modes.
Reviewers: rriddle!
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77958
OperatioFolder::tryToFold performs both true folding and in a few
instances in-place updates through op rewrites. In the latter case, we
should still be applying the supplied pattern rewrites in the same
iteration; however this wasn't the case since tryToFold returned
success() for both true folding and in-place updates, and the patterns
for the in-place updated ops were being applied only in the next
iteration of the driver's outer loop. This fix would make it converge
faster.
Differential Revision: https://reviews.llvm.org/D77485
Summary: ClassID is a bit janky right now as it involves passing a magic pointer around. This revision hides the internal implementation mechanism within a new class TypeID. This class is a value-typed wrapper around the original ClassID implementation.
Differential Revision: https://reviews.llvm.org/D77768
Summary: This hook allows for passes to specify the command line argument without the need for registration. More concretely this will allow for generating pass crash reproducers without needing to have the passes registered. This should remove the need for production tools to register passes, leaving that solely to development tools like mlir-opt.
Differential Revision: https://reviews.llvm.org/D77907
Summary: This revision adds support for specifying operands or results as "optional". This is a special case of variadic where the number of elements is either 0 or 1. Operands and results of this kind will have accessors generated using Value instead of the range types, making it more natural to interface with.
Differential Revision: https://reviews.llvm.org/D77863
Summary:
The string in the location is used to provide metadata for the fused location
or create a NamedLoc. This allows tagging individual locations to convey
additional rewrite information.
Differential Revision: https://reviews.llvm.org/D77840
Summary:
This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).
While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.
Some implementation details and short-term decisions taken for the purpose of bootstrapping and that are not set in stone include:
1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
2. implicit and eager discovery of dims and symbols when parsing
3. using EDSC ops to specify the computation (e.g. std_addf, std_mul_f, ...)
A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.
For the following "Tensor Comprehension-inspired" string:
```
def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
With -gen-ods-decl=1, this emits (modulo formatting):
```
def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> {
let arguments = (ins Variadic<LinalgOperand>:$views);
let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
let extraClassDeclaration = [{
llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
void regionBuilder(ArrayRef<BlockArgument> args);
}];
let hasFolder = 1;
}
```
With -gen-ods-impl, this emits (modulo formatting):
```
llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
{
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
Differential Revision: https://reviews.llvm.org/D77067
This patch adds support for taskwait and taskyield operations in OpenMP dialect and translation of the these constructs to LLVM IR. The OpenMP IRBuilder is used for this translation.
The patch includes code changes and a testcase modifications.
Differential Revision: https://reviews.llvm.org/D77634
Rename mlir::applyPatternsGreedily -> applyPatternsAndFoldGreedily. The
new name is a more accurate description of the method - it performs
both, application of the specified patterns and folding of all ops in
the op's region irrespective of whether any patterns have been supplied.
Differential Revision: https://reviews.llvm.org/D77478
Introduce a new operation property / trait (AutomaticAllocationScope)
for operations with regions that define a new scope for automatic allocations;
such allocations (typically realized on stack) are automatically freed when
control leaves such ops' regions. std.alloca's are freed at the closest
surrounding op that has this trait. All FunctionLike operations should normally
have this trait.
Differential Revision: https://reviews.llvm.org/D77787
Summary:
LLVM matrix intrinsics recently introduced an option to support row-major mode.
This matches the MLIR vector model, this revision switches to row-major.
A corner case related to degenerate sizes was also fixed upstream.
This revision removes the guard against this corner case.
A bug was uncovered on the output vector construction which this revision also fixes.
Lastly, this has been tested on a small size and benchmarked independently: no visible performance regression is observed.
In the future, when matrix intrinsics support per op attribute, we can more aggressively translate to that and avoid inserting MLIR-level transposes.
This has been tested independently to work on small matrices.
Differential Revision: https://reviews.llvm.org/D77761
Summary:
This revision adds support to lower 1-D vector transfers to LLVM.
A mask of the vector length is created that compares the base offset + linear index to the dim of the vector.
In each position where this does not overflow (i.e. offset + vector index < dim), the mask is set to 1.
A notable fact is that the lowering uses llvm.dialect_cast to allow writing code in the simplest form by targeting the simplest mix of vector and LLVM dialects and
letting other conversions kick in.
Differential Revision: https://reviews.llvm.org/D77703
Summary: Some pattern rewriters, like dialect conversion, prohibit the unbounded recursion(or reapplication) of patterns on generated IR. Most patterns are not written with recursive application in mind, so will generally explode the stack if uncaught. This revision adds a hook to RewritePattern, `hasBoundedRewriteRecursion`, to signal that the pattern can safely be applied to the generated IR of a previous application of the same pattern. This allows for establishing a contract between the pattern and rewriter that the pattern knows and can handle the potential recursive application.
Differential Revision: https://reviews.llvm.org/D77782
Minor fixes and cleanup for ShapedType accessors, use
ShapedType::kDynamicSize, add ShapedType::isDynamicDim.
Differential Revision: https://reviews.llvm.org/D77710
Summary: ClassID is used as a type id and must be unique in the face of shared libraries to ensure correctness. This fixes failures related to BUILD_SHARED_LIBs on macos.
Differential Revision: https://reviews.llvm.org/D77764
Summary: Pass options are a better choice for various reasons and avoid the need for static constructors.
Differential Revision: https://reviews.llvm.org/D77707
Support to recognize and deal with aligned_alloc was recently added to
LLVM's TLI/MemoryBuiltins and its various optimization passes. This
revision adds support for generation of aligned_alloc's when lowering
AllocOp from std to LLVM. Setting 'use-aligned_alloc=1' will lead to
aligned_alloc being used for all heap allocations. An alignment and size
that works with the constraints of aligned_alloc is chosen.
Using aligned_alloc is preferable to "using malloc and adjusting the
allocated pointer to align for indexing" because the pointer access
arithmetic done for the latter only makes it harder for LLVM passes to
deal with for analysis, optimization, attribute deduction, and rewrites.
Differential Revision: https://reviews.llvm.org/D77528
Summary:
This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.
Differential Revision: https://reviews.llvm.org/D77367
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
ModulePass doesn't provide any special utilities and thus doesn't give enough benefit to warrant a special pass class. This revision replaces all usages with the more general OperationPass.
Differential Revision: https://reviews.llvm.org/D77339
Summary:
Add directive to indicate the location to give to op being created. This
directive is optional and if unused the location will still be the fused
location of all source operations.
Currently this directive only works with other op locations, reusing an
existing op location or a fusion of op locations. But doesn't yet support
supplying metadata for the FusedLoc.
Based off initial revision by antiagainst@ and effectively mirrors GlobalIsel
debug_locations directive.
Differential Revision: https://reviews.llvm.org/D77649
Summary:
* Removal of FxpMathOps was discussed on the mailing list.
* Will send a courtesy note about also removing the Quantizer (which had some dependencies on FxpMathOps).
* These were only ever used for experimental purposes and we know how to get them back from history as needed.
* There is a new proposal for more generalized quantization tooling, so moving these older experiments out of the way helps clean things up.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77479
Summary: Diagnostics may be cached in the parallel diagnostic handler to preserve proper ordering. Storing the Operation as a DiagnosticArgument is problematic as the operation may be erased or changed before it finally gets printed.
Differential Revision: https://reviews.llvm.org/D77675
Introduce the alloca op for stack memory allocation. When converting to the
LLVM dialect, this is lowered to an llvm.alloca. Refactor the std to
llvm conversion for alloc op to reuse with alloca. Drop useAlloca option
with alloc op lowering.
Differential Revision: https://reviews.llvm.org/D76602
Fix point-wise copy generation to work with bounds that have max/min.
Change structure of copy loop nest to use absolute loop indices and
subtracting base from the indexes of the fast buffers. Update supporting
utilities: Fix FlatAffineConstraints::getLowerAndUpperBound to look at
equalities as well and for a missing division. Update unionBoundingBox
to not discard common constraints (leads to a tighter system). Update
MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension
constraints. Run removeTrivialRedundancy at the end of
MemRefRegion::compute. Run single iteration loop promotion and
load/store canonicalization after affine data copy (in its test pass as
well).
Differential Revision: https://reviews.llvm.org/D77320
Summary: This revision updates the value numbering when printing to number from the next parent operation that is isolated from above. This is the highest level to number from that still ensures thread-safety. This revision also changes the behavior of Operator::operator<< to use local scope to avoid thread races when numbering operations.
Differential Revision: https://reviews.llvm.org/D77525
Summary:
This revision adds a tensor_reshape operation that operates on tensors.
In the tensor world the constraints are less stringent and we can allow more
arbitrary dynamic reshapes, as long as they are contractions.
The expansion of a dynamic dimension into multiple dynamic dimensions is under-specified and is punted on for now.
Differential Revision: https://reviews.llvm.org/D77360
The current return type sometimes leads to code like
to_vector<2>(ValueRange(loop.getInductionIvs())). It would be nice to
shorten it. Users who need access to Block::BlockArgListType (if there
are any), can always call getBody()->getArguments(); if needed.
Also remove getNumInductionVars(), since there is getNumLoops().
Differential Revision: https://reviews.llvm.org/D77526
Summary:
This revision performs several cleanups on the translation infra:
* Removes the TranslateCLParser library and consolidates into Translation
- This was a weird library that existed in Support, and didn't really justify being a standalone library.
* Cleans up the internal registration and consolidates all of the translation functions within one registry.
Differential Revision: https://reviews.llvm.org/D77514
Add a pattern rewriter utility to erase blocks (while notifying the
pattern rewriting driver of the erased ops). Use this to remove trivial
else blocks in affine.if ops.
Differential Revision: https://reviews.llvm.org/D77083
The ForOp::build ensures that there is a block terminator which is great for
the default use case when there are no iter_args and loop.for returns no
results. In non-zero results case we always need to call replaceOpWithNewOp
which is not the nicest thing in the world. We can stop inserting YieldOp when
iter_args is non-empty. IfOp::build already behaves similarly.
Summary: This revision adds support for marking the last region as variadic in the ODS region list with the VariadicRegion directive.
Differential Revision: https://reviews.llvm.org/D77455
Summary: The attribute grammar includes an optional trailing colon type, so for attributes without a constant buildable type this will generally lead to unexpected and undesired behavior. Given that, it's better to just error out on these cases.
Differential Revision: https://reviews.llvm.org/D77293
Summary: It is a very common user trap to think that the location printed along with the diagnostic is the same as the current operation that caused the error. This revision changes the behavior to always print the current operation, except for when diagnostics are being verified. This is achieved by moving the command line flags in IR/ to be options on the MLIRContext.
Differential Revision: https://reviews.llvm.org/D77095
PatternRewriter and derived classes provide a set of virtual methods to
manipulate blocks, which ConversionPatternRewriter overrides to keep track of
the manipulations and undo them in case the conversion fails. However, one can
currently create a block only by splitting another block into two. This not
only makes the API inconsistent (`splitBlock` is allowed in conversion
patterns, but `createBlock` is not), but it also make it impossible for one to
create blocks with argument lists different from those of already existing
blocks since in-place block updates are not supported either. Such
functionality precludes dialect conversion infrastructure from being used more
extensively on region-containing ops, for example, for value-returning "if"
operations. At the same time, ConversionPatternRewriter already allows one to
undo block creation as block creation is one of the primitive operations in
already supported region inlining.
Support block creation in conversion patterns by hooking `createBlock` on the
block action undo mechanism. This requires to make `Builder::createBlock`
virtual, similarly to Op insertion. This is a minimal change to the Builder
infrastructure that will later help support additional use cases such as block
signature changes. `createBlock` now additionally takes the types of the block
arguments that are added immediately so as to avoid in-place argument list
manipulation that would be illegal in conversion patterns.
Two back-to-back transpose operations are combined into a single transpose, which uses a combination of their permutation vectors.
Differential Revision: https://reviews.llvm.org/D77331
A certain number of EDSCs have a named form (e.g. `linalg.matmul`) and a generic form (e.g. `linalg.generic` with matmul traits).
Despite living in different namespaces, using the same name is confusiong in clients.
Rename them as `linalg_matmul` and `linalg_generic_matmul` respectively.
Summary:
LLVM IR functions can have arbitrary attributes attached to them, some of which
affect may affect code transformations. Until we can model all attributes
consistently, provide a pass-through mechanism that forwards attributes from
the LLVMFuncOp in MLIR to LLVM IR functions during translation. This mechanism
relies on LLVM IR being able to recognize string representations of the
attributes and performs some additional checking to avoid hitting assertions
within LLVM code.
Differential Revision: https://reviews.llvm.org/D77072
Add a method that given an affine map returns another with just its unique
results. Use this to drop redundant bounds in max/min for affine.for. Update
affine.for's canonicalization pattern and createCanonicalizedForOp to use
this.
Differential Revision: https://reviews.llvm.org/D77237
Modernize/cleanup code in loop transforms utils - a lot of this code was
written prior to the currently available IR support / code style. This
patch also does some variable renames including inst -> op, comment
updates, turns getCleanupLoopLowerBound into a local function.
Differential Revision: https://reviews.llvm.org/D77175
Summary:
This is to allow optimizations like loop invariant code motion to work
on the ParallelOp.
Additional small cleanup on the ForOp implementation of
LoopLikeInterface and the test file of loop-invariant-code-motion.
Differential Revision: https://reviews.llvm.org/D77128
Summary:
This revision adds support for auto-generating pass documentation, replacing the need to manually keep Passes.md up-to-date. This matches the behavior already in place for dialect and interface documentation.
Differential Revision: https://reviews.llvm.org/D76660
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).
Differential Revision: https://reviews.llvm.org/D76659
This removes the need to statically register conversion passes, and also puts all of the conversions within one centralized file.
Differential Revision: https://reviews.llvm.org/D76658
This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.
Differential Revision: https://reviews.llvm.org/D76657
This will greatly simplify a number of things related to passes:
* Enables generation of pass registration
* Enables generation of boiler plate pass utilities
* Enables generation of pass documentation
This revision focuses on adding the basic structure and adds support for generating the registration for passes in the Transforms/ directory. Future revisions will add more support and move more passes over.
Differential Revision: https://reviews.llvm.org/D76656
Summary:
The commit provides a single method to build affine maps with zero or more
results. Users of mlir::AffineMap previously had to dispatch between two methods
depending on the number of results.
At the same time, this commit fixes the method for building affine map with zero
results that was previously ignoring its `symbolCount` argument.
Differential Revision: https://reviews.llvm.org/D77126
Summary:
OpBuilder(Block) is specifically replaced with
OpBuilder::atBlockEnd(Block);
This is to make insertion behavior clear due to there being no one
correct answer for which location in a block the default insertion
point should be.
Differential Revision: https://reviews.llvm.org/D77060
Summary:
The RAW fusion happens only if the produecer block dominates the consumer block.
The WAW pattern also works with the precondition. I.e., if a producer can
dominate the consumer, they can fairly fuse together.
Since they are all tilable, we can think the pattern like this way:
Input:
```
linalg_op1 view
tile_loop
subview_2
linalg_op2 subview_2
```
Tile the first Linalg op as same as the second Linalg.
```
tile_loop
subview_1
linalg_op1 subview_1
tile_loop
subview_2
liangl_op2 subview_2
```
Since the first Linalg op is tilable in the same way and the computation are
independently, it's fair to fuse it with the second Linalg op.
```
tile_loop
subview_1
linalg_op1 subview_1
linalg_op2 subview_2
```
In short, this patch includes:
- Handling both RAW and WAW pattern.
- Adding a interface method to get input and output buffers.
- Exposing a method to get a StringRef of a dependency type.
- Fixing existing WAW tests and add one more use case: initialize the buffer
before conv op.
Differential Revision: https://reviews.llvm.org/D76897
Summary:
Performs an N-D pooling operation similarly to the description in the TF
documentation:
https://www.tensorflow.org/api_docs/python/tf/nn/pool
Different from the description, this operation doesn't perform on batch and
channel. It only takes tensors of rank `N`.
```
output[x[0], ..., x[N-1]] =
REDUCE_{z[0], ..., z[N-1]}
input[
x[0] * strides[0] - pad_before[0] + dilation_rate[0]*z[0],
...
x[N-1]*strides[N-1] - pad_before[N-1] + dilation_rate[N-1]*z[N-1]
],
```
The required optional arguments are:
- strides: an i64 array specifying the stride (i.e. step) for window
loops.
- dilations: an i64 array specifying the filter upsampling/input
downsampling rate
- padding: an i64 array of pairs (low, high) specifying the number of
elements to pad along a dimension.
If strides or dilations attributes are missing then the default value is
one for each of the input dimensions. Similarly, padding values are zero
for both low and high in each of the dimensions, if not specified.
Differential Revision: https://reviews.llvm.org/D76414
This commit changes the separator line for dividing auto-generated
docs from spec and manually added appendix from "### Custom assembly
form" to "<!-- End of AutoGen section -->". This is in preparation
to use the declarative assembly form in MLIR core. We will replace
more and more manually written assembly forms to be autogenerated.
Differential Revision: https://reviews.llvm.org/D77158
Summary:
Add support for TupleGetOp folding through InsertSlicesOp and ExtractSlicesOp.
Vector-to-vector transformations for unrolling and lowering to hardware vectors
can generate chains of structured vector operations (InsertSlicesOp,
ExtractSlicesOp and ShapeCastOp) between the producer of a hardware vector
value and its consumer. Because InsertSlicesOp, ExtractSlicesOp and ShapeCastOp
are structured, we can track the location (tuple index and vector offsets) of
the consumer vector value through the chain of structured operations to the
producer, enabling a much more powerful producer-consumer fowarding of values
through structured ops and tuple, which in turn enables a more powerful
TupleGetOp folding transformation.
Reviewers: nicolasvasilache, aartbik
Reviewed By: aartbik
Subscribers: grosul1, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76889
Rewrite mlir::permuteLoops (affine loop permutation utility) to fix
incorrect approach. Avoiding using sinkLoops entirely - use single move
approach. Add test pass.
This fixes https://bugs.llvm.org/show_bug.cgi?id=45328
Depends on D77003.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77004
Summary:
This revision performs a lot of different cleanups on operation documentation to ensure that they are consistent, e.g. using mlir code blocks, formatting, etc.
This revision also includes the auto-generated documentation into the hand-written documentation for the dialects that have a specific top-level dialect file. This updates the documentation for all dialects aside from SPIRV and STD. These dialects will be updated in a followup.
Differential Revision: https://reviews.llvm.org/D76734
Summary: This revision updates the dialect documentation to use the auto-generated markdown for operations. This allows for updating some out-of-date bits of documentation, and allows for displaying a large of number of newly added operations that did not have a counter part in Standard.md.
Differential Revision: https://reviews.llvm.org/D76743
Summary: This revision updates the SourceMgrDiagnosticHandler to not print the source location of a note if it is the same location as the previously printed diagnostic. This helps avoid redundancy, and potential confusion, when looking at the diagnostic output.
Differential Revision: https://reviews.llvm.org/D76787
Add missing assert checks for input to mlir::interchangeLoops utility.
Rename interchangeLoops -> permuteLoops; update doc comments to clarify
inputs / return val. Other than the assert checks, this is NFC.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77003
This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests where trip counts are unknown or
where tile sizes don't divide trip counts. A conditional guard is
generated to separate out the full tile (with constant trip count loops)
into the then block of an 'affine.if' and the partial tile to the else
block. The separation allows the 'then' block (which has constant trip
count loops) to be optimized better subsequently: for eg. for
unroll-and-jam, register tiling, vectorization without leading to
cleanup code, or to offload to accelerators. Among techniques from the
literature, the if/else based separation leads to the most compact
cleanup code for multi-dimensional cases (because a single version is
used to model all partial tiles).
INPUT
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
"foo"() : () -> ()
}
}
OUTPUT AFTER TILING W/O SEPARATION
map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
affine.for %arg2 = 0 to %M step 32 {
affine.for %arg3 = 0 to %N step 32 {
affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
"foo"() : () -> ()
}
}
}
}
OUTPUT AFTER TILING WITH SEPARATION
map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0) -> (d0 + 32)>
map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>
#set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>
affine.for %arg2 = 0 to %M step 32 {
affine.for %arg3 = 0 to %N step 32 {
affine.if #set0(%arg2, %arg3)[%M, %N] {
// Full tile.
affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
"foo"() : () -> ()
}
}
} else {
// Partial tile.
affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
"foo"() : () -> ()
}
}
}
}
}
The separation is tested via a cmd line flag on the loop tiling pass.
The utility itself allows one to pass in any band of contiguously nested
loops, and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76700
- add `to_extent_tensor`
- rename `create_shape` to `from_extent_tensor` for symmetry
- add `split_at` and `concat` ops for basic shape manipulations
This set of ops is inspired by the requirements of lowering a dynamic-shape-aware batch matmul op. For such an op, the "matrix" dimensions aren't subject to broadcasting but the others are, and so we need to slice, broadcast, and reconstruct the final output shape. Furthermore, the actual broadcasting op used downstream uses a tensor of extents as its preferred shape interface for the actual op that does the broadcasting.
However, this functionality is quite general. It's obvious that `to_extent_tensor` is needed long-term to support many common patterns that involve computations on shapes. We can evolve the shape manipulation ops introduced here. The specific choices made here took into consideration the potentially unranked nature of the !shape.shape type, which means that a simple listing of dimensions to extract isn't possible in general.
Differential Revision: https://reviews.llvm.org/D76817
This fixes a number of warnings, where a function is re-defined after it is tagged as "being imported":
D:\llvm-project\mlir\lib\ExecutionEngine\CRunnerUtils.cpp(24,17): warning: 'print_i32' redeclared without 'dllimport' attribute: 'dllexport' attribute added [-Winconsistent-dllimport]
extern "C" void print_i32(int32_t i) { fprintf(stdout, "%" PRId32, i); }
^
D:\llvm-project\mlir\include\mlir/ExecutionEngine/CRunnerUtils.h(168,42): note: previous declaration is here
extern "C" MLIR_CRUNNERUTILS_EXPORT void print_i32(int32_t i);
^
Differential Revision: https://reviews.llvm.org/D76654
The Dominance analysis currently misses a utility function to find the nearest common dominator of two given blocks. This is required for a huge variety of different control-flow analyses and transformations. This commit adds this function and moves the getNode function from DominanceInfo to DominanceInfoBase, as it also works for post dominators.
Differential Revision: https://reviews.llvm.org/D75507
This change adds a new option to the StandardToLLVM lowering to configure
the bitwidth of the index type independently of the target architecture's
pointer size.
Differential revision: https://reviews.llvm.org/D76353
It's common in many dialects to use tensors to themselves hold tensor shapes (for example, the shape is itself the result of some non-trivial calculation). Currently, such dialects have to use `tensor<?xi64>` or worse (like allowing either i32 or i64 tensors to represent shapes). `tensor<?xindex>` is the natural type to represent this, but is currently disallowed. This patch allows it.
Differential Revision: https://reviews.llvm.org/D76726
Summary:
Provide a public VectorConvertToLLVMPattern utility class to implement
conversions with automatic unrolling of operation on multidimensional vectors
to lists of operations on single-dimensional vectors when lowering to the LLVM
dialect. Drop the template-based check on the number of operands since the
actual implementation does not depend on the operand number anymore. This check
only creates spurious concepts (UnaryOpLowering, BinaryOpLowering, etc).
Differential Revision: https://reviews.llvm.org/D76865
Summary:
The Standard-to-LLVM dialect convresion has a set of utility classes that
simplify conversions, including patterns that provide one-to-one conversion
operation conversion with optional result packing. Expose these classes in a
public header so that conversions other than Standard-to-LLVM (e.g. vectors, or
LLVM-based intrinsics) could also use them. Since the patterns are implemented
as class templates and in order to keep the code size limited, keep the
implementation private by resorting to op identifiers instead of template-based
builders.
Differential Revision: https://reviews.llvm.org/D76864
This allows conversion of a ParallelLoop from N induction variables to
some nuber of induction variables less than N.
The first intended use of this is for the GPUDialect to convert
ParallelLoops to iterate over 3 dimensions so they can be launched as
GPU Kernels.
To implement this:
- Normalize each iteration space of the ParallelLoop
- Use the same induction variable in a new ParallelLoop for multiple
original iterations.
- Split the new induction variable back into the original set of values
inside the body of the ParallelLoop.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76363
- add method to get back an integer set from flat affine constraints;
this allows a round trip
- use this to complete the simplification of integer sets in
-simplify-affine-structures
- update FlatAffineConstraints::removeTrivialRedundancy to also do GCD
tightening and normalize by GCD (while still keeping it linear time).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Summary:
Some operations have custom syntax where an operand is always followed by a
specific token of streams if the operand is present. Parsing such operations
requires the ability to optionally parse an operand. Provide a relevant
function in the custom Op parser.
Differential Revision: https://reviews.llvm.org/D76779
gpu.launch
Current implementation of lowering from loop.parallel to gpu.launch
uses a DictionaryAttr to specify the mapping. Moving this attribute to
be auto-generated from specification as a StructAttr. This simplifies
a lot the logic of looking up and creating this attribute.
Differential Revision: https://reviews.llvm.org/D76165
Summary:
The restriction that a derived attribute should represent an
attribute/be materializable as an attribute was not made clear.
Differential Revision: https://reviews.llvm.org/D76715
Summary:
This revisions performs several cleanups to the generated dialect documentation:
* Standardizes format of attributes/operands/results sections
* Splits out operation/type/dialect documentation generation to allow for composing generated and hand-written documentation
* Add section for declarative assembly syntax and successors
* General cleanup
Differential Revision: https://reviews.llvm.org/D76573
When trying to fold an operation during operation creation check that
the operation folding succeeds before inserting the op.
Differential Revision: https://reviews.llvm.org/D76415
Summary:
This allows the custom parser/printer hooks to do interesting things with
the SSA names. This patch:
- Adds a new 'getResultName' method to OpAsmParser that allows a parser
implementation to get information about its result names, along with
a getNumResults() method that allows op parser impls to know how many
results are expected.
- Adds a OpAsmPrinter::printOperand overload that takes an explicit stream.
- Adds a test.string_attr_pretty_name operation that uses these hooks to
do fancy things with the result name.
Reviewers: rriddle!
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76205
Move some of the affine transforms and their test cases to their
respective dialect directory. This patch does not complete the move, but
takes care of a good part.
Renames: prefix 'affine' to affine loop tiling cl options,
vectorize -> super-vectorize
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76565
Summary:
This removes the static pass registration, and also cleans up some lingering technical debt.
Differential Revision: https://reviews.llvm.org/D76554
Summary:
This file only contains references to test passes, and was never removed when the test passes were moved to the test/ directory.
Differential Revision: https://reviews.llvm.org/D76553
Fix memref type doc comment to state that -1 indicates a dynamically
shaped dimension and not any negative number.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76557
Summary:
Change AffineOps Dialect structure to better group both IR and Tranforms. This included extracting transforms directly related to AffineOps. Also move AffineOps to Affine.
Differential Revision: https://reviews.llvm.org/D76161
The Vector Dialect [document](https://mlir.llvm.org/docs/Dialects/Vector/) discusses the vector abstractions that MLIR supports and the various tradeoffs involved.
One of the layer that is missing in OSS atm is the Hardware Vector Ops (HWV) level.
This revision proposes an AVX512-specific to add a new Dialect/Targets/AVX512 Dialect that would directly target AVX512-specific intrinsics.
Atm, we rely too much on LLVM’s peephole optimizer to do a good job from small insertelement/extractelement/shufflevector. In the future, when possible, generic abstractions such as VP intrinsics should be preferred.
The revision will allow trading off HW-specific vs generic abstractions in MLIR.
Differential Revision: https://reviews.llvm.org/D75987
Builder::get{I32,I64}VectorAttr are actually of limited applicability since
vector types can't have zero elements, whereas many uses of this kind of
attribute (such as dimension lists for "transpose"-like and other tensor
ops) often can result in empty lists.
Differential Revision: https://reviews.llvm.org/D76403
Summary:
Utility to perform CallOp Dialect conversion, specifically handling cases where
an argument type has changed and the corresponding CallOp needs to be updated.
Differential Revision: https://reviews.llvm.org/D76326
Summary:
With the move towards dialect registration that does not depend only use
static initialization, we are running into more cases where the dialects
are registered by different methods. For example, TensorFlow still uses
static initialization to register all MLIR core dialects, which prevents
explicit registration of any of them when linking it in. We ran into this
issue in https://github.com/google/iree/pull/982.
To address potential issues with conflicts from non-standard
allocators passed to registerDialectAllocator, made this method
private. Now all dialects can only be registered with their
constructor.
Similarly deduplicates DialectHooks for consistency and makes their
registration follow the same pattern.
Differential Revision: https://reviews.llvm.org/D76329
Non-32-bit scalar types requires special hardware support that may
not exist on all Vulkan-capable GPUs. This is reflected as non-32-bit
scalar types require special capabilities or extensions to be used.
This commit makes SPIRVTypeConverter target environment aware so
that it can properly convert standard types to what is accepted on
the target environment.
Right now if a scalar type bitwidth is not supported in the target
environment, we use 32-bit unconditionally. This requires Vulkan
runtime to also feed in data with a matched bitwidth and layout,
especially for interface types. The Vulkan runtime can do that by
inspecting the SPIR-V module. Longer term, we might want to introduce
a way to control how such case are handled and explicitly fail
if wanted.
Differential Revision: https://reviews.llvm.org/D76244
Types should be checked with the type hierarchy. This should result in
better responsibility division and API surface.
Differential Revision: https://reviews.llvm.org/D76243
This commit unifies target environment queries into a new wrapper
class spirv::TargetEnv and shares across various places needing
the functionality. We still create multiple instances of TargetEnv
though given the parent components (type converters, passes,
conversion targets) have different lifetimes.
In the meantime, LowerABIAttributesPass is updated to take into
consideration the target environment, which requires updates to
tests to provide that.
Differential Revision: https://reviews.llvm.org/D76242
Previously in SPIRVTypeConverter, we always convert memref types
to StorageBuffer regardless of their memory spaces. This commit
fixes that to let the conversion to look into memory space
properly. For this purpose, a mapping between SPIR-V storage class
and memref memory space is introduced. The mapping is arbitary
decided at the moment and the hope is that we can leverage
string memory space later to be more clear.
Now spv.interface_var_abi cannot contain storage class unless it's
attached to a scalar value, where we need the storage class as side
channel information. Verifications and tests are properly adjusted.
Differential Revision: https://reviews.llvm.org/D76241
Summary: The usage story in for NDEBUG isn't fleshed out yet, so this revision ensures that none of the diagnostic code exists in the binary.
Differential Revision: https://reviews.llvm.org/D76372
Summary: This is somewhat complex(annoying) as it involves directly tracking the uses within each of the callgraph nodes, and updating them as needed during inlining. The benefit of this is that we can have a more exact cost model, enable inlining some otherwise non-inlinable cases, and also ensure that newly dead callables are properly disposed of.
Differential Revision: https://reviews.llvm.org/D75476
Summary:
This revision restructures the calling of vector transforms to make it more flexible to ask for lowering through LLVM matrix intrinsics.
This also makes sure we bail out in degenerate cases (i.e. 1) in which LLVM complains about not being able to scalarize.
Differential Revision: https://reviews.llvm.org/D76266
Summary:
Renamed QuantOps to Quant to avoid the Ops suffix. All dialects will contain
ops, so the Ops suffix is redundant.
Differential Revision: https://reviews.llvm.org/D76318
Summary:
This revision adds a new hook, `notifyMatchFailure`, that allows for notifying the rewriter that a match failure is coming with the provided reason. This hook takes as a parameter a callback that fills a `Diagnostic` instance with the reason why the match failed. This allows for the rewriter to decide how this information can be displayed to the end-user, and may completely ignore it if desired(opt mode). For now, DialectConversion is updated to include this information in the debug output.
Differential Revision: https://reviews.llvm.org/D76203
MLIR supports terminators that have the same successor block with different
block operands, which cannot be expressed in the LLVM's phi-notation as the
block identifier is used to tell apart the predecessors. This limitation can be
worked around by branching to a new block instead, with this new block
unconditionally branching to the original successor and forwarding the
argument. Until now, this transformation was performed during the conversion
from the Standard to the LLVM dialect. This does not scale well to multiple
dialects targeting the LLVM dialect as all of them would have to be aware of
this limitation and perform the preparatory transformation. Instead, do it as a
separate pass and run it immediately before the translation.
Differential Revision: https://reviews.llvm.org/D75619
Summary: This adds bitfields that map to the dialect attribute verifier hooks. This also moves over the Test dialect to have its declaration generated.
Differential Revision: https://reviews.llvm.org/D76254
Summary: PatternState was a mechanism to pass state between the match and rewrite calls of a RewritePattern. With the rise of matchAndRewrite, this class is unused and unnecessary. This revision removes PatternState and simplifies PatternMatchResult to just be a LogicalResult. A future revision will replace all usages of PatternMatchResult/matchSuccess/matchFailure with LogicalResult equivalents.
Differential Revision: https://reviews.llvm.org/D76202
- rename vars that had inst suffixes (due to ops earlier being
known as insts); other renames for better readability
- drop unnecessary matches in test cases
- iterate without block terminator
- comment/doc updates
- instBodySkew -> affineForOpBodySkew
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76214
Summary:
This regional op in the QuantOps dialect will be used to wrap
high-precision ops into atomic units for quantization. All the values
used by the internal ops are captured explicitly by the op inputs. The
quantization parameters of the inputs and outputs are stored in the
attributes.
Subscribers: jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75972
Summary: This generates the class declarations for dialects using the existing 'Dialect' tablegen classes.
Differential Revision: https://reviews.llvm.org/D76185
Summary:
- remove stale declarations on flat affine constraints
- avoid allocating small vectors where possible
- clean up code comments, rename some variables
Differential Revision: https://reviews.llvm.org/D76117
Summary:
To enable this, two changes are needed:
1) Add an optional attribute `padding` to linalg.conv.
2) Compute if the indices accessing is out of bound in the loops. If so, use the
padding value `0`. Otherwise, use the value derived from load.
In the patch, the padding only works for lowering without other transformations,
e.g., tiling, fusion, etc.
Differential Revision: https://reviews.llvm.org/D75722
Summary:
This revision adds lowering of vector.contract to llvm.intr.matrix_multiply.
Note that there is currently a mismatch between the MLIR vector dialect which
expects row-major layout and the LLVM matrix intrinsics which expect column
major layout.
As a consequence, we currently only match a vector.contract with indexing maps
that express column-major matrix multiplication.
Other cases would require additional transposes and it is better to wait for
LLVM intrinsics to provide a per-operation attribute that would specify which
layout is expected.
A separate integration test, not submitted to MLIR core, has independently
verified that correct execution occurs on a 2x2x2 matrix multiplication.
Differential Revision: https://reviews.llvm.org/D76014
Previously we only consider the version/capability/extension requirements
on ops themselves. Some types in SPIR-V also require special extensions
or capabilities to be used. For example, non-32-bit integers/floats
will require different capabilities and/or extensions depending on
where they are used because it may mean special hardware abilities.
This commit adds query methods to SPIR-V type class hierarchy to support
querying extensions and capabilities. We don't go through ODS for
auto-generating such information given that we don't have them in
SPIR-V machine readable grammar and there are just a few types.
Differential Revision: https://reviews.llvm.org/D75875
Previously extensions and capabilities requirements are returned as
SmallVector<SmallVector>. It's an anti-pattern; this commit improves
a bit by returning as SmallVector<ArrayRef>. This is possible because
the internal sequence is always known statically (from the spec)
so that we can use a static constant array for it and get an ArrayRef.
Differential Revision: https://reviews.llvm.org/D75874
This commits changes the definition of spv.module to use the #spv.vce
attribute for specifying (version, capabilities, extensions) triple
so that we can have better API and custom assembly form. Since now
we have proper modelling of the triple, (de)serialization is wired up
to use them.
With the new UpdateVCEPass, we don't need to manually specify the
required extensions and capabilities anymore when creating a spv.module.
One just need to call UpdateVCEPass before serialization to get the
needed version/extensions/capabilities.
Differential Revision: https://reviews.llvm.org/D75872
Creates an operation pass that deduces and attaches the minimal version/
capabilities/extensions requirements for spv.module ops.
For each spv.module op, this pass requires a `spv.target_env` attribute on
it or an enclosing module-like op to drive the deduction. The reason is
that an op can be enabled by multiple extensions/capabilities. So we need
to know which one to pick. `spv.target_env` gives the hard limit as for
what the target environment can support; this pass deduces what are
actually needed for a specific spv.module op.
Differential Revision: https://reviews.llvm.org/D75870
Previously we only look at the directly passed-in op for a potential
spv.target_env attribute. This commit switches to use a larger range
and recursively check enclosing symbol tables.
Differential Revision: https://reviews.llvm.org/D75869
We also need the (version, capabilities, extensions) triple on the
spv.module op. Thus far we have been using separate 'extensions'
and 'capabilities' attributes there and 'version' is missing. Creating
a separate attribute for the trip allows us to reuse the assembly
form and verification.
Differential Revision: https://reviews.llvm.org/D75868
HasNoSideEffect can now be implemented using the MemoryEffectInterface, removing the need to check multiple things for the same information. This also removes an easy foot-gun for users as 'Operation::hasNoSideEffect' would ignore operations that dynamically, or recursively, have no side effects. This also leads to an immediate improvement in some of the existing users, such as DCE, now that they have access to more information.
Differential Revision: https://reviews.llvm.org/D76036
The current mechanism for identifying is a bit hacky and extremely adhoc, i.e. we explicit check 1-result, 0-operand, no side-effect, and always foldable and then assume that this is a constant. Adding a trait adds structure to this, and makes checking for a constant much more efficient as we can guarantee that all of these things have already been verified.
Differential Revision: https://reviews.llvm.org/D76020
These terminator operations don't really have any side effects, and this allows for more accurate side-effect analysis for region operations. For example, currently we can't detect like a loop.for or affine.for are dead because the affine.terminator is "side effecting".
Note: Marking as NoSideEffect doesn't mean that these operations can be opaquely erased.
Differential Revision: https://reviews.llvm.org/D75888
Summary:
This replaces the direct lowering of vector.outerproduct to LLVM with progressive lowering into elementary vectors ops to avoid having the similar lowering logic at several places.
NOTE1: with the new progressive rule, the lowered llvm is slightly more elaborate than with the direct lowering, but the generated assembly is just as optimized; still if we want to stay closer to the original, we should add a "broadcast on extract" to shuffle rewrite (rather than special cases all the lowering steps)
NOTE2: the original outerproduct lowering code should now be removed but some linalg test work directly on vector and contain some dead code, so this requires another CL
Reviewers: nicolasvasilache, andydavis1
Reviewed By: nicolasvasilache, andydavis1
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75956
Summary:
The C runner utils API was still not vanilla enough for certain use
cases on embedded ARM SDKs, this enables such cases.
Adding people more widely for historical Windows related build issues.
Differential Revision: https://reviews.llvm.org/D76031
Summary:
affineDataCopyGenerate is a monolithinc function that
combines several steps for good reasons, but it makes customizing
the behaivor even harder. The major two steps by affineDataCopyGenerate are:
a) Identify interesting memrefs and collect their uses.
b) Create new buffers to forward these uses.
Step (a) actually has requires tremendous customization options. One could see
that from the recently added filterMemRef parameter.
This patch adds a function that only does (b), in the hope that (a)
can be directly implemented by the callers. In fact, (a) is quite
simple if the caller has only one buffer to consider, or even one use.
Differential Revision: https://reviews.llvm.org/D75965
Summary:
This patch add some builtin operation for the gpu.all_reduce ops.
- for Integer only: `and`, `or`, `xor`
- for Float and Integer: `min`, `max`
This is useful for higher level dialect like OpenACC or OpenMP that can lower to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D75766
Summary:
This patch add some builtin operation for the gpu.all_reduce ops.
- for Integer only: `and`, `or`, `xor`
- for Float and Integer: `min`, `max`
This is useful for higher level dialect like OpenACC or OpenMP that can lower to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D75766
* Adds GpuLaunchFuncToVulkanLaunchFunc conversion pass.
* Moves a serialization of the `spirv::Module` from LaunchFuncToVulkanCalls pass to newly created pass.
* Updates LaunchFuncToVulkanCalls instrumentation pass, adds `initVulkan` and `deinitVulkan` runtime calls.
* Adds `bindResource` call to bind specifc resource by the given descriptor set and descriptor binding.
* Eliminates static construction and desctruction of `VulkanRuntimeManager`.
Differential Revision: https://reviews.llvm.org/D75192
Summary:
Interfaces/ is the designated directory for these types of interfaces, and also removes the need for including them directly in IR/.
Differential Revision: https://reviews.llvm.org/D75886
The interfaces themselves aren't really analyses, they may be used by analyses though. Having them in Analysis can also create cyclic dependencies if an analysis depends on a specific dialect, that also provides one of the interfaces.
Differential Revision: https://reviews.llvm.org/D75867
This revision takes advantage of the empty AffineMap to specify the
0-D edge case. This allows removing a bunch of annoying corner cases
that ended up impacting users of Linalg.
Differential Revision: https://reviews.llvm.org/D75831
Summary:
The old interface was a temporary stopgap to allow for implementing simple LICM that took side effects of region operations into account. Now that MLIR has proper support for specifying memory effects, this interface can be deleted.
Differential Revision: https://reviews.llvm.org/D74441
Summary: This op mirrors the llvm.intr counterpart and allows lowering + type conversions in a progressive fashion.
Differential Revision: https://reviews.llvm.org/D75775