Make UnrollVectorPattern inherit from RewritePattern instead of
OpRewritePattern so that we don't need to create many patterns when applying to
many different type of ops. Since we may want to apply the pattern to all
arithmetic op, it is more convenient to filter dynamically.
Differential Revision: https://reviews.llvm.org/D92635
- Instead of hardcoding the parameters and return types of 'inferReturnTypes', use the
InferTypeOpInterface trait to generate the method declaration.
- Fix InferTypeOfInterface to use fully qualified type for inferReturnTypes results.
Differential Revision: https://reviews.llvm.org/D92585
In the past, the reshape op can be folded only if the indexing map is
permutation in consumer's usage. We can relax to condition to be projected
permutation.
This patch still limits the fusion for scalar cases. Scalar case is a corner
case, because we need to decide where to put extra dims.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D92466
This is part of a larger refactoring the better congregates the builtin structures under the BuiltinDialect. This also removes the problematic "standard" naming that clashes with the "standard" dialect, which is not defined within IR/. A temporary forward is placed in StandardTypes.h to allow time for downstream users to replaced references.
Differential Revision: https://reviews.llvm.org/D92435
The definitions of ModuleOp and FuncOp are now within BuiltinOps.h, making the individual files obsolete.
Differential Revision: https://reviews.llvm.org/D92622
A separate AVX512 lowering pass does not compose well with the regular
vector lowering pass. As such, it is at risk of code duplication and
lowering inconsistencies. This change removes the separate AVX512 lowering
pass and makes it an "option" in the regular vector lowering pass
(viz. vector dialect "augmented" with AVX512 dialect).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D92614
This was important when ModuleOp was the only top level operation, but that isn't necessarily the case anymore. This is one of the last remaining aspects of the infrastructure that is hardcoded to ModuleOp.
Differential Revision: https://reviews.llvm.org/D92605
This was a somewhat important restriction in the past when ModuleOp was distinctly the top-level container operation, as well as before the pass manager had support for running nested pass managers natively. With these two issues fading away, there isn't really a good reason to enforce that a ModuleOp is the thing running within a pass manager. As such, this revision removes the restriction and allows for users to pass in the name of the operation that the pass manager will be scheduled on.
The only remaining dependency on BuiltinOps from Pass after this revision is due to FunctionPass, which will be resolved in a followup revision.
Differential Revision: https://reviews.llvm.org/D92450
There isn't a good reason for anything within IR to specifically reference any of the builtin operations. The only place that had a good reason in the past was AsmPrinter, but the behavior there doesn't need to hardcode ModuleOp anymore.
Differential Revision: https://reviews.llvm.org/D92448
Add support for vectorization for linalg.generic representing element-wise ops.
Those are converted to transfer_read + vector ops + transfer_write.
Also re-organize the vectorization tests to be together.
Implementation derived from the work of @burmako, @agrue and
@fedelebron.
Differential Revision: https://reviews.llvm.org/D92540
This fixes the build on gcc5 toolchain where sizeof is required for
types used in SmallVector now.
This is a consequence of using std::is_trivially_copy_constructible
instead of the LLVM variant: https://reviews.llvm.org/D92543
This reduces the chances of segfault. While it is a good practice to ensure
robust custom printers, it is unfortunately common to have them crash on
invalid input.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D92536
Memrefs with affine_map in the results of normalizable operation were
not normalized by `--normalize-memrefs` option. This patch normalizes
them.
Differential Revision: https://reviews.llvm.org/D88719
Extended promote buffers to stack pass to support dynamically shaped allocas.
The conversion is limited by the rank of the underlying tensor.
An option is added to the pass to adjust the given rank.
Differential Revision: https://reviews.llvm.org/D91969
Given that OpState already implicit converts to Operator*, this seems reasonable.
The alternative would be to add more functions to OpState which forward to Operation.
Reviewed By: rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D92266
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.
Reviewed By: jdoerfert, llitchev
Differential Revision: https://reviews.llvm.org/D92189
The test process of the ir_array_attributes.py depends on numpy. This patch checks numpy in Python bindings configuration.
- Add NumPy in find_package as a required component to check numpy.
- If numpy is found, print the version and include directory.
Differential Revision: https://reviews.llvm.org/D92276
PDL patterns are now supported via a new `PDLPatternModule` class. This class contains a ModuleOp with the pdl::PatternOp operations representing the patterns, as well as a collection of registered C++ functions for native constraints/creations/rewrites/etc. that may be invoked via the pdl patterns. Instances of this class are added to an OwningRewritePatternList in the same fashion as C++ RewritePatterns, i.e. via the `insert` method.
The PDL bytecode is an in-memory representation of the PDL interpreter dialect that can be efficiently interpreted/executed. The representation of the bytecode boils down to a code array(for opcodes/memory locations/etc) and a memory buffer(for storing attributes/operations/values/any other data necessary). The bytecode operations are effectively a 1-1 mapping to the PDLInterp dialect operations, with a few exceptions in cases where the in-memory representation of the bytecode can be more efficient than the MLIR representation. For example, a generic `AreEqual` bytecode op can be used to represent AreEqualOp, CheckAttributeOp, and CheckTypeOp.
The execution of the bytecode is split into two phases: matching and rewriting. When matching, all of the matched patterns are collected to avoid the overhead of re-running parts of the matcher. These matched patterns are then considered alongside the native C++ patterns, which rewrite immediately in-place via `RewritePattern::matchAndRewrite`, for the given root operation. When a PDL pattern is matched and has the highest benefit, it is passed back to the bytecode to execute its rewriter.
Differential Revision: https://reviews.llvm.org/D89107
- Change InferTypeOpInterface::inferResultTypes to use fully qualified types matching
the ones generated by genTypeInterfaceMethods, so the redundancy can be detected.
- Move genTypeInterfaceMethods() before genOpInterfaceMethods() so that the
inferResultTypes method generated by genTypeInterfaceMethods() takes precedence
over the declaration that might be generated by genOpInterfaceMethods()
- Modified an op in the test dialect to exercise this (the modified op would fail to
generate valid C++ code due to duplicate inferResultTypes methods).
Differential Revision: https://reviews.llvm.org/D92414
ExecutionEngine/LLJIT do not run globals destructors in loaded dynamic libraries when destroyed, and threads managed by ThreadPool can race with program termination, and it leads to segfaults.
TODO: Re-enable threading after fixing a problem with destructors, or removing static globals from dynamic library.
Differential Revision: https://reviews.llvm.org/D92368
- Address TODO in scf-bufferize: the argument materialization issue is
now fixed and the code is now in Transforms/Bufferize.cpp
- Tighten up finalizing-bufferize to avoid creating invalid IR when
operand types potentially change
- Tidy up the testing of func-bufferize, and move appropriate tests
to a new finalizing-bufferize.mlir
- The new stricter checking in finalizing-bufferize revealed that we
needed a DimOp conversion pattern (found when integrating into npcomp).
Previously, the converion infrastructure was blindly changing the
operand type during finalization, which happened to work due to
DimOp's tensor/memref polymorphism, but is generally not encouraged
(the new pattern is the way to tell the conversion infrastructure that
it is legal to change that type).
The InlineAsmOp mirrors the underlying LLVM semantics with a notable
exception: the embedded `asm_string` is not allowed to define or reference
any symbol or any global variable: only the operands of the op may be read,
written, or referenced.
Attempting to define or reference any symbol or any global behavior is
considered undefined behavior at this time.
The asm dialect syntax is currently specified with an integer (0 [default] for the "att dialect", 1 for the intel dialect) to circumvent the ODS limitation on string enums.
Translation to LLVM is provided and raises the fact that the asm constraints string must be well-formed with respect to in/out operands. No check is performed on the asm_string.
An InlineAsm instruction in LLVM is a special call operation to a function that is constructed on the fly.
It does not fit the current model of MLIR calls with symbols.
As a consequence, the current implementation constructs the function type in ModuleTranslation.cpp.
This should be refactored in the future.
The mlir-cpu-runner is augmented with the global initialization of the X86 asm parser to allow proper execution in JIT mode. Previously, only the X86 asm printer was initialized.
Differential revision: https://reviews.llvm.org/D92166
* If ODS redefines this, it is fine, but I have found this accessor to be universally useful in the old npcomp bindings and I'm closing gaps that will let me switch.
Differential Revision: https://reviews.llvm.org/D92287
* Add capsule get/create for Attribute and Type, which already had capsule interop defined.
* Add capsule interop and get/create for Location.
* Add Location __eq__.
* Use get() and implicit cast to go from PyAttribute, PyType, PyLocation to MlirAttribute, MlirType, MlirLocation (bundled with this change because I didn't want to continue the pattern one more time).
Differential Revision: https://reviews.llvm.org/D92283
Op with mapping from ops to corresponding shape functions for those op
in the library and mechanism to associate shape functions to functions.
The mapping of operand to shape function is kept separate from the shape
functions themselves as the operation is associated to the shape
function and not vice versa, and one could have a common library of
shape functions that can be used in different contexts.
Use fully qualified names and require a name for shape fn lib ops for
now and an explicit print/parse (based around the generated one & GPU
module op ones).
This commit reverts d9da4c3e73. Fixes
missing headers (don't know how that was working locally).
Differential Revision: https://reviews.llvm.org/D91672
Op with mapping from ops to corresponding shape functions for those op
in the library and mechanism to associate shape functions to functions.
The mapping of operand to shape function is kept separate from the shape
functions themselves as the operation is associated to the shape
function and not vice versa, and one could have a common library of
shape functions that can be used in different contexts.
Use fully qualified names and require a name for shape fn lib ops for
now and an explicit print/parse (based around the generated one & GPU
module op ones).
Differential Revision: https://reviews.llvm.org/D91672
A splat attribute have a single element during printing so we should
treat it as such when we decide if we elide it or not based on the flag
intended to elide large attributes.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D92165
Many pages have had their titles renamed over time,
causing broken links to spread throughout the documentation.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D92093
The ops are very similar to the std variants, but support async GPU execution.
gpu.alloc does not currently support an alignment attribute, and the new ops do not have
canonicalizers/folders like their std siblings do.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91698
The rewrite logic has an optimization to drop a cast operation after
rewriting block arguments if the cast operation has no users. This is
unsafe as there might be a pending rewrite that replaced the cast operation
itself and hence would trigger a second free.
Instead, do not remove the casts and leave it up to a later canonicalization
to do so.
Differential Revision: https://reviews.llvm.org/D92184
This change is required so that bufferization can properly identify
the linalg.yield as a terminator with an associated parent op.
Differential Revision: https://reviews.llvm.org/D92173
This enables partial bufferization that includes function signatures. To test this, this
change also makes the func-bufferize partial and adds a dedicated finalizing-bufferize pass.
Differential Revision: https://reviews.llvm.org/D92032
This change gives sparse compiler clients more control over selecting
individual types for the pointers and indices in the sparse storage schemes.
Narrower width obviously results in smaller memory footprints, but the
range should always suffice for the maximum number of entries or index value.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D92126
Adding missing custom builders for AffineVectorLoadOp & AffineVectorStoreOp. In practice, it is difficult to correctly construct these ops without these builders (because the AffineMap is not included at construction time).
Differential Revision: https://reviews.llvm.org/D86380
This CL adds the ability to request different parallelization strategies
for the generate code. Every "parallel" loop is a candidate, and converted
to a parallel op if it is an actual for-loop (not a while) and the strategy
allows dense/sparse outer/inner parallelization.
This will connect directly with the work of @ezhulenev on parallel loops.
Still TBD: vectorization strategy
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91978
CHECK_* directives for message() where added in Cmake 3.17, LLVM
requires 3.14 as minimum so they may not be intepreted correctly and
just print "CHECK_*" into the message stream. Replace them with STATUS.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D91959
SameOperandsAndResultShape and ElementwiseMappable have similar
verification, but in general neither is strictly redundant with the
other.
Examples:
- SameOperandsAndResultShape allows
`"foo"(%0) : tensor<2xf32> -> tensor<?xf32> but ElementwiseMappable
does not.
- ElementwiseMappable allows
`select %scalar_pred, %true_tensor, %false_tensor` but
SameOperandsAndResultShape does not.
SameOperandsAndResultShape is redundant with ElementwiseMappable when
we can prove that the mixed scalar/non-scalar case cannot happen. In
those situations, `ElementwiseMappable & SameOperandsAndResultShape ==
ElementwiseMappable`:
- Ops with 1 operand: the case of mixed scalar and non-scalar operands
cannot happen since there is only one operand.
- When SameTypeOperands is also present, the mixed scalar/non-scalar
operand case cannot happen.
Differential Revision: https://reviews.llvm.org/D91396
Generalizes invariant handling to anything defined outside the Linalg op
(parameters and SSA computations). Fixes bug that was using parameter number
as tensor number.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91985
Introduce a conversion pass from SCF parallel loops to OpenMP dialect
constructs - parallel region and workshare loop. Loops with reductions are not
supported because the OpenMP dialect cannot model them yet.
The conversion currently targets only one level of parallelism, i.e. only
one top-level `omp.parallel` operation is produced even if there are nested
`scf.parallel` operations that could be mapped to `omp.wsloop`. Nested
parallelism support is left for future work.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91982
Print part of an op of the form:
```
<optional-offset-prefix>`[` offset-list `]`
<optional-size-prefix>`[` size-list `]`
<optional-stride-prefix>[` stride-list `]`
```
Also address some leftover nits.
Differential revision: https://reviews.llvm.org/D92031
Parse trailing part of an op of the form:
```
<optional-offset-prefix>`[` offset-list `]`
<optional-size-prefix>`[` size-list `]`
<optional-stride-prefix>[` stride-list `]`
```
Each entry in the offset, size and stride list either resolves to an integer
constant or an operand of index type.
Constants are added to the `result` as named integer array attributes with
name `OffsetSizeAndStrideOpInterface::getStaticOffsetsAttrName()` (resp.
`getStaticSizesAttrName()`, `getStaticStridesAttrName()`).
Append the number of offset, size and stride operands to `segmentSizes`
before adding it to `result` as the named attribute:
`OpTrait::AttrSizedOperandSegments<void>::getOperandSegmentSizeAttr()`.
Offset, size and stride operands resolution occurs after `preResolutionFn`
to give a chance to leading operands to resolve first, after parsing the
types.
```
ParseResult parseOffsetsSizesAndStrides(
OpAsmParser &parser, OperationState &result, ArrayRef<int> segmentSizes,
llvm::function_ref<ParseResult(OpAsmParser &, OperationState &)>
preResolutionFn = nullptr,
llvm::function_ref<ParseResult(OpAsmParser &)> parseOptionalOffsetPrefix =
nullptr,
llvm::function_ref<ParseResult(OpAsmParser &)> parseOptionalSizePrefix =
nullptr,
llvm::function_ref<ParseResult(OpAsmParser &)> parseOptionalStridePrefix =
nullptr);
```
Differential revision: https://reviews.llvm.org/D92030
MLIR C API use the `MlirStringRef` instead of `const char *` for the string type now. This patch sync the Python bindings with the C API modification.
Differential Revision: https://reviews.llvm.org/D92007
* Was missed in the initial submission and is required for a ConstantLike op.
* Also adds a materializeConstant hook to preserve it.
* Tightens up the argument constraint on tosa.const to match what is actually legal.
Differential Revision: https://reviews.llvm.org/D92040
This revision will make it easier to create new ops base on the strided memref abstraction outside of the std dialect.
OffsetSizeAndStrideOpInterface is an interface for ops that allow specifying mixed dynamic and static offsets, sizes and strides variadic operands.
Ops that implement this interface need to expose the following methods:
1. `getArrayAttrRanks` to specify the length of static integer
attributes.
2. `offsets`, `sizes` and `strides` variadic operands.
3. `static_offsets`, resp. `static_sizes` and `static_strides` integer
array attributes.
The invariants of this interface are:
1. `static_offsets`, `static_sizes` and `static_strides` have length
exactly `getArrayAttrRanks()`[0] (resp. [1], [2]).
2. `offsets`, `sizes` and `strides` have each length at most
`getArrayAttrRanks()`[0] (resp. [1], [2]).
3. if an entry of `static_offsets` (resp. `static_sizes`,
`static_strides`) is equal to a special sentinel value, namely
`ShapedType::kDynamicStrideOrOffset` (resp. `ShapedType::kDynamicSize`,
`ShapedType::kDynamicStrideOrOffset`), then the corresponding entry is
a dynamic offset (resp. size, stride).
4. a variadic `offset` (resp. `sizes`, `strides`) operand must be present
for each dynamic offset (resp. size, stride).
This interface is useful to factor out common behavior and provide support
for carrying or injecting static behavior through the use of the static
attributes.
Differential Revision: https://reviews.llvm.org/D92011
Use the correct interface base type name when generating attribute interfaces
with TabeGen.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D92023
1. Move ThreadPool ownership to the runtime, and wait for the async tasks completion in the destructor.
2. Remove MLIR_ASYNCRUNTIME_EXPORT from method definitions because they are unnecessary in .cpp files, as only function declarations need to be exported, not their definitions.
3. Fix concurrency bugs in group emplace and potential use-after-free in token emplace.
Tested internally 10k runs in `async.mlir` and `async-group.mlir`.
Fixed: https://bugs.llvm.org/show_bug.cgi?id=48267
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91988
This file is intended to be included by other files, including
out-of-tree dialects, and makes more sense in `include` than in `lib`.
Depends On D91652
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91961
Attributes represent additional data about an operation and are intended to be
modifiable during the lifetime of the operation. In the dialect-specific Python
bindings, attributes are exposed as properties on the operation class. Allow
for assigning values to these properties. Also support creating new and
deleting existing attributes through the generic "attributes" property of an
operation. Any validity checking must be performed by the op verifier after the
mutation, similarly to C++. Operations are not invalidated in the process: no
dangling pointers can be created as all attributes are owned by the context and
will remain live even if they are not used in any operation.
Introduce a Python Test dialect by analogy with the Test dialect and to avoid
polluting the latter with Python-specific constructs. Use this dialect to
implement a test for the attribute access and mutation API.
Reviewed By: stellaraccident, mehdi_amini
Differential Revision: https://reviews.llvm.org/D91652
It is a simple conversion that only requires to change the region argument
types, generalize it from ParallelOp.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91989
While this makes the unit tests a bit more verbose, this simplifies the creation of bindings because only the bidirectional mapping between the host language's string type and MlirStringRef need to be implemented.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91905
Exposing some utility functions from Linalg to allow for promotion of
fused views outside of the core tile+fuse logic.
This is an alternative to patch D91322 which adds the promotion logic
to the tileAndFuse method. Downside with that approach is that it is
not easily customizable based on needs.
Differential Revision: https://reviews.llvm.org/D91503
Enhance the tile+fuse logic to allow fusing a sequence of operations.
Make sure the value used to obtain tile shape is a
SubViewOp/SubTensorOp. Current logic used to get the bounds of loop
depends on the use of `getOrCreateRange` method on `SubViewOp` and
`SubTensorOp`. Make sure that the value/dim used to compute the range
is from such ops. This fix is a reasonable WAR, but a btter fix would
be to make `getOrCreateRange` method be a method of `ViewInterface`.
Differential Revision: https://reviews.llvm.org/D90991
Previously, there was no way to add context to the diagnostic engine via the C API. Adding this ability makes it much easier to reason about memory ownership, particularly in reference-counted languages such as Swift. There are more details in the review comments.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D91738
An SCF 'for' loop does not iterate if its lower bound is equal to its upper
bound. Remove loops where both bounds are the same SSA value as such bounds are
guaranteed to be equal. Similarly, remove 'parallel' loops where at least one
pair of respective lower/upper bounds is specified by the same SSA value.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D91880
The existing implementation of the conversion from SCF Parallel operation to
SCF "for" loops in order to further convert those loops to branch-based CFG has
been cloning the loop and reduction body operations into the new loop because
ConversionPatternRewriter was missing support for moving blocks while replacing
their arguments. This functionality now available, use it to implement the
conversion and avoid cloning operations, which may lead to doubling of the IR
size during the conversion.
In addition, this fixes an issue with converting nested SCF "if" conditionals
present in "parallel" operations that would cause the conversion infrastructure
to stop because of the repeated application of the pattern converting "newly"
created "if"s (which were in fact just moved). Arguably, this should be fixed
at the infrastructure level and this fix is a workaround.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91955
This revision refactors code used in various Linalg transformations and makes it a first class citizen to the LinalgStructureOpInterface. This is in preparation to allowing more advanced Linalg behavior but is otherwise NFC.
Differential revision: https://reviews.llvm.org/D91863
- Fixes bug 48242 point 3 crash.
- Makes the improvments from points 1 & 2.
https://bugs.llvm.org/show_bug.cgi?id=48262
```
def RTLValueType : Type<CPred<"isRTLValueType($_self)">, "Type"> {
string cppType = "::mlir::Type";
}
```
Works now, but merely by happenstance. Parameters expects a `TypeParameter` class def or a string representing a c++ type but doesn't enforce it.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D91939