Writes into tensors that are definied outside of a repetitive region, but with the write happening inside of the repetitive region were previously not considered conflicts. This was incorrect.
E.g.:
```
%0 = ... : tensor<?xf32>
scf.for ... {
"reading_op"(%0) : tensor<?xf32>
%1 = "writing_op"(%0) : tensor<?xf32> -> tensor<?xf32>
...
}
```
In the above example, "writing_op" should be out-of-place.
This commit fixes the bufferization for any op that declares its repetitive semantics via RegionBranchOpInterface.
This patch adds check of supported reduction kind for ScanOp to avoid using and/or/xor for floating point type.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123977
Introduce a method on PyMlirContext (and plumb it through to Python) to
invalidate all of the operations in the live operations map and clear
it. Since Python has no notion of private data, an end-developer could
reach into some 3rd party API which uses the MLIR Python API (that is
behaving correctly with regard to holding references) and grab a
reference to an MLIR Python Operation, preventing it from being
deconstructed out of the live operations map. This allows the API
developer to clear the map when it calls C++ code which could delete
operations, protecting itself from its users.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123895
getUpperBound is analogous to getLowerBound(), except for the upper
bound, and is used in range analysis.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124020
Sequence is an important transform combination primitive that just indicates
transform ops being applied in a row. The simplest version requires fails
immediately if any transformation in the sequence fails. Introducing this
operation allows one to start placing transform IR within other IR.
Depends On D123135
Reviewed By: Mogball, rriddle
Differential Revision: https://reviews.llvm.org/D123664
This patch adds a new function `mlirDenseElementsAttrBFloat16Get()`,
which accepts the shaped type, the number of BFloat16 values, and a
pointer to an array of BFloat16 values, each of which is a `uint16_t`
value.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D123981
The printer is now resilient to invalid IR and will already automatically
fallback to the generic form on invalid IR. Using the generic printer on
pass failure was a conservative option before the printer was made
failsafe.
Reviewed By: lattner, rriddle, jpienaar, bondhugula
Differential Revision: https://reviews.llvm.org/D123915
Fold away gpu.memcpy op when only uses of dest are
the memcpy op in question, its allocation and deallocation
ops.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D121279
Add helper functions to check if an op may be executed multiple times based on RegionBranchOpInterface.
Differential Revision: https://reviews.llvm.org/D123789
This reverts commit af0285122f.
The test "libomp::loop_dispatch.c" on builder
openmp-gcc-x86_64-linux-debian fails from time-to-time.
See #54969. This patch is unrelated.
This patch removes inheritence of MultiAffineFunction from IntegerPolyhedron
and instead makes IntegerPolyhedron as a member.
This patch removes virtualization in MultiAffineFunction and also removes
unnecessary functions inherited from IntegerPolyhedron.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123921
The OMPScheduleType enum stores the constants from libomp's internal sched_type in kmp.h and are used by several kmp API functions. The enum values have an internal structure, namely each scheduling algorithm (e.g.) exists in four variants: unordered, orderend, normerge unordered, and nomerge ordered.
This patch (basically a followup to D114940) splits the "ordered" and "nomerge" bits into separate flags, as was already done for the "monotonic" and "nonmonotonic", so we can apply bit flags operations on them. It also now contains all possible combinations according to kmp's sched_type. Deriving of the OMPScheduleType enum from clause parameters has been moved form MLIR's OpenMPToLLVMIRTranslation.cpp to OpenMPIRBuilder to make available for clang as well. Since the primary purpose of the flag is the binary interface to libomp, it has been made more private to LLVMFrontend. The primary interface for generating worksharing-loop using OpenMPIRBuilder code becomes `applyWorkshareLoop` which derives the OMPScheduleType automatically and calls the appropriate emitter function.
While this is mostly a NFC refactor, it still applies the following functional changes:
* The logic from OpenMPToLLVMIRTranslation to derive the OMPScheduleType also applies to clang. Most notably, it now applies the nonmonotonic flag for non-static schedules by default.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was previously not applied if the simd modifier was used. I assume this was a bug, since the effect was due to `loop.schedule_modifier()` returning `mlir::omp::ScheduleModifier::none` instead of `llvm::Optional::None`.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was set even if ordered was specified, in breach to what the comment before citing the OpenMP specification says. I assume this was an oversight.
The ordered flag with parameter was not considered in this patch. Changes will need to be made (e.g. adding/modifying function parameters) when support for it is added. The lengthy names of the enum values can be discussed, for the moment this is avoiding reusing previously existing enum value names such as `StaticChunked` to avoid confusion.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D123403
Reproducers that resulted in triggering the following asserts
mlir::NamedAttribute::NamedAttribute(mlir::StringAttr, mlir::Attribute)
mlir/lib/IR/Attributes.cpp:29:3
consumeToken
mlir/lib/Parser/Parser.h:126
Differential Revision: https://reviews.llvm.org/D122240
This patch modifies mergeLocalIds to not delete duplicate local ids in
`this` relation. This allows the ordering of the final local ids for `this`
to be determined more easily, which is generally required when other objects
refer to these local ids.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D123866
This revision folds transpose splat to a new splat with the transposed vector type. For a splat, there is no need to actually do transpose for it, it would be more effective to just build a new splat as the result.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123765
The generic form of the op is too verbose and in some cases not
readable. On pass failure, ops have been so far printed in generic form
to provide a (stronger) guarantee that the IR print succeeds. However,
in a large number of pass failure cases, the IR is still valid and
the custom printers for the ops will succeed. In fact, readability is
highly desirable post pass failure. This revision provides an option to
print ops in their custom/pretty-printed form on IR failure -- this
option is unsafe and there is no guarantee it will succeed. It's
disabled by default and can be turned on only if needed.
Differential Revision: https://reviews.llvm.org/D123893
This patch takes advantage of the Commutative trait on operation
to remove identical commutative operations where the operands are swapped.
The second operation below can be removed since `arith.addi` is commutative.
```
%1 = arith.addi %a, %b : i32
%2 = arith.addi %b, %a : i32
```
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123492
This helps to prevent tsan failures when users inadvertantly mutate the
context in a non-safe way.
Differential Revision: https://reviews.llvm.org/D112021
Previously this checked if the entire symbolic numerator was divisible by the
denominator, which is never the case when this function is called. Fixed this to
check only the non-const coefficients in the numerator, which was what was
intended and documented.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123592
extract was incorrectly folded when the source was coming from a
broadcast that was both adding new rank and broadcasting the inner
dimension.
Differential Revision: https://reviews.llvm.org/D123867
When the sample value is zero, everything is the same except that failure to
pivot does not imply emptiness. So, leave it to the user to mark as empty if
necessary, if they know the sample value is strictly negative. This is needed
for an upcoming symbolic lexmin heuristic.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123604
Changes the algorithm of LICM to support graph regions (no guarantee of topologically sorted order). Also fixes an issue where ops with recursive side effects and regions would not be hoisted if any nested ops used operands that were defined within the nested region.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D122465
Operation clone is currently faulty.
Suppose you have a block like as follows:
```
(%x0 : i32) {
%x1 = f(%x0)
return %x1
}
```
The test case we have is that we want to "unroll" this, in which we want to change this to compute `f(f(x0))` instead of just `f(x0)`. We do so by making a copy of the body at the end of the block and set the uses of the argument in the copy operations with the value returned from the original block.
This is implemented as follows:
1) map to the block arguments to the returned value (`map[x0] = x1`).
2) clone the body
Now for this small example, this works as intended and we get the following.
```
(%x0 : i32) {
%x1 = f(%x0)
%x2 = f(%x1)
return %x2
}
```
This is because the current logic to clone `x1 = f(x0)` first looks up the arguments in the map (which finds `x0` maps to `x1` from the initialization), and then sets the map of the result to the cloned result (`map[x1] = x2`).
However, this fails if `x0` is not an argument to the op, but instead used inside the region, like below.
```
(%x0 : i32) {
%x1 = f() {
yield %x0
}
return %x1
}
```
This is because cloning an op currently first looks up the args (none), sets the map of the result (`map[%x1] = %x2`), and then clones the regions. This results in the following, which is clearly illegal:
```
(%x0 : i32) {
%x1 = f() {
yield %x0
}
%x2 = f() {
yield %x2
}
return %x2
}
```
Diving deeper, this is partially due to the ordering (how this PR fixes it), as well as how region cloning works. Namely it will first clone with the mapping, and then it will remap all operands. Since the ordering above now has a map of `x0 -> x1` and `x1 -> x2`, we end up with the incorrect behavior here.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D122531
LLVM IR is moving towards adoption of opaque pointer types. These require extra
information to be passed when constructing some operations, in particular GEP
and Alloca. Adapt the builders of said operations and modify the translation
code to handle both opaque and non-opaque pointers.
This incidentally adds the translation for Alloca alignment and fixes the translation
of struct-related GEP indices that must be constant.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D123792
Similar to the existing pattern for reodering cast(transpose),
this makes transpose following transpose and increases the chance
of embedding the transposition inside contraction op. Actually
cast ops are just special instances of elementwise ops.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123596
In order to increase parallism, certain ops with regions and have the
IsIsolatedFromAbove trait will have their verification delayed. That
means the region verifier may access the invalid ops and may lead to a
crash.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D122771
This dialect provides operations that can be used to control transformation of
the IR using a different portion of the IR. It refers to the IR being
transformed as payload IR, and to the IR guiding the transformation as
transform IR.
The main use case for this dialect is orchestrating fine-grain transformations
on individual operations or sets thereof. For example, it may involve finding
loop-like operations with specific properties (e.g., large size) in the payload
IR, applying loop tiling to those and only those operations, and then applying
loop unrolling to the inner loops produced by the previous transformations. As
such, it is not intended as a replacement for the pass infrastructure, nor for
the pattern rewriting infrastructure. In the most common case, the transform IR
will be processed and applied to payload IR by a pass. Transformations
expressed by the transform dialect may be implemented using the pattern
infrastructure or any other relevant MLIR component.
This dialect is designed to be extensible, that is, clients of this dialect are
allowed to inject additional operations into this dialect using the newly
introduced in this patch `TransformDialectExtension` mechanism. This allows the
dialect to avoid a dependency on the implementation of the transformation as
well as to avoid introducing dialect-specific transform dialects.
See https://discourse.llvm.org/t/rfc-interfaces-and-dialects-for-precise-ir-transformation-control/60927.
Reviewed By: nicolasvasilache, Mogball, rriddle
Differential Revision: https://reviews.llvm.org/D123135
Move the operations that correspond to LLVM IR intrinsics in a separate .td
file. This makes it easier to maintain the intrinsics and decreases the compile
time of LLVMDialect.cpp by ~25%.
Depends On D123310
Reviewed By: wsmoses, jacquesguan
Differential Revision: https://reviews.llvm.org/D123315
LLVM IR has introduced and is moving forward with the concept of opaque
pointers, i.e. pointer types that are not carrying around the pointee type.
Instead, memory-related operations indicate the type of the data being accessed
through the opaque pointer. Introduce the initial support for opaque pointers
in the LLVM dialect:
- `LLVMPointerType` to support omitting the element type;
- alloca/load/store/gep to support opaque pointers in their operands and
results; this requires alloca and gep to store the element type as an
attribute;
- memory-related intrinsics to support opaque pointers in their operands;
- translation to LLVM IR for the ops above is no longer using methods
deprecated in LLVM API due to the introduction of opaque pointers.
Unlike LLVM IR, MLIR can afford to support both opaque and non-opaque pointers
at the same time and simplify the transition. Translation to LLVM IR of MLIR
that involves opaque pointers requires the LLVMContext to be configured to
always use opaque pointers.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D123310
This change adds three new operations to the GPU dialect: gpu.mma.sync,
gpu.mma.ldmatrix, and gpu.lane_id. The former two are meant to target
the lower level nvvm.mma.sync and nvvm.ldmatrix instructions, respectively.
Lowerings are added for the new GPU operations for conversion to
NVVM.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123647
Support unrolling for vector.transpose following the same interface as
other vector unrolling ops.
Differential Revision: https://reviews.llvm.org/D123688
StrEnumAttr has been deprecated in favour of EnumAttr, a solution based on AttrDef (https://reviews.llvm.org/D115181). This patch removes StrEnumAttr, along with all the custom ODS logic required to handle it.
See https://discourse.llvm.org/t/psa-stop-using-strenumattr-do-use-enumattr/5710 on how to transition to EnumAttr. In short,
```
// Before
def MyEnumAttr : StrEnumAttr<"MyEnum", "", [
StrEnumAttrCase<"A">,
StrEnumAttrCase<"B">
]>;
// After (pick an integer enum of your choice)
def MyEnum : I32EnumAttr<"MyEnum", "", [
I32EnumAttrCase<"A", 0>,
I32EnumAttrCase<"B", 1>
]> {
// Don't generate a C++ class! We want to use the AttrDef
let genSpecializedAttr = 0;
}
// Define the AttrDef
def MyEnum : EnumAttr<MyDialect, MyEnum, "my_enum">;
```
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D120834
Enable specifying additional include directories to search. This is
consistent with what one can do with clangd (although there it is more
general compilation options) and Python LSP. We would in general expect
these to be provided by compilation database equivalent.
Differential Revision: https://reviews.llvm.org/D123474
This patch adds thread_local to llvm.mlir.global and adds translation for dso_local and addr_space to and from LLVM IR.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D123412
This revision replaces current type cast constant folder with a new common type cast constant folder function template.
It will cover all former folder and support fold the constant splat and vector.
Differential Revision: https://reviews.llvm.org/D123489
This change generalizes the fusion of `tensor.expand_shape` ->
`linalg.generic` op by collapsing to handle cases where only a subset
of the reassociations specified in the `tensor.expand_shape` are valid
to be collapsed.
The method that does the collapsing is refactored to allow it to be a
generic utility when required.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D123153
This patch adds tasking construct according to Section 2.10.1 of OpenMP 5.0
Reviewed By: peixin, kiranchandramohan, abidmalikwaterloo
Differential Revision: https://reviews.llvm.org/D123575
This patch removes inheritence from PresburgerSpace in IntegerRelation and
instead makes it a member of these classes.
This is required for three reasons:
- It prevents implicit casting to PresburgerSpace.
- Not all functions of PresburgerSpace need to be exposed by the deriving classes.
- IntegerRelation and IntegerPolyhedron are defined in a PresburgerSpace. It
makes more sense for the space to be a member instead of them inheriting from
a space.
Reviewed By: arjunp, ftynse
Differential Revision: https://reviews.llvm.org/D123585
With this change, there's going to be a clear distinction between LLVM
and MLIR pass maanger options (e.g. `-mlir-print-after-all` vs
`-print-after-all`). This change is desirable from the point of view of
projects that depend on both LLVM and MLIR, e.g. Flang.
For consistency, all pass manager options in MLIR are prefixed with
`mlir-`, even options that don't have equivalents in LLVM .
Differential Revision: https://reviews.llvm.org/D123495
Lookup iter_arg buffers using `lookupBuffer` instead of always creating a new `ToMemrefOp`. Also cast all yielded buffers (if necessary), regardless of whether they are an equivalent buffer or a new allocation.
Note: This should have been part of D123369.
Differential Revision: https://reviews.llvm.org/D123383
Switch CUDA runtime wrapper for GPU mem alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and the wrappers it
lowered to (gpu-to-llvm) was for the async versions -- however, this was
being incorrectly mapped to cuMemAlloc/cuMemFree instead of
cuMemAllocAsync/cuMemFreeAsync.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D123482
This supports the threadprivate directive in OpenMP dialect following
the OpenMP 5.1 [2.21.2] standard. Also lowering to LLVM IR using OpenMP
IRBduiler.
Reviewed By: kiranchandramohan, shraiysh, arnamoy10
Differential Revision: https://reviews.llvm.org/D123350
Adding annotations on as-needed bases, currently only for memrefCopy, but in general all C API functions that take pointers to memory allocated/initialized inside the jit-compiled code must be annotated, to be able to run with msan.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123557
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reland: Fixed custom calls to `-lower-matrix-intrinsics` in integration tests by replacing them with `-O0 -enable-matrix`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
The method to add elementwise ops fusion patterns pulls in many other
patterns by default. The patterns to pull in along with the
elementwise op fusion should be upto the caller. Split the method to
pull in just the elementwise ops fusion pattern. Other cleanup changes
include
- Move the pattern for constant folding of generic ops (currently only
constant folds transpose) into a separate file, cause it is not
related to fusion
- Drop the uber LinalgElementwiseFusionOptions. With the
populateElementwiseOpsFusionPatterns being split, this has no
utility now.
- Drop defaults for the control function.
- Fusion of splat constants with generic ops doesnt need a control
function. It is always good to do.
Differential Revision: https://reviews.llvm.org/D123236
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425