This change adds three new operations to the GPU dialect: gpu.mma.sync,
gpu.mma.ldmatrix, and gpu.lane_id. The former two are meant to target
the lower level nvvm.mma.sync and nvvm.ldmatrix instructions, respectively.
Lowerings are added for the new GPU operations for conversion to
NVVM.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123647
Support unrolling for vector.transpose following the same interface as
other vector unrolling ops.
Differential Revision: https://reviews.llvm.org/D123688
StrEnumAttr has been deprecated in favour of EnumAttr, a solution based on AttrDef (https://reviews.llvm.org/D115181). This patch removes StrEnumAttr, along with all the custom ODS logic required to handle it.
See https://discourse.llvm.org/t/psa-stop-using-strenumattr-do-use-enumattr/5710 on how to transition to EnumAttr. In short,
```
// Before
def MyEnumAttr : StrEnumAttr<"MyEnum", "", [
StrEnumAttrCase<"A">,
StrEnumAttrCase<"B">
]>;
// After (pick an integer enum of your choice)
def MyEnum : I32EnumAttr<"MyEnum", "", [
I32EnumAttrCase<"A", 0>,
I32EnumAttrCase<"B", 1>
]> {
// Don't generate a C++ class! We want to use the AttrDef
let genSpecializedAttr = 0;
}
// Define the AttrDef
def MyEnum : EnumAttr<MyDialect, MyEnum, "my_enum">;
```
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D120834
Enable specifying additional include directories to search. This is
consistent with what one can do with clangd (although there it is more
general compilation options) and Python LSP. We would in general expect
these to be provided by compilation database equivalent.
Differential Revision: https://reviews.llvm.org/D123474
This patch adds thread_local to llvm.mlir.global and adds translation for dso_local and addr_space to and from LLVM IR.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D123412
This revision replaces current type cast constant folder with a new common type cast constant folder function template.
It will cover all former folder and support fold the constant splat and vector.
Differential Revision: https://reviews.llvm.org/D123489
This change generalizes the fusion of `tensor.expand_shape` ->
`linalg.generic` op by collapsing to handle cases where only a subset
of the reassociations specified in the `tensor.expand_shape` are valid
to be collapsed.
The method that does the collapsing is refactored to allow it to be a
generic utility when required.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D123153
This patch adds tasking construct according to Section 2.10.1 of OpenMP 5.0
Reviewed By: peixin, kiranchandramohan, abidmalikwaterloo
Differential Revision: https://reviews.llvm.org/D123575
This patch removes inheritence from PresburgerSpace in IntegerRelation and
instead makes it a member of these classes.
This is required for three reasons:
- It prevents implicit casting to PresburgerSpace.
- Not all functions of PresburgerSpace need to be exposed by the deriving classes.
- IntegerRelation and IntegerPolyhedron are defined in a PresburgerSpace. It
makes more sense for the space to be a member instead of them inheriting from
a space.
Reviewed By: arjunp, ftynse
Differential Revision: https://reviews.llvm.org/D123585
With this change, there's going to be a clear distinction between LLVM
and MLIR pass maanger options (e.g. `-mlir-print-after-all` vs
`-print-after-all`). This change is desirable from the point of view of
projects that depend on both LLVM and MLIR, e.g. Flang.
For consistency, all pass manager options in MLIR are prefixed with
`mlir-`, even options that don't have equivalents in LLVM .
Differential Revision: https://reviews.llvm.org/D123495
Lookup iter_arg buffers using `lookupBuffer` instead of always creating a new `ToMemrefOp`. Also cast all yielded buffers (if necessary), regardless of whether they are an equivalent buffer or a new allocation.
Note: This should have been part of D123369.
Differential Revision: https://reviews.llvm.org/D123383
Switch CUDA runtime wrapper for GPU mem alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and the wrappers it
lowered to (gpu-to-llvm) was for the async versions -- however, this was
being incorrectly mapped to cuMemAlloc/cuMemFree instead of
cuMemAllocAsync/cuMemFreeAsync.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D123482
This supports the threadprivate directive in OpenMP dialect following
the OpenMP 5.1 [2.21.2] standard. Also lowering to LLVM IR using OpenMP
IRBduiler.
Reviewed By: kiranchandramohan, shraiysh, arnamoy10
Differential Revision: https://reviews.llvm.org/D123350
Adding annotations on as-needed bases, currently only for memrefCopy, but in general all C API functions that take pointers to memory allocated/initialized inside the jit-compiled code must be annotated, to be able to run with msan.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123557
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reland: Fixed custom calls to `-lower-matrix-intrinsics` in integration tests by replacing them with `-O0 -enable-matrix`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
The method to add elementwise ops fusion patterns pulls in many other
patterns by default. The patterns to pull in along with the
elementwise op fusion should be upto the caller. Split the method to
pull in just the elementwise ops fusion pattern. Other cleanup changes
include
- Move the pattern for constant folding of generic ops (currently only
constant folds transpose) into a separate file, cause it is not
related to fusion
- Drop the uber LinalgElementwiseFusionOptions. With the
populateElementwiseOpsFusionPatterns being split, this has no
utility now.
- Drop defaults for the control function.
- Fusion of splat constants with generic ops doesnt need a control
function. It is always good to do.
Differential Revision: https://reviews.llvm.org/D123236
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
This avoids emitting errors in situations where the user doesn't have a server
setup, and doesn't mean to (e.g. when they merely want syntax highlighting).
Differential Revision: https://reviews.llvm.org/D123240
We currently proactively create language clients for every workspace folder,
and every language. This makes startup time more costly, and also emits errors
for missing language servers in contexts that the user currently isn't in. For example,
if a user opens a .mlir file we don't want to emit errors about .pdll files. We also don't
want to emit errors for missing servers in workspace folders that don't even utilize
MLIR.
This commit refactors client creation to lazy-load when a document that requires the
server is opened.
Differential Revision: https://reviews.llvm.org/D123184
In a previous commit we added proper support for separate configurations
per workspace folder, but that effectively broke support for processing out-of-workspace
files. Given how useful this is (e.g. when iterating on a test case in /tmp), this
commit refactors server creation to support this again. We support this case using
a "fallback" server that specifically handles files not within the workspace. This uses
the configuration settings for the current workspace itself (not the specific folder).
Differential Revision: https://reviews.llvm.org/D123183
We don't actually have any documentation today for how to
declaratively define a dialect. This commit rectifies that and properly
documents how to define a Dialect in tablegen, and details all of
the possible fields.
Differential Revision: https://reviews.llvm.org/D123258
OpBase is currently extremely overbloated with constructs. This
commit continues the current process of cleaning this up, by splitting
out dialect definition constructs. This maps the ODS side more closely
to the C++ side.
Differential Revision: https://reviews.llvm.org/D123257
Normalize some of the division and inequality expressions used,
which can improve performance. Also deduplicate some of the
normalization functionality throughout the Presburger library.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123314
When making the subtract implementation non-recursive, tail calls were
implemented by incrementing the level but not pushing a frame, and returning
was implemented as returning to the level corresponding to the number of frames in the stack.
This is incorrect, as there could be a case where we tail-recurse at `level`,
and then recurse at `level + 1`, pushing a frame. However, because the previous
frame was missing, this new frame would be interpreted as corresponding to
`level` and not `level + 1`. Fix this by removing the special handling of tail
calls and just doing them as normal recursion, as this is the simplest correct
implementation and handling them specifically would be a premature optimization.
The impact of this bug is only on performance as this can only lead to
unnecessary subtractions of the same disjuncts multiples times. As subtraction
is idempotent, and rationally empty disjuncts are always discarded, this
does not affect the output, so this patch does not include a regression test.
(This also does not affect termination.)
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D123327
This patch contains several ODS-level optimizations to attribute getters and getting.
1. OpAdaptors, when provided a DictionaryAttr, will instantiate an OperationName so that adaptor attribute getters can used cached identifiers.
2. Verifiers will take advantage of attributes stored in sorted order to get all required (non-optional, non-default valued, and non-derived) attributes in one pass over the attribute dictionary and verify that they are present.
3. ODS-generated attribute getters will use "subrange" lookup. Because the attributes are stored in sorted order and ODS knows which attributes are required, the number of required attributes less than and greater than each attribute can be computed. When searching for an attribute, the ends of the search range can be dropped.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D122430