Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove the individual translations
that were created for each of these dialects and instead use a single common
MLIR-to-LLVM-IR translation that supports any combination of translatable
dialects, based on what is registered. This removal was one of the main goals
of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with an `amendOperation` function that allows the interface
implementation to post-process any translated operation based on dialect
attributes from the dialect for which the interface is implemented, regardless
of the operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
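A minimal C++ sketch of how a dialect might hook into this, assuming the interface class and hook signature of this patch era (they may have evolved since); the "foo" dialect names are illustrative, not taken from the patch:
```
// Hedged sketch: a translation interface for a hypothetical "foo" dialect.
#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
#include "mlir/Target/LLVMIR/ModuleTranslation.h"

namespace {
class FooDialectLLVMIRTranslationInterface
    : public mlir::LLVMTranslationDialectInterface {
public:
  using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;

  // Invoked for any translated operation carrying an attribute from the "foo"
  // dialect, regardless of the dialect the operation itself belongs to.
  mlir::LogicalResult
  amendOperation(mlir::Operation *op, mlir::NamedAttribute attribute,
                 mlir::LLVM::ModuleTranslation &moduleTranslation) const final {
    // Inspect the attribute and attach dialect-specific LLVM IR metadata to
    // the translated operation (construction elided in this sketch).
    return mlir::success();
  }
};
} // namespace
```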
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
Dialects themselves do not support repeated addition of interfaces with the
same TypeID. However, in case of delayed registration, the registry may contain
such an interface, or have the same interface registered several times due to,
e.g., dependencies. Make sure delayed registration does not attempt to add
an interface with the same TypeID more than once.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96606
The patch extends the runner utils with verification methods that compare two memrefs. The methods compare the contents of the two memrefs and print a success message if the data is identical up to a small numerical error. They are meant to simplify the development of integration tests that compare results against a reference implementation (cf. the updates to the linalg matmul integration tests); a hedged usage sketch follows the change list below.
Originally landed in 5fa893c (https://reviews.llvm.org/D96326) and reverted in dd719fd due to a Windows build failure.
Changes:
- Remove the max function that requires the "algorithm" header on Windows
- Eliminate the truncation warning in the float specialization of verifyElem by using a float constant
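A hedged MLIR-level sketch of the intended use in an integration test; the declared function name and signature are placeholders rather than the exact symbols exported by the runner utils:
```
// Placeholder declaration; the real entry point and result type may differ.
func private @verifyMemRefF32(memref<*xf32>, memref<*xf32>) -> i64

func @check(%computed : memref<?x?xf32>, %reference : memref<?x?xf32>) {
  %lhs = memref_cast %computed : memref<?x?xf32> to memref<*xf32>
  %rhs = memref_cast %reference : memref<?x?xf32> to memref<*xf32>
  // Prints a success message and returns 0 when the contents match up to a
  // small numerical error; otherwise reports the mismatching positions.
  %errors = call @verifyMemRefF32(%lhs, %rhs) : (memref<*xf32>, memref<*xf32>) -> i64
  return
}
```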
Reviewed By: Kayjukh
Differential Revision: https://reviews.llvm.org/D96593
Currently, vector.contract joins the intermediate result and the accumulator
argument (of rank K) using summation. We want more joining operations, such as
max, to help vector.contract express reductions. This change extends
Vector_ContractionOp to take an optional attribute (called "kind", of enum type
CombiningKind) specifying the joining operation: add/mul/min/max for integer and
floating-point types, and and/or/xor for integer types only. By default this
attribute has the value "add".
To implement this we also need to extend vector.outerproduct, since
vector.contract gets transformed to vector.outerproduct (and that to
vector.fma). The extension for vector.outerproduct is also an optional kind
attribute that uses the same enum type and possible values. The default is
"add". In case of max/min we transform vector.outerproduct to a combination of
compare and select.
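For illustration, a hedged sketch of a matrix-vector contraction that reduces with max instead of the default add (the exact attribute spelling is assumed from the description above):
```
%0 = vector.contract {
       indexing_maps = [affine_map<(i, k) -> (i, k)>,
                        affine_map<(i, k) -> (k)>,
                        affine_map<(i, k) -> (i)>],
       iterator_types = ["parallel", "reduction"],
       kind = #vector.kind<max>   // defaults to "add" when omitted
     } %matrix, %vec, %acc
     : vector<4x8xf32>, vector<8xf32> into vector<4xf32>
```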
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D93280
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly, which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
This reverts commit 3f22547fd1 and relands 973e133b76 with a workaround for
a gcc bug that does not accept lambda default parameters:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59949
Differential Revision: https://reviews.llvm.org/D96598
Align the vector gather/scatter/expand/compress API with
the vector load/store/maskedload/maskedstore API.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D96396
This patch adds the 'vector.load' and 'vector.store' ops to the Vector
dialect [1]. These operations model *contiguous* vector loads and stores
from/to memory. Their semantics are similar to the 'affine.vector_load' and
'affine.vector_store' counterparts but without the affine constraints. The
most relevant feature is that these new vector operations may perform a vector
load/store on memrefs with a non-vector element type, unlike 'std.load' and
'std.store' ops. This opens the representation to model more generic vector
load/store scenarios: unaligned vector loads/stores, scalar and vector
memory accesses on the same memref, decoupling memory allocation constraints from
memory accesses, etc. [1]. These operations will also facilitate the progressive
lowering of both Affine vector loads/stores and Vector transfer reads/writes
for those that read/write contiguous slices from/to memory.
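A brief sketch of the new ops on a memref with a scalar element type (shapes chosen for illustration):
```
func @copy_8xf32(%base : memref<200xf32>, %i : index) {
  // Contiguous 8-element vector load/store starting at %base[%i]; no affine
  // restrictions and no vector element type required on the memref.
  %v = vector.load %base[%i] : memref<200xf32>, vector<8xf32>
  vector.store %v, %base[%i] : memref<200xf32>, vector<8xf32>
  return
}
```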
In particular, this patch adds the 'vector.load' and 'vector.store' ops to the
Vector dialect, implements their lowering to the LLVM dialect, and changes the
lowering of 'affine.vector_load' and 'affine.vector_store' ops to the new vector
ops. The lowering of Vector transfer reads/writes will be implemented in the
future, probably as an independent pass. The API of 'vector.maskedload' and
'vector.maskedstore' has also been changed slightly to align it with the
transfer read/write ops and the new vector ops. This will improve reusability
among all these operations. For example, the lowering of 'vector.load',
'vector.store', 'vector.maskedload' and 'vector.maskedstore' to the LLVM dialect
is implemented with a single template conversion pattern.
[1] https://llvm.discourse.group/t/memref-type-and-data-layout/
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96185
This reverts commit 973e133b76.
It triggers an issue in gcc5 that requires investigation; the build is
broken with:
/tmp/ccdpj3B9.s: Assembler messages:
/tmp/ccdpj3B9.s:5821: Error: symbol `_ZNSt17_Function_handlerIFvjjEUljjE2_E9_M_invokeERKSt9_Any_dataOjS6_' is already defined
/tmp/ccdpj3B9.s:5860: Error: symbol `_ZNSt14_Function_base13_Base_managerIUljjE2_E10_M_managerERSt9_Any_dataRKS3_St18_Manager_operation' is already defined
Migrate the translation of the OpenMP dialect operations to LLVM IR to the new
dialect-based mechanism.
Depends On D96503
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96504
The existing approach to translation to LLVM IR relies on a single
translation supporting the base LLVM dialect, extensible through inheritance to
support intrinsic-based dialects also derived from LLVM IR such as NVVM and
AVX512. This approach does not scale well as it requires additional
translations to be created for each new intrinsic-based dialect and does not
allow them to mix in the same module, contrary to the rest of the MLIR
infrastructure. Furthermore, OpenMP translation ingrained itself into the main
translation mechanism.
Start refactoring the translation to LLVM IR to operate using dialect
interfaces. Each dialect that contains ops translatable to LLVM IR can
implement the interface for translating them, and the top-level translation
driver can operate on interfaces without knowing about specific dialects.
Furthermore, the delayed dialect registration mechanism allows one to avoid a
dependency on LLVM IR in the dialect that is translated to it by implementing
the translation as a separate library and only registering it at the client
level.
This change introduces the new mechanism and factors out the translation of the
"main" LLVM dialect. The remaining dialects will follow suit.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96503
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly, which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
Differential Revision: https://reviews.llvm.org/D96598
Until now, the GPU translation to NVVM or ROCDL intrinsics relied on the
presence of the generic `gpu.kernel` attribute to attach additional LLVM IR
metadata to the relevant functions. This would be problematic if each dialect
were to handle the conversion of its own options, which is the intended
direction for the translation infrastructure. Introduce `nvvm.kernel` and
`rocdl.kernel` in addition to `gpu.kernel` and base translation on these new
attributes instead.
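For example, the NVVM translation now keys on a unit attribute placed directly on the LLVM function (a hedged sketch; the ROCDL path is analogous with `rocdl.kernel`):
```
// Marked as a kernel for the NVVM translation; previously the generic
// gpu.kernel attribute was consulted instead.
llvm.func @vecadd_kernel() attributes {nvvm.kernel} {
  llvm.return
}
```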
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96591
Rationale:
This computation failed ASAN for the following input
(integer overflow during 4032000000000000000 * 100):
tensor<100x200x300x400x500x600x700x800xf32>
This change adds simple overflow detection in debug
mode (which we run more regularly than ASAN).
Arguably this is an unrealistic tensor input, but
in the context of sparse tensors, we may start to
see cases like this.
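A hedged C++ sketch of the kind of debug-mode guard this describes (names and placement are illustrative, not the patched code):
```
#include <cassert>
#include <cstdint>
#include <limits>

// Product of dimension sizes with an overflow assertion on each step.
int64_t numElements(const int64_t *sizes, unsigned rank) {
  int64_t count = 1;
  for (unsigned d = 0; d < rank; ++d) {
    int64_t size = sizes[d];
    // Guard the multiplication: count * size must still fit in int64_t.
    assert((size == 0 || count <= std::numeric_limits<int64_t>::max() / size) &&
           "integer overflow in element count computation");
    count *= size;
  }
  return count;
}
```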
Bug:
https://bugs.llvm.org/show_bug.cgi?id=49136
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96530
The AffineMap in the MemRef type inferred by SubViewOp may have uncompressed symbols, which results in a type mismatch on otherwise unused symbols. Make the computation of the AffineMap compress those unused symbols, which results in better canonical types.
Additionally, improve the error message to report which inferred type was expected.
Differential Revision: https://reviews.llvm.org/D96551
Multi-configuration generators (such as Visual Studio and Xcode) allow the specification of a build flavor at build time instead of config time, so the lit configuration files need to support that, and they do for the most part. There are several places that had one of two issues (or both!):
1) Paths had %(build_mode)s set up, but then not configured, resulting in values that would not work correctly, e.g. D:/llvm-build/%(build_mode)s/bin/dsymutil.exe
2) Paths did not have %(build_mode)s set up, but instead contained $(Configuration) (which is the value for Visual Studio at configuration time; Xcode would have had the equivalent), e.g. "D:/llvm-build/$(Configuration)/lib".
This seems to indicate that we still have a lot of fragility in the configurations, but also that a number of these paths are never used (at least on Windows) since the errors appear to have been there a while.
This patch fixes the configurations and it has been tested with Ninja and Visual Studio to generate the correct paths. We should consider removing some of these settings altogether.
Reviewed By: JDevlieghere, mehdi_amini
Differential Revision: https://reviews.llvm.org/D96427
ModuleTranslation contains multiple fields that keep track of the mappings
between various MLIR and LLVM IR components. The original ModuleTranslation
extension model was based on inheritance, with these fields being protected and
thus accessible in the ModuleTranslation and derived classes. The
inheritance-based model doesn't scale to translation of more than one derived
dialect and will be progressively replaced with a more flexible one based on
dialect interfaces and a translation state that is separate from
ModuleTranslation. This change prepares the replacement by making the mappings
private and providing public methods to access them.
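A short hedged sketch of the accessor-based usage (method names follow the current ModuleTranslation API and may differ slightly from this revision):
```
#include "mlir/Target/LLVMIR/ModuleTranslation.h"

// Read access: the LLVM IR value a translated MLIR value maps to.
llvm::Value *lookupOperand(mlir::Value operand,
                           mlir::LLVM::ModuleTranslation &moduleTranslation) {
  return moduleTranslation.lookupValue(operand);
}

// Write access: record the mapping for an op result instead of touching the
// previously protected mapping fields directly.
void recordResult(mlir::Value result, llvm::Value *translated,
                  mlir::LLVM::ModuleTranslation &moduleTranslation) {
  moduleTranslation.mapValue(result, translated);
}
```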
Depends On D96436
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96437
Historically, JitRunner has been registering all available dialects with the
context and depending on them without a real need. Make it take a registry
that contains only the dialects that are expected in the input and stop linking
in all dialects.
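A hedged sketch of a runner's main after this change, registering only the dialects expected in the input (the JitRunnerMain signature is assumed from the current mlir/ExecutionEngine/JitRunner.h):
```
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/ExecutionEngine/JitRunner.h"
#include "mlir/IR/Dialect.h"

int main(int argc, char **argv) {
  // Only the dialects the input is expected to contain.
  mlir::DialectRegistry registry;
  registry.insert<mlir::LLVM::LLVMDialect>();
  return mlir::JitRunnerMain(argc, argv, registry);
}
```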
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96436
With the standard dialect being split up, the set of dialects that are
used when converting to GPU is growing. This change modifies the
SCFToGpu pass to allow all operations inside launch bodies.
Differential Revision: https://reviews.llvm.org/D96480
The dimension order of a filter in TensorFlow is
[filter_height, filter_width, in_channels, out_channels], which is different
from the current definition. The current definition follows the TOSA spec. Add
TF-version conv ops to .tc so we do not have to insert a transpose op around a
conv op.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D96038
The CMake changes in 2aa1af9b1d to make it possible to build MLIR as a
standalone project unfortunately disabled all unit-tests from the
regular in-tree build.
After discussion, it seems like we want to go with
"inherent/discardable". These seem to best capture the relationship with
the op semantics and don't conflict with other terms.
Please let me know your preferences. Some of the other contenders are:
```
"intrinsic" side | "annotation" side
-----------------+------------------
characteristic | annotation
closed | open
definitional | advisory
essential | discardable
expected | unexpected
innate | acquired
internal | external
intrinsic | extrinsic
known | unknown
local | global
native | foreign
inherent | acquired
```
Rationale:
- discardable: good. discourages use for stable data.
- inherent: good
- annotation: redundant and doesn't convey difference
- intrinsic: confusable with "compiler intrinsics".
- definitional: too much of a mouthful
- extrinsic: too exotic of a word and hard to say
- acquired: doesn't convey the relationship to the semantics
- internal/external: not immediately obvious: what is internal to what?
- innate: similar to intrinsic but worse
- acquired: we don't typically think of an op as "acquiring" things
- known/unknown: by who?
- local/global: to what?
- native/foreign: to where?
- advisory: confusing distinction: is the attribute itself advisory or
is the information it provides advisory?
- essential: an intrinsic attribute need not be present.
- expected: same issue as essential
- unexpected: by who/what?
- closed/open: whether the set is open or closed doesn't seem essential
to the attribute being intrinsic. Also, in theory an op can have an
unbounded set of intrinsic attributes (e.g. `arg<N>` for func).
- characteristic: unless you have a math background this probably
doesn't make as much sense
Differential Revision: https://reviews.llvm.org/D96093
This revision connects the generated sparse code with an actual
sparse storage scheme, which can be initialized from a test file.
Lacking a first-class SparseTensor type (with buffer),
the storage is hidden behind an opaque pointer with some "glue"
to bring the pointer back to tensor land. Rather than generating
sparse setup code for each different annotated tensor (viz. the
"pack" methods in TACO), a single "one-size-fits-all" implementation
has been added to the runtime support library. Many details and
abstractions need to be refined in the future, but this revision
allows full end-to-end integration testing and performance
benchmarking (with, on one end, an annotated Linalg
op and, on the other end, a JIT/AOT executable).
Reviewed By: nicolasvasilache, bixia
Differential Revision: https://reviews.llvm.org/D95847
This reverts commit 11f32a41c2.
The build is broken because this commit conflicts with the refactoring of
the DialectRegistry APIs in the context. It will be relanded shortly after
fixing the API usage.
This revision fixes the indexing logic into the packed tensor that results from hoisting padding. Previously, the index was incorrectly set to the loop induction variable when in fact we need to compute the iteration count (i.e. `(iv - lb).ceilDiv(step)`).
Differential Revision: https://reviews.llvm.org/D96417
MLIRContext allows its users direct access to the DialectRegistry it
contains. While sometimes useful for registering additional dialects on an
already existing context, this breaks encapsulation by essentially giving
raw access to a part of the context's internal state. Remove this mutable
access and instead provide a method to append a given DialectRegistry to the
one already contained in the context. Also provide a shortcut mechanism to
construct a context from an already existing registry, which seems to be a
common use case in the wild. Keep read-only access to the registry contained in
the context in case it needs to be copied or used for constructing another
context.
With this change, DialectRegistry is no longer concerned with loading the
dialects and deciding whether to invoke delayed interface registration. Loading
is concentrated in the MLIRContext, and the functionality of the registry
better reflects its name.
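A hedged sketch of the resulting usage pattern (the constructor taking a registry and appendDialectRegistry are as described above):
```
#include "mlir/IR/Dialect.h"
#include "mlir/IR/MLIRContext.h"

void example() {
  // Dialects known up front go into a registry the context is built from.
  mlir::DialectRegistry registry;
  mlir::MLIRContext context(registry);

  // Later extensions are appended through the context rather than by
  // mutating its internal registry directly.
  mlir::DialectRegistry extras;
  context.appendDialectRegistry(extras);
}
```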
Depends On D96137
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96331
This introduces a mechanism to register interfaces for a dialect without making
the dialect itself depend on the interface. The registration request happens on
DialectRegistry and, if the dialect has not been loaded yet, the actual
registration is delayed until the dialect is loaded. It requires
DialectRegistry to become aware of the context that contains it and the context
to expose methods for querying if a dialect is loaded.
This will enable a simple way of extending dialects with interfaces defined
outside of the dialect code. It is particularly helpful
for, e.g., translation to LLVM IR where we don't want the dialect itself to
depend on LLVM IR libraries.
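A hedged sketch of the registration pattern this enables (the dialect and interface names are hypothetical, and the addDialectInterface spelling reflects this patch series and may have since been superseded):
```
void registerFooDialectTranslation(mlir::DialectRegistry &registry) {
  registry.insert<FooDialect>();
  // Delayed: the interface is attached when (and if) FooDialect is loaded
  // into a context, so FooDialect itself never depends on LLVM IR libraries.
  registry.addDialectInterface<FooDialect, FooToLLVMIRTranslationInterface>();
}
```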
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96137