'llvm' Dialect
This dialect maps LLVM IR into MLIR by defining the corresponding operations and types. LLVM IR metadata is usually represented as MLIR attributes, which offer additional structure verification.
We use "LLVM IR" to designate the intermediate representation of LLVM and "LLVM dialect" or "LLVM IR dialect" to refer to this MLIR dialect.
Unless explicitly stated otherwise, the semantics of the LLVM dialect operations must correspond to the semantics of LLVM IR instructions, and any divergence is considered a bug. The dialect also contains auxiliary operations that smooth over the differences in the IR structure, e.g., MLIR does not have phi operations and LLVM IR does not have a constant operation. These auxiliary operations are systematically prefixed with mlir, e.g. llvm.mlir.constant, where llvm. is the dialect namespace prefix.
[TOC]
Dependency on LLVM IR
The LLVM dialect is not expected to depend on any object that requires an LLVMContext, such as an LLVM IR instruction or type. Instead, MLIR provides thread-safe alternatives compatible with the rest of the infrastructure. The dialect is allowed to depend on the LLVM IR objects that don't require a context, such as data layout and triple description.
Module Structure
IR modules use the built-in MLIR ModuleOp and support all its features. In particular, modules can be named, nested and are subject to symbol visibility. Modules can contain any operations, including LLVM functions and globals.
Data Layout and Triple
An IR module may have optional data layout and triple information attached using the MLIR attributes llvm.data_layout and llvm.target_triple, respectively. Both are string attributes with the same syntax as in LLVM IR and are verified to be correct. They can be defined as follows.
module attributes {llvm.data_layout = "e",
                   llvm.target_triple = "aarch64-linux-android"} {
  // module contents
}
Functions
LLVM functions are represented by a special operation, llvm.func, that has syntax similar to that of the built-in function operation but supports LLVM-related features such as linkage and variadic argument lists. See the detailed description in the operation list below.
PHI Nodes and Block Arguments
MLIR uses block arguments instead of PHI nodes to communicate values between blocks. Therefore, the LLVM dialect has no operation directly equivalent to phi in LLVM IR. Instead, all terminators can pass values as successor operands, and these values are forwarded as block arguments when the control flow is transferred.
For example:
^bb1:
  %0 = llvm.add %arg0, %cst : i32
  llvm.br ^bb2(%0 : i32)
// If the control flow comes from ^bb1, %arg1 == %0.
^bb2(%arg1: i32):
  // ...
is equivalent to LLVM IR
%0:
  %1 = add i32 %arg0, %cst
  br label %3
%3:
  %arg1 = phi i32 [%1, %0], //...
Since there is no need to use the block identifier to differentiate the source of different values, the LLVM dialect supports terminators that transfer the control flow to the same block with different arguments. For example:
^bb1:
  llvm.cond_br %cond, ^bb2(%0 : i32), ^bb2(%1 : i32)
^bb2(%arg0: i32):
  // ...
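The correspondence between successor operands and phi nodes can be sketched in a few lines of Python. This is only an illustrative model, not how MLIR or its translation to LLVM IR is implemented: the function name phi_incoming and the dictionary-based description of terminators are assumptions made for this example.

```python
from collections import defaultdict

def phi_incoming(terminators):
    """Collect, per (block, argument index), the values forwarded to it.

    `terminators` maps a predecessor block name to a list of
    (successor name, [operand names]) pairs, one pair per successor.
    This representation is hypothetical and exists only for this sketch.
    """
    incoming = defaultdict(list)
    for pred, successors in terminators.items():
        for succ, operands in successors:
            for index, value in enumerate(operands):
                # Each forwarded operand plays the role of one phi entry.
                incoming[(succ, index)].append((pred, value))
    return dict(incoming)

# Unconditional branch from the first example: ^bb1 forwards %0 to ^bb2.
simple = phi_incoming({"^bb1": [("^bb2", ["%0"])]})

# Conditional branch that targets ^bb2 twice with different operands.
same_target = phi_incoming({"^bb1": [("^bb2", ["%0"]), ("^bb2", ["%1"])]})
```

In the second case both entries for the first argument of ^bb2 name the same predecessor, ^bb1. A phi node keyed only by predecessor block could not tell them apart, which is why block arguments can express this form directly.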
Context-Level Values
Some value kinds in LLVM IR, such as constants and undefs, are uniqued in context and used directly in relevant operations. MLIR does not support such values for thread-safety and concept parsimony reasons. Instead, regular values are produced by dedicated operations that have the corresponding semantics: llvm.mlir.constant, llvm.mlir.undef, llvm.mlir.null. Note how these operations are prefixed with mlir. to indicate that they don't belong to LLVM IR but are only necessary to model it in MLIR. The values produced by these operations are usable just like any other value.
Examples:
// Create an undefined value of structure type with a 32-bit integer followed
// by a float.
%0 = llvm.mlir.undef : !llvm.struct<(i32, f32)>
// Null pointer to i8.
%1 = llvm.mlir.null : !llvm.ptr<i8>
// Null pointer to a function with signature void().
%2 = llvm.mlir.null : !llvm.ptr<func<void ()>>
// Constant 42 as i32.
%3 = llvm.mlir.constant(42 : i32) : i32
// Splat dense vector constant.
%4 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>
Note that constants use built-in types within the initializer definition: MLIR attributes are typed and the attributes used for constants require a built-in type.
Globals
Global variables are also defined using a special operation, llvm.mlir.global, located at the module level. Globals are MLIR symbols and are identified by their name.
Since functions need to be isolated-from-above, i.e. values defined outside the function cannot be directly used inside the function, an additional operation, llvm.mlir.addressof, is provided to locally define a value containing the address of a global. The actual value can then be loaded from that pointer, or a new value can be stored into it if the global is not declared constant. This is similar to LLVM IR where globals are accessed through name and have a pointer type.
Linkage
Module-level named objects in the LLVM dialect, namely functions and globals, have an optional linkage attribute derived from LLVM IR linkage types. Linkage is specified by the same keyword as in LLVM IR and is located between the operation name (llvm.func or llvm.mlir.global) and the symbol name. If no linkage keyword is present, external linkage is assumed by default. Linkage is distinct from MLIR symbol visibility.
Attribute Pass-Through
The LLVM dialect provides a mechanism to forward function-level attributes to LLVM IR using the passthrough attribute. This is an array attribute containing either string attributes or array attributes. In the former case, the value of the string is interpreted as the name of an LLVM IR function attribute. In the latter case, the array is expected to contain exactly two string attributes, the first corresponding to the name of an LLVM IR function attribute, and the second corresponding to its value. Note that even integer LLVM IR function attributes have their value represented in the string form.
Example:
llvm.func @func() attributes {
passthrough = ["noinline", // value-less attribute
["alignstack", "4"], // integer attribute with value
["other", "attr"]] // attribute unknown to LLVM
} {
llvm.return
}
If the attribute is not known to LLVM IR, it will be attached as a string attribute.
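The two accepted entry shapes can be summarized with a small Python sketch. It models the passthrough array as a Python list and is not the actual verifier or translation code; the function name interpret_passthrough and the choice to raise ValueError on malformed entries are assumptions made for illustration.

```python
def interpret_passthrough(passthrough):
    """Turn a passthrough-style array into (name, value) pairs.

    A plain string is a value-less attribute (its value is None); a
    two-element list of strings is a name/value pair. Anything else is
    rejected. Hypothetical helper, not part of MLIR.
    """
    result = []
    for entry in passthrough:
        if isinstance(entry, str):
            result.append((entry, None))
        elif (isinstance(entry, (list, tuple)) and len(entry) == 2
              and all(isinstance(part, str) for part in entry)):
            result.append((entry[0], entry[1]))
        else:
            raise ValueError(f"malformed passthrough entry: {entry!r}")
    return result

# The array from the example above, written as a Python list.
attrs = interpret_passthrough(["noinline", ["alignstack", "4"], ["other", "attr"]])
```

Note that the integer value of alignstack stays a string in this model, mirroring the rule that even integer attribute values are given in string form.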
Types
LLVM dialect defines a set of types that correspond to LLVM IR types. The dialect type system is closed: types from other dialects are not allowed within LLVM dialect aggregate types. This property allows for more concise custom syntax and ensures easy translation to LLVM IR.
Similarly to other MLIR context-owned objects, the creation and manipulation of LLVM dialect types is thread-safe.
MLIR does not support module-scoped named type declarations, e.g. %s = type {i32, i32} in LLVM IR. Instead, types must be fully specified at each use, except for recursive types where only the first reference to a named type needs to be fully specified. MLIR type aliases are supported for top-level types only, i.e. they cannot be used inside another type because the type system is closed.
The general syntax of LLVM dialect types is !llvm., followed by a type kind identifier (e.g., ptr for pointer or struct for structure) and by an optional list of type parameters in angle brackets. The dialect follows MLIR style for types with nested angle brackets and keyword specifiers rather than using different bracket styles to differentiate types. Inside angle brackets, the !llvm prefix is omitted for brevity; thanks to closedness of the type system, all types are assumed to be defined in the LLVM dialect. For example, !llvm.ptr<struct<packed, (i8, i32)>> is a pointer to a packed structure type containing an 8-bit and a 32-bit integer.
Simple Types
The following non-parametric types are supported.
!llvm.fp128 (LLVMFP128Type) - 128-bit floating-point value as per IEEE-754-2008.
!llvm.x86_fp80 (LLVMX86FP80Type) - 80-bit floating-point value (x87).
!llvm.x86_mmx (LLVMX86MMXType) - value held in an MMX register on x86 machine.
!llvm.ppc_fp128 (LLVMPPCFP128Type) - 128-bit floating-point value (two 64 bits).
!llvm.token (LLVMTokenType) - a non-inspectable value associated with an operation.
!llvm.metadata (LLVMMetadataType) - LLVM IR metadata, to be used only if the metadata cannot be represented as structured MLIR attributes.
!llvm.void (LLVMVoidType) - does not represent any value; can only appear in function results.
These types represent a single value (or an absence thereof in case of void) and correspond to their LLVM IR counterparts.
Parametric Types
Integer Types
Integer types are parametric in MLIR terminology, with their bitwidth being a type parameter. They are expressed as follows:
llvm-int-type ::= `!llvm.i` integer-literal
and represented internally as LLVMIntegerType. For example, i1 is a 1-bit integer type (bool) and i32 is a 32-bit integer type.
Pointer Types
Pointer types specify an address in memory.
Pointer types are parametric types parameterized by the element type and the address space. The address space is an integer, but this choice may be reconsidered if MLIR implements named address spaces. Their syntax is as follows:
llvm-ptr-type ::= `!llvm.ptr<` llvm-type (`,` integer-literal)? `>`
where the optional integer literal corresponds to the address space. Both cases are represented by LLVMPointerType internally.
Vector Types
Vector types represent sequences of elements, typically when multiple data elements are processed by a single instruction (SIMD). Vectors are thought of as stored in registers and therefore vector elements can only be addressed through constant indices.
Vector types are parameterized by the size, which may be either fixed or a multiple of some fixed size in case of scalable vectors, and the element type. Vectors cannot be nested and only 1D vectors are supported. Scalable vectors are still considered 1D. Their syntax is as follows:
llvm-vec-type ::= `vector<` (`?` `x`)? integer-literal `x` llvm-type `>`
Internally, fixed vector types are represented as LLVMFixedVectorType and scalable vector types are represented as LLVMScalableVectorType. Both classes derive from LLVMVectorType.
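To make the vector production concrete, the following Python sketch recognizes strings of that shape with a regular expression. It is a rough model only: the helper name parse_vector_type is hypothetical, the element type is not validated beyond rejecting a nested vector, and the real parser in MLIR works differently.

```python
import re

# Mirrors the production above: optional `?x`, a size, `x`, then the element type.
_VECTOR_RE = re.compile(r"^vector<(\?x)?(\d+)x(.+)>$")

def parse_vector_type(text):
    """Return (is_scalable, size, element_type), or None if `text` does not match."""
    match = _VECTOR_RE.match(text)
    if match is None:
        return None
    scalable, size, element = match.groups()
    # Vectors cannot be nested, so a vector element type is rejected.
    if element.startswith("vector<"):
        return None
    return (scalable is not None, int(size), element)

fixed = parse_vector_type("vector<4xf32>")
scalable = parse_vector_type("vector<?x4xi32>")
nested = parse_vector_type("vector<4xvector<2xi8>>")
```

Here the leading ?x marks a scalable vector whose actual size is a multiple of the stated size, while the type itself is still treated as one-dimensional.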
Array Types
Array types represent sequences of elements in memory. Unlike vectors, array elements can be addressed with a value unknown at compile time, and array types can be nested. Each individual array type is one-dimensional, though; multi-dimensional arrays are expressed by nesting.
Array types are parameterized by the fixed size and the element type. Syntactically, their representation is close to vectors:
llvm-array-type ::= `!llvm.array<` integer-literal `x` llvm-type `>`
Array types are internally represented as LLVMArrayType.
Function Types
Function types represent the type of a function, i.e. its signature.
Function types are parameterized by the result type, the list of argument types and by an optional "variadic" flag. Unlike the built-in FunctionType, LLVM dialect functions (LLVMFunctionType) always have a single result, which may be !llvm.void if the function does not return anything. The syntax is as follows:
llvm-func-type ::= `!llvm.func<` llvm-type `(` llvm-type-list (`,` `...`)? `)` `>`
For example,
!llvm.func<void ()> // a function with no arguments;
!llvm.func<i32 (f32, i32)> // a function with two arguments and a result;
!llvm.func<void (i32, ...)> // a variadic function with at least one argument.
In the LLVM dialect, functions are not first-class objects and one cannot have a value of function type. Instead, one can take the address of a function and operate on pointers to functions.
Structure Types
The structure type is used to represent a collection of data members together in memory. The elements of a structure may be any type that has a size.
Structure types are represented in a single dedicated class mlir::LLVM::LLVMStructType. Internally, the struct type stores a (potentially empty) name, a (potentially empty) list of contained types and a bitmask indicating whether the struct is named, opaque, packed or uninitialized. Structure types that don't have a name are referred to as literal structs. Such structures are uniquely identified by their contents. Identified structs on the other hand are uniquely identified by the name.
Identified Structure Types
Identified structure types are uniqued using their name in a given context. Attempting to construct an identified structure with the same name as a structure that already exists in the context will result in the existing structure being returned. MLIR does not auto-rename identified structs in case of name conflicts because there is no naming scope equivalent to a module in LLVM IR since MLIR modules can be arbitrarily nested.
Programmatically, identified structures can be constructed in an uninitialized state. In this case, they are given a name but the body must be set up by a later call, using MLIR's type mutation mechanism. Such uninitialized types can be used in type construction, but must be eventually initialized for IR to be valid. This mechanism allows for constructing recursive or mutually referring structure types: an uninitialized type can be used in its own initialization.
Once the type is initialized, its body cannot be changed anymore. Any further attempts to modify the body will fail and return failure to the caller unless the type is initialized with the exact same body. Type initialization is thread-safe; however, if a concurrent thread initializes the type before the current thread, the initialization may return failure.
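The uniquing and one-time initialization rules can be modeled with a short Python sketch. This is a toy model under stated assumptions, not the MLIR API: the classes Context and StructType and their methods are hypothetical names, bodies are plain tuples, and the sketch is single-threaded, so it does not reproduce the concurrent-initialization behavior described above.

```python
class StructType:
    """Hypothetical stand-in for an identified structure type."""

    def __init__(self, name):
        self.name = name
        self.body = None      # None means "not initialized yet"
        self.packed = False

    def set_body(self, body, packed=False):
        """Initialize the body once; return True on success, False on failure."""
        body = tuple(body)
        if self.body is None:
            self.body, self.packed = body, packed
            return True
        # Re-initialization only succeeds with the exact same body.
        return self.body == body and self.packed == packed


class Context:
    """Hypothetical context that uniques identified structs by name."""

    def __init__(self):
        self._structs = {}

    def get_identified(self, name):
        # Return the existing struct for this name, creating it on first use.
        if name not in self._structs:
            self._structs[name] = StructType(name)
        return self._structs[name]


ctx = Context()
node = ctx.get_identified("node")
same = ctx.get_identified("node")   # same name, same object

# A recursive type: the uninitialized struct is used in its own body.
first = node.set_body(["i32", ("ptr", node)])
again = node.set_body(["i32", ("ptr", node)])   # identical body: succeeds
changed = node.set_body(["i64"])                # different body: fails
```

Creating the named type first and setting its body afterwards is what makes the self-reference possible in this model; a type that had to be built in a single step could not mention itself.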
The syntax for identified structure types is as follows.
llvm-ident-struct-type ::= `!llvm.struct<` string-literal, `opaque` `>`
                         | `!llvm.struct<` string-literal, `packed`?
                            `(` llvm-type-or-ref-list `)` `>`
llvm-type-or-ref-list ::= <maybe empty comma-separated list of llvm-type-or-ref>
llvm-type-or-ref ::= <any llvm type>
                   | `!llvm.struct<` string-literal `>`
The body of the identified struct is printed in full unless it is transitively contained in the same struct. In the latter case, only the identifier is printed. For example, the structure containing the pointer to itself is represented as !llvm.struct<"A", (ptr<struct<"A">>)>, and the structure A containing two pointers to the structure B containing a pointer to the structure A is represented as !llvm.struct<"A", (ptr<struct<"B", (ptr<struct<"A">>)>>, ptr<struct<"B", (ptr<struct<"A">>)>>)>. Note that the structure B is "unrolled" for both elements. A structure with the same name but a different body is a syntax error. The user must ensure structure name uniqueness across all modules processed in a given MLIR context. Structure names are arbitrary string literals and may include, e.g., spaces and keywords.
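The printing rule for recursive structs can be illustrated with a small Python sketch. It is a toy printer, not the MLIR type printer: the function name print_type and the modeling of types as strings, tuples and dictionaries are assumptions for this example, and references are written in the struct<"name"> form given by the grammar above.

```python
def print_type(ty, enclosing=()):
    """Print a toy type, abbreviating structs that are already being printed.

    Types are modeled as: a string for simple types, ("ptr", pointee) for
    pointers, or a dict {"name": str, "body": list} for identified structs.
    """
    if isinstance(ty, str):
        return ty
    if isinstance(ty, tuple) and ty[0] == "ptr":
        return f"ptr<{print_type(ty[1], enclosing)}>"
    name = ty["name"]
    if name in enclosing:
        # Transitively contained in itself: print only the identifier.
        return f'struct<"{name}">'
    inner = ", ".join(print_type(elem, enclosing + (name,)) for elem in ty["body"])
    return f'struct<"{name}", ({inner})>'

# A struct holding a pointer to itself.
self_ref = {"name": "A", "body": []}
self_ref["body"].append(("ptr", self_ref))
self_ref_text = "!llvm." + print_type(self_ref)

# Struct A with two pointers to struct B, which points back to A.
outer = {"name": "A", "body": []}
inner_b = {"name": "B", "body": [("ptr", outer)]}
outer["body"] = [("ptr", inner_b), ("ptr", inner_b)]
mutual_text = "!llvm." + print_type(outer)
```

Because the set of enclosing names is tracked per path rather than globally, B is printed in full for both elements of A, matching the "unrolled" behavior described above.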
Identified structs may be opaque. In this case, the body is unknown but the structure type is considered initialized and is valid in the IR.
Literal Structure Types
Literal structures are uniqued according to the list of elements they contain, and can optionally be packed. The syntax for such structs is as follows.
llvm-literal-struct-type ::= `!llvm.struct<` `packed`? `(` llvm-type-list `)` `>`
llvm-type-list ::= <maybe empty comma-separated list of llvm types w/o `!llvm`>
Literal structs cannot be recursive, but can contain other structs. Therefore, they must be constructed in a single step with the entire list of contained elements provided.
Examples of Structure Types
!llvm.struct<>                  // NOT allowed
!llvm.struct<()>                // empty, literal
!llvm.struct<(i32)>             // literal
!llvm.struct<(struct<(i32)>)>   // struct containing a struct
!llvm.struct<packed (i8, i32)>  // packed struct
!llvm.struct<"a">               // recursive reference, only allowed within
                                // another struct, NOT allowed at top level
!llvm.struct<"a", (ptr<struct<"a">>)>  // supported example of recursive reference
!llvm.struct<"a", ()>           // empty, named (necessary to differentiate from
                                // recursive reference)
!llvm.struct<"a", opaque>       // opaque, named
!llvm.struct<"a", (i32)>        // named
!llvm.struct<"a", packed (i8, i32)>  // named, packed
Unsupported Types
The LLVM IR label type does not have a counterpart in the LLVM dialect since, in MLIR, blocks are not values and don't need a type.
Operations
All operations in the LLVM IR dialect have a custom form in MLIR. The mnemonic of an operation is that used in LLVM IR prefixed with "llvm.".
[include "Dialects/LLVMOps.md"]