Vector to SCF conversion still had issues due to the interaction with the natural alignment derived by the LLVM data layout. One traditional workaround is to allocate aligned. However, this does not always work for vector sizes that are non-powers of 2.
This revision implements a more portable mechanism where the intermediate allocation is always a memref of elemental vector type. AllocOp is extended to use the natural LLVM DataLayout alignment for non-scalar types, when the alignment is not specified in the first place.
An integration test is added that exercises the transfer to scf.for + scalar lowering with a 5x5 transposition.
Differential Revision: https://reviews.llvm.org/D87150
When allowed, use 32-bit indices rather than 64-bit indices in the
SIMD computation of masks. This runs up to 2x and 4x faster on
a number of AVX2 and AVX512 microbenchmarks.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D87116
Added 128 byte alignment to alloc ops created in VectorToSCF pass.
128b alignment was already introduced to this pass but not to all alloc
ops. This commit changes that by adding 128b alignment to the remaining ops.
The point of specifying alignment is to prevent possible memory alignment errors
on weakly tested architectures.
Differential Revision: https://reviews.llvm.org/D86454
Adding a conversion pattern for the parallel Operation. This will
help the conversion of parallel operation with standard dialect to
parallel operation with llvm dialect. The type conversion of the block
arguments in a parallel region are controlled by the pattern for the
parallel Operation. Without this pattern, a parallel Operation with
block arguments cannot be converted from standard to LLVM dialect.
Other OpenMP operations without regions are marked as legal. When
translation of OpenMP operations with regions are added then patterns
for these operations can also be added.
Also uses all the standard to llvm patterns. Patterns of other dialects
can be added later if needed.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D86273
This patch allows to pass the gpu module name to SPIR-V
module during conversion. This has many benefits as we can lookup
converted to SPIR-V kernel in the symbol table.
In order to avoid symbol conflicts, `"__spv__"` is added to the
gpu module name to form the new one.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86384
This patch introduces a hook to encode descriptor set
and binding number into `spv.globalVariable`'s symbolic name. This
allows to preserve this information, and at the same time legalize
the global variable for the conversion to LLVM dialect.
This is required for `mlir-spirv-cpu-runner` to convert kernel
arguments into LLVM.
Also, a couple of some nits added:
- removed unused comment
- changed to a capital letter in the comment
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86515
Binding MemRefs of f16 needs special handling as the type is not supported on
CPU. There was a bug in the type used.
Differential Revision: https://reviews.llvm.org/D86328
Removed the Standard to LLVM conversion patterns that were previously
pulled in for testing purposes. This helps to separate the conversion
to LLVM dialect of the MLIR module with both SPIR-V and Standard
dialects in it (particularly helpful for SPIR-V cpu runner). Also,
tests were changed accordingly.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86285
Add the unsigned complements to the existing FPToSI and SIToFP operations in the
standard dialect, with one-to-one lowerings to the corresponding LLVM operations.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85557
If Memref has rank > 1 this pass emits N-1 loops around
TransferRead op and transforms the op itself to 1D read. Since vectors
must have static shape while memrefs don't the pass emits if condition
to prevent out of bounds accesses in case some memref dimension is smaller
than the corresponding dimension of targeted vector. This logic is fine
but authors forgot to apply `permutation_map` on loops upper bounds and
thus if condition compares induction variable to incorrect loop upper bound
(dimension of the memref) in case `permutation_map` is not identity map.
This commit aims to fix that.
There should be an equivalent std.floor op to std.ceil. This includes
matching lowerings for SPIRV, NVVM, ROCDL, and LLVM.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85940
This patch adds more op/type conversion support
necessary for `spirv-runner`:
- EntryPoint/ExecutionMode: currently removed since we assume
having only one kernel function in the kernel module.
- StorageBuffer storage class is now supported. We are not
concerned with multithreading so this is fine for now.
- Type conversion enhanced, now regular offsets and strides
for structs and arrays are supported (based on
`VulkanLayoutUtils`).
- Support of `spc.AccessChain` that is modelled with GEP op
in LLVM dialect.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86109
According to the LLVM Language Reference, 'cmpxchg' accepts integer or pointer
types. Several MLIR tests were using it with floats as it appears possible to
programmatically construct and print such an instruction, but it cannot be
parsed back. Use integers instead.
Depends On D85899
Reviewed By: flaub, rriddle
Differential Revision: https://reviews.llvm.org/D85900
Legacy implementation of the LLVM dialect in MLIR contained an instance of
llvm::Module as it was required to parse LLVM IR types. The access to the data
layout of this module was exposed to the users for convenience, but in practice
this layout has always been the default one obtained by parsing an empty layout
description string. Current implementation of the dialect no longer relies on
wrapping LLVM IR types, but it kept an instance of DataLayout for
compatibility. This effectively forces a single data layout to be used across
all modules in a given MLIR context, which is not desirable. Remove DataLayout
from the LLVM dialect and attach it as a module attribute instead. Since MLIR
does not yet have support for data layouts, use the LLVM DataLayout in string
form with verification inside MLIR. Introduce the layout when converting a
module to the LLVM dialect and keep the default "" description for
compatibility.
This approach should be replaced with a proper MLIR-based data layout when it
becomes available, but provides an immediate solution to compiling modules with
different layouts, e.g. for GPUs.
This removes the need for LLVMDialectImpl, which is also removed.
Depends On D85650
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D85652
The convresion of memref cast operaitons from the Standard dialect to the LLVM
dialect has been emitting bitcasts from a struct type to itself. Beyond being
useless, such casts are invalid as bitcast does not operate on aggregate types.
This kept working by accident because LLVM IR bitcast construction API skips
the construction if types are equal before it verifies that the types are
acceptable in a bitcast. Do not emit such bitcasts, the memref cast that only
adds/erases size information is in fact a noop on the current descriptor as it
always contains dynamic values for all sizes.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D85899
Inital conversion of `spv._address_of` and `spv.globalVariable`.
In SPIR-V, the global returns a pointer, whereas in LLVM dialect
the global holds an actual value. This difference is handled by
`spv._address_of` and `llvm.mlir.addressof`ops that both return
a pointer. Moreover, only current invocation is in conversion's
scope.
Reviewed By: antiagainst, mravishankar
Differential Revision: https://reviews.llvm.org/D84626
This change adds initial support needed to generate OpenCL compliant SPIRV.
If Kernel capability is declared then memory model becomes OpenCL.
If Addresses capability is declared then addressing model becomes Physical64.
Additionally for Kernel capability interface variable ABI attributes are not
generated as entry point function is expected to have normal arguments.
Differential Revision: https://reviews.llvm.org/D85196
Using a shuffle for the last recursive step in progressive lowering not only
results in much more compact IR, but also more efficient code (since the
backend is no longer confused on subvector aliasing for longer vectors).
E.g. the following
%f = vector.shape_cast %v0: vector<1024xf32> to vector<32x32xf32>
yields much better x86-64 code that runs 3x faster than the original.
Reviewed By: bkramer, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D85482
The intrinsics were already supported and vector.transfer_read/write lowered
direclty into these operations. By providing them as individual ops, however,
clients can used them directly, and it opens up progressively lowering transfer
operations at higher levels (rather than direct lowering to LLVM IR as done now).
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D85357
Per Vulkan's SPIR-V environment spec: "While the OpSRem and OpSMod
instructions are supported by the Vulkan environment, they require
non-negative values and thus do not enable additional functionality
beyond what OpUMod provides."
The `getOffsetForBitwidth` function is used for lowering std.load
and std.store, whose indices are of `index` type and cannot be
negative. So we should be okay to use spv.UMod directly here to
be exact. Also made the comment explicit about the assumption.
Differential Revision: https://reviews.llvm.org/D83714
If Int16 is not available, 16-bit integers inside Workgroup storage
class should be emulated via 32-bit integers. This was previously
broken because the capability querying logic was incorrectly
intercepting all storage classes where it meant to only handle
interface storage classes. Adjusted where we return to fix this.
Differential Revision: https://reviews.llvm.org/D85308
Handle the case where the ViewOp takes in a memref that has
an memory space.
Reviewed By: ftynse, bondhugula, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D85048
This patch introduces a conversion of `spv.loop` to LLVM dialect.
Similarly to `spv.selection`, op's control attributes are not mapped
to LLVM yet and therefore the conversion fails if the loop control is
not `None`. Also, all blocks within the loop should be reachable in
order for conversion to succeed.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D84245
Introduces the expand and compress operations to the Vector dialect
(important memory operations for sparse computations), together
with a first reference implementation that lowers to the LLVM IR
dialect to enable running on CPU (and other targets that support
the corresponding LLVM IR intrinsics).
Reviewed By: reidtatge
Differential Revision: https://reviews.llvm.org/D84888
A new first-party modeling for LLVM IR types in the LLVM dialect has been
developed in parallel to the existing modeling based on wrapping LLVM `Type *`
instances. It resolves the long-standing problem of modeling identified
structure types, including recursive structures, and enables future removal of
LLVMContext and related locking mechanisms from LLVMDialect.
This commit only switches the modeling by (a) renaming LLVMTypeNew to LLVMType,
(b) removing the old implementaiton of LLVMType, and (c) updating the tests. It
is intentionally minimal. Separate commits will remove the infrastructure built
for the transition and update API uses where appropriate.
Depends On D85020
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D85021
This is a first patch that sweeps over tests to fix
indentation (tabs to spaces). It also adds label checks and
removes redundant matching of `%{{.*}} = `.
The following tests have been fixed:
- arithmetic-ops-to-llvm
- bitwise-ops-to-llvm
- cast-ops-to-llvm
- comparison-ops-to-llvm
- logical-ops-to-llvm (renamed to match the rest)
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85181
Now that we can have a memref of index type, we no longer need to materialize shapes in i64 and then index_cast.
Differential Revision: https://reviews.llvm.org/D84938
When lowering to the standard dialect, we currently support only the extent
tensor variant of the shape.rank operation. This change lets the conversion
pattern fail in a well-defined manner.
Differential Revision: https://reviews.llvm.org/D84852
This patch introduces new intrinsics in LLVM dialect:
- `llvm.intr.floor`
- `llvm.intr.maxnum`
- `llvm.intr.minnum`
- `llvm.intr.smax`
- `llvm.intr.smin`
These intrinsics correspond to SPIR-V ops from GLSL
extended instruction set (`spv.GLSL.Floor`, `spv.GLSL.FMax`,
`spv.GLSL.FMin`, `spv.GLSL.SMax` and `spv.GLSL.SMin`
respectively). Also conversion patterns for them were added.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84661
This is a second patch on conversion of GLSL ops to LLVM dialect.
It introduces patterns to convert `spv.InverseSqrt` and `spv.Tanh`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84633
This is the first patch that adds support for GLSL extended
instruction set ops. These are direct conversions, apart from `spv.Tan`
that is lowered to `sin() / cos()`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84627
The lowering does not support all types for its source operations. This change
makes the patterns fail in a well-defined manner.
Differential Revision: https://reviews.llvm.org/D84443
Operating on indices and extent tensors directly, the type conversion is no
longer needed for the supported cases.
Differential Revision: https://reviews.llvm.org/D84442
This adds conversions for const_size and to_extent_tensor. Also, cast-like operations are now folded away if the source and target types are the same.
Differential Revision: https://reviews.llvm.org/D84745
Conversion of `spv.BranchConditional` now supports branch weights
that are mapped to weights vector in `llvm.cond_br`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84657
This patch adds support of Volatile and Nontemporal
memory accesses to `spv.Load` and `spv.Store`. These attributes are
modelled with a `volatile` and `nontemporal` flags.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84739
functions.
This allows using command line flags to lowere from GPU to SPIR-V. The
pass added is only for testing/example purposes. Most uses cases will
need more fine-grained control on setting workgroup sizes for kernel
functions.
Differential Revision: https://reviews.llvm.org/D84619
Do not return error code, instead return created resource handles or void. Error reporting is done by the library function.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D84660
The LowerAffine psas was a FunctionPass only for legacy
reasons. Making this Op-agnostic allows it to be used from command
line when affine expressions are within operations other than
`std.func`.
Differential Revision: https://reviews.llvm.org/D84590
Previous changes generalized some of the operands and results. Complete
a larger group of those to simplify progressive lowering. Also update
some of the declarative asm form due to generalization. Tried to keep it
mostly mechanical.
This patch introduces conversion pattern for `spv.Store` and `spv.Load`.
Only op with `Function` Storage Class is supported at the moment
because `spv.GlobalVariable` has not been introduced yet. If the op
has memory access attribute, then there are the following cases.
If the access is `Aligned`, add alignment to the op builder. Otherwise
the conversion fails as other cases are not supported yet because they
require additional attributes for `llvm.store`/`llvm.load` ops: e.g.
`volatile` and `nontemporal`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84236
The patch introduces the conversion pattern for function-level
`spv.Variable`. It is modelled as `llvm.alloca` op. If initialized, then
additional store instruction is used. Note that there is no initialization
for arrays and structs since constants of these types are not supported in
LLVM dialect yet. Also, at the moment initialisation is only possible via
`spv.constant` (since `spv.GlobalVariable` conversion is not implemented
yet).
The input code has some scoping is not taken into account and will be
addressed in a different patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D84224
This concerns `from/to_extent_tensor`, `size_to_index`, `index_to_size`, and
`const_size` conversion patterns. The new lowering will work directly on indices
and extent tensors. The shape and size values will allow for error values but
are not yet supported by the dialect conversion.
Differential Revision: https://reviews.llvm.org/D84436
The operation `shape.shape_of` now returns an extent tensor `tensor<?xindex>` in
cases when no error are possible. All consuming operation will eventually accept
both, shapes and extent tensors.
Differential Revision: https://reviews.llvm.org/D84160
The default lowering of `assert` calls `abort` in case the assertion is
violated. The failure message is ignored but should be used by custom lowerings
that can assume more about their environment.
Differential Revision: https://reviews.llvm.org/D83886
This revision adds support for much deeper type conversion integration into the conversion process, and enables auto-generating cast operations when necessary. Type conversions are now largely automatically managed by the conversion infra when using a ConversionPattern with a provided TypeConverter. This removes the need for patterns to do type cast wrapping themselves and moves the burden to the infra. This makes it much easier to perform partial lowerings when type conversions are involved, as any lingering type conversions will be automatically resolved/legalized by the conversion infra.
To support this new integration, a few changes have been made to the type materialization API on TypeConverter. Materialization has been split into three separate categories:
* Argument Materialization: This type of materialization is used when converting the type of block arguments when calling `convertRegionTypes`. This is useful for contextually inserting additional conversion operations when converting a block argument type, such as when converting the types of a function signature.
* Source Materialization: This type of materialization is used to convert a legal type of the converter into a non-legal type, generally a source type. This may be called when uses of a non-legal type persist after the conversion process has finished.
* Target Materialization: This type of materialization is used to convert a non-legal, or source, type into a legal, or target, type. This type of materialization is used when applying a pattern on an operation, but the types of the operands have not yet been converted.
Differential Revision: https://reviews.llvm.org/D82831
Introduces the scatter/gather operations to the Vector dialect
(important memory operations for sparse computations), together
with a first reference implementation that lowers to the LLVM IR
dialect to enable running on CPU (and other targets that support
the corresponding LLVM IR intrinsics).
The operations can be used directly where applicable, or can be used
during progressively lowering to bring other memory operations closer to
hardware ISA support for a gather/scatter. The semantics of the operation
closely correspond to those of the corresponding llvm intrinsics.
Note that the operation allows for a dynamic index vector (which is
important for sparse computations). However, this first reference
lowering implementation "serializes" the address computation when
base + index_vector is converted to a vector of pointers. Exploring
how to use SIMD properly during these step is TBD. More general
memrefs and idiomatic versions of striding are also TBD.
Reviewed By: arpith-jacob
Differential Revision: https://reviews.llvm.org/D84039
This patch introduces conversion pattern for `spv.selection` op.
The conversion can only be applied to selection with all blocks being
reachable. Moreover, selection with control attributes "Flatten" and
"DontFlatten" is not supported.
Since the `PatternRewriter` hook for block merging has not been implemented
for `ConversionPatternRewriter`, merge and continue blocks are kept
separately.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D83860
This patch introduces conversion for `spv.Branch` and `spv.BranchConditional`
ops. Branch weigths for `spv.BranchConditional` are not supported at the
moment, and conversion in this case fails.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D83784
Summary: The logic was conservative but inverted: cases that should remain unmasked became 1-D masked.
Differential Revision: https://reviews.llvm.org/D84051
Lower `shape.shape_eq` to the `scf` (and `std`) dialect. For now, this lowering
is limited to extent tensor operands.
Differential Revision: https://reviews.llvm.org/D82530
To make it clear when shape error values cannot occur the shape operations can
operate on extent tensors. This change updates the lowering for `shape.reduce`
accordingly.
Differential Revision: https://reviews.llvm.org/D83944
Summary: The native alignment may generally not be used when lowering a vector.transfer to the underlying load/store operation. This revision fixes the unmasked load/store alignment to match that of the masked path.
Differential Revision: https://reviews.llvm.org/D83684
Per the Vulkan's SPIR-V environment spec, "for the OpSRem and OpSMod
instructions, if either operand is negative the result is undefined."
So we cannot directly use spv.SRem/spv.SMod if either operand can be
negative. Emulate it via spv.UMod.
Because the emulation uses spv.SNegate, this commit also defines
spv.SNegate.
Differential Revision: https://reviews.llvm.org/D83679
Summary:
These are semantically equivalent, but fmuladd allows decaying the op
into fmul+fadd if there is no fma instruction available. llvm.fma lowers
to scalar calls to libm fmaf, which is a lot slower.
Reviewers: nicolasvasilache, aartbik, ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D83666
This patch introduces type conversion for SPIR-V structs. Since
handling offset case requires thorough testing, it was left out
for now. Hence, only structs with no offset are currently
supported. Also, structs containing member decorations cannot
be translated.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D83403
This patch adds type conversion for 4 SPIR-V types: array, runtime array, pointer
and struct. This conversion is integrated using a separate function
`populateSPIRVToLLVMTypeConversion()` that adds new type conversions. At the moment,
this is a basic skeleton that allows to perfom conversion from SPIR-V array,
runtime array and pointer types to LLVM typesystem. There is no support of array
strides or storage classes. These will be supported on the case by case basis.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D83399
This patch adds conversion patterns for `spv.BitFieldSExtract` and `spv.BitFieldUExtract`.
As in the patch for `spv.BitFieldInsert`, `offset` and `count` have to be broadcasted in
vector case and casted to match the type of the base.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82640
This patch introduces 3 new direct conversions for SPIR-V ops:
- `spv.Select`
- `spv.Undef`
- `spv.FMul` that was skipped in the patch with arithmetic ops
Differential Revision: https://reviews.llvm.org/D83291
scf.if currently lacks folding on true / false conditionals.
Such foldings are a bit more involved than can be addressed immediately.
This revision introduces an eager folding for lowering vector.transfer operations in the presence of unrolling.
Differential revision: https://reviews.llvm.org/D83146
This patch introduces conversion pattern for `spv.constant` with scalar
and vector types. There is a special case when the constant value is a
signed/unsigned integer (vector of integers). Since LLVM dialect does not
have signedness semantics, the types had to be converted to signless ints.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82936
Added conversion pattern for SPIR-V `FunctionCallOp`. Based on
specification, it returns no results or a single result, so
can be mapped directly to LLVM dialect's `llvm.call`.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D83030
This patch introduces conversion pattern for `spv.BitFiledInsert` op,
as well as some utility functions to facilitate code reading.
Since `spv.BitFiledInsert` may take both vector and integer operands,
this case was specifically handled by broadcasting values (`count`
and `offset` here) to vectors. Moreover, the types had to be converted
to same bitwidth in order to conform with LLVM dialect rules.
This was done with `zext` when extending (Note that `count` and
`offset` are treated as unsigned) and `trunc` in the opposite case.
For the latter one, truncation is safe since the op is defined only when
`count`/`offset`/their sum is less than the bitwidth of the result.
This introduces a natural bound of the value of 64, which can be
expressed as `i8`.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D82639
This allow lowering to support scf.for and scf.if with results. As right now
spv region operations don't have return value the results are demoted to
Function memory. We create one allocation per result right before the region
and store the yield values in it. Then we can load back the value from
allocation to be able to use the results.
Differential Revision: https://reviews.llvm.org/D82246
Added conversion pattern and tests for `spv.Bitcast` op. This one has
a direct mapping in LLVM dialect so `DirectConversionPattern` was used.
Differential Revision: https://reviews.llvm.org/D82748
This patch introduces new conversion patterns for bit and logical
negation op: `spv.Not` and `spv.LogicalNot`. They are implemented
by applying xor on the operand and mask with all bits set.
Differential Revision: https://reviews.llvm.org/D82637
Summary:
The patch makes the index type lowering of the GPU to NVVM/ROCDL conversion configurable. It introduces a pass option that controls the bitwidth used when lowering index computations and uses the LowerToLLVMOptions structure to control the Standard to LLVM lowering.
This commit fixes a use-after-free bug introduced by the reverted commit d10b1a3. It implements the following changes:
- Added a getDefaultOptions method to the LowerToLLVMOptions struct that returns a reference to statically allocated default options.
- Use the getDefaultOptions method to provide default LowerToLLVMOptions (instead of an initializer list).
- Added comments to clarify the required lifetime of the LowerToLLVMOptions
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D82475
`llvm.mlir.constant` was originally introduced as an LLVM dialect counterpart
to `std.constant`. As such, it was supporting "function pointer" constants
derived from the symbol name. This is different from `std.constant` that allows
for creation of a "function" constant since MLIR, unlike LLVM IR, supports
this. Later, `llvm.mlir.addressof` was introduced as an Op that obtains a
constant pointer to a global in the LLVM dialect. It naturally extends to
functions (in LLVM IR, functions are globals) and should be used for defining
"function pointer" values instead.
Fixes PR46344.
Differential Revision: https://reviews.llvm.org/D82667
When the origin of a shape is an extent tensor the operation `get_extent` can be
lowered directly to `extract_element`.
This choice circumvents the necessity to materialize the shape in memory.
Differential Revision: https://reviews.llvm.org/D82645
When the shape is derived from a tensor argument the shape extent can be derived
directly from that tensor with `std.dim`.
This lowering pattern circumvents the necessity to materialize the shape in
memory.
Differential Revision: https://reviews.llvm.org/D82644
Rationale:
In general, passing "fastmath" from MLIR to LLVM backend is not supported, and even just providing such a feature for experimentation is under debate. However, passing fine-grained fastmath related attributes on individual operations is generally accepted. This CL introduces an option to instruct the vector-to-llvm lowering phase to annotate floating-point reductions with the "reassociate" fastmath attribute, which allows the LLVM backend to use SIMD implementations for such constructs. Oher lowering passes can start using this mechanism right away in cases where reassociation is allowed.
Benefit:
For some microbenchmarks on x86-avx2, speedups over 20 were observed for longer vector (due to cleaner, spill-free and SIMD exploiting code).
Usage:
mlir-opt --convert-vector-to-llvm="reassociate-fp-reductions"
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D82624
Implemented conversion for `spv.BitReverse` and `spv.BitCount`. Since ODS
generates builders in a different way for LLVM dialect intrinsics, I
added attributes to build method in `DirectConversionPattern` class. The
tests for these ops are in `bitwise-ops-to-llvm.mlir`.
Differential Revision: https://reviews.llvm.org/D82286
Initially, unranked memref descriptors in the LLVM dialect were designed only
to be passed into functions. An assertion was guarding against returning
unranked memrefs from functions in the standard-to-LLVM conversion. This is
insufficient for functions that wish to return an unranked memref such that the
caller does not know the rank in advance, and hence cannot allocate the
descriptor and pass it in as an argument.
Introduce a calling convention for returning unranked memref descriptors as
follows. An unranked memref descriptor always points to a ranked memref
descriptor stored on stack of the current function. When an unranked memref
descriptor is returned from a function, the ranked memref descriptor it points
to is copied to dynamically allocated memory, the ownership of which is
transferred to the caller. The caller is responsible for deallocating the
dynamically allocated memory and for copying the pointed-to ranked memref
descriptor onto its stack.
Provide default lowerings for std.return, std.call and std.indirect_call that
maintain the conversion defined above.
This convention is additionally exercised by a runtime test to guard against
memory errors.
Differential Revision: https://reviews.llvm.org/D82647
Lower `shape.rank` to standard dialect.
A shape's size is the same as the extent of the first and only dimension of the
`tensor<?xindex>` it is represented by.
Differential Revision: https://reviews.llvm.org/D82080
This patch introduces conversion patterns for `spv.module` and `spv._module_end`.
SPIR-V module is converted into `ModuleOp`. This will play a role of enclosing
scope to LLVM ops. At the moment, SPIR-V module attributes (such as memory model,
etc) are ignored.
Differential Revision: https://reviews.llvm.org/D82468
This patch provides an implementation for `spv.func` conversion. The pattern
is populated in a separate method added to the pass. At the moment, the type
signature conversion only includes the supported types. The conversion pattern
also matches SPIR-V function control attributes to LLVM function attributes.
Those are modelled as `passthrough` attributes in LLVM dialect. The following
mapping are used:
- None: no attributes passed
- Inline: `alwaysinline` seems to be the right equivalent (`inlinehint` is
semantically weaker in my opinion)
- DontInline: `noinline`
- Pure and Const: I think those can be modelled as `readonly` and `readnone`
attributes respectively.
Also, 2 patterns added for return ops conversion (`spv.Return` for void return
and `spv.ReturnValue` for a single value return).
Differential Revision: https://reviews.llvm.org/D81931
This patch extends the AccessChainOp index type handling to be able to deal with
all Integer type indices (i.e., all bit-widths and signedness symantics).
There were two ways of achieving this:
1- Backward compatible: The new way of handling the indices will assume that
an index type is i32 by default if not specified in the assembly format,
this way all the old tests would pass correctly.
2- Enforce the format: This unifies the spv.AccessChain Op format and all the old
tests had to be updated to reflect this change or else they fail.
I picked option-2 to unify the Op format and avoid having optional index-type fields
that can lead to somewhat confusing tests format and multiple representations for
the same Op with undocumented assumption that an index is i32 unless stated.
Nonetheless, reverting to option-1 should be straightforward if preferred or needed.
Differential Revision: https://reviews.llvm.org/D81763
The patch makes the index type lowering of the GPU to NVVM/ROCDL
conversion configurable. It introduces a pass option that controls the
bitwidth used when lowering index computations.
Differential Revision: https://reviews.llvm.org/D80285
Subview operations are not natively supported downstream in the spirv path.
This change allows removing subview when used by vector transfer the same way
we already do it when they are used by LoadOp/StoreOp
Differential Revision: https://reviews.llvm.org/D82106
Use direct vector constants for the 1-D case. This approach
scales much better than generating elaborate insertion operations
that are eventually folded into a constant. We could of course
generalize the 1-D case to higher ranks, but this simplification
already helps in scaling some microbenchmarks that would formerly
crash on the intermediate IR length.
Reviewed By: reidtatge
Differential Revision: https://reviews.llvm.org/D82144
Lower `shape.shape_of` to standard dialect.
This lowering supports statically and dynamically shaped tensors.
Support for unranked tensors will be added as part of the lowering to `scf`.
Differential Revision: https://reviews.llvm.org/D82098
Summary:
The "i1" (viz. bool) type does not have a proper equivalent on the "C"
size. So, to avoid any ABIs issues, we simply use print_i32 on an i32
value of one or zero for true and false. This has the added advantage
that one less function needs to be implemented when porting the runtime
support library.
Reviewers: ftynse, bkramer, nicolasvasilache
Reviewed By: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D82048
Added support of simple logical ops: `LogicalAnd`, `LogicalOr`,
`LogicalEqual` and `LogicalNotEqual`. Added a missing conversion
for `UMod` op.
Also, implemented SPIR-V cast ops conversion. There are 4 simple
case where there is a clear equivalent in LLVM (e.g. `ConvertFToS`
is `fptosi`). For `FConvert`, `SConvert` and `UConvert` we
distinguish between truncation and extension based on the bit
width of the operand.
Differential Revision: https://reviews.llvm.org/D81812
Implement the missing lowering from `std.dim` to the LLVM dialect in case of a
dynamic dimension.
Differential Revision: https://reviews.llvm.org/D81834
This patch has shift ops conversion implementation. In SPIR-V dialect,
`Shift` and `Base` may have different bit width. On the contrary,
in LLVM dialect both `Base` and `Shift` have to be of the same bit width.
This leads to the following cases:
- if `Base` has the same bit width as `Shift`, the conversion is
straightforward.
- if `Base` has a greater bit width than `Shift`, shift is sign/zero
extended first. Then the extended value is passed to the shift.
- otherwise the conversion is considered to be illegal.
Differential Revision: https://reviews.llvm.org/D81546
This option avoids to accidentally reuse variable across -LABEL match,
it can be explicitly opted-in by prefixing the variable name with $
Differential Revision: https://reviews.llvm.org/D81531
Implemented `FComparePattern` and `IComparePattern` classes
that provide conversion of SPIR-V comparison ops (such as
`spv.FOrdGreaterThanEqual` and others) to LLVM dialect.
Also added tests in `comparison-ops-to-llvm.mlir`.
Differential Revision: https://reviews.llvm.org/D81487
Following the previous revision `D81100`, this commit implements a templated class
that would provide conversion patterns for “straightforward” SPIR-V ops into
LLVM dialect. Templating allows to abstract away from concrete implementation
for each specific op. Those are mainly binary operations. Currently supported
and tested ops are:
- Arithmetic ops: `IAdd`, `ISub`, `IMul`, `FAdd`, `FSub`, `FMul`, `FDiv`, `FNegate`,
`SDiv`, `SRem` and `UDiv`
- Bitwise ops: `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`
The implementation relies on `SPIRVToLLVMConversion` class that makes use of
`OpConversionPattern`.
Differential Revision: https://reviews.llvm.org/D81305
Allow for dynamic indices in the `dim` operation.
Rather than an attribute, the index is now an operand of type `index`.
This allows to apply the operation to dynamically ranked tensors.
The correct lowering of dynamic indices remains to be implemented.
Differential Revision: https://reviews.llvm.org/D81551
Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps to understand failures.
Remove `-dump-input-on-failure` and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.
Differential Revision: https://reviews.llvm.org/D81422
Summary:
The NVVM target only provides implementations for tanh etc. on f32 and
f64 operands. To also support f16, we now insert operations to extend to f32
and truncate back to f16 around the intrinsic call.
Differential Revision: https://reviews.llvm.org/D81473
These commits set up the skeleton for SPIR-V to LLVM dialect conversion.
I created SPIR-V to LLVM pass, registered it in Passes.td, InitAllPasses.h.
Added a pattern for `spv.BitwiseAndOp` and tests for it. Integer, float
and vector types are converted through LLVMTypeConverter.
Differential Revision: https://reviews.llvm.org/D81100
The operations `to_extent_tensor` and `from_extent_tensor` become no-ops when
lowered to the standard dialect.
This is possible with a lowering from `shape.shape` to `tensor<?xindex>`.
Differential Revision: https://reviews.llvm.org/D81162
Recently introduced allocation hoisting is quite conservative on the cases when it triggers.
This revision makes it such that the allocations for vector transfer lowerings are hoisted
to the top of the function.
This should be revisited in the context of parallelism and is a temporary workaround.
Differential Revision: https://reviews.llvm.org/D81253
This simplifies a lot of handling of BoolAttr/IntegerAttr. For example, a lot of places currently have to handle both IntegerAttr and BoolAttr. In other places, a decision is made to pick one which can lead to surprising results for users. For example, DenseElementsAttr currently uses BoolAttr for i1 even if the user initialized it with an Array of i1 IntegerAttrs.
Differential Revision: https://reviews.llvm.org/D81047
Add SubgroupId, SubgroupSize and NumSubgroups to GPU dialect ops and add the
lowering of those ops to SPIRV.
Differential Revision: https://reviews.llvm.org/D81042
Add a new pass to lower operations from the `shape` to the `std` dialect.
The conversion applies only to the `size_to_index` and `index_to_size`
operations and affected types.
Other patterns will be added as needed.
Differential Revision: https://reviews.llvm.org/D81091
Keeping in the affine.for to gpu.launch conversions, which should
probably be the affine.parallel to gpu.launch conversion as well.
Differential Revision: https://reviews.llvm.org/D80747
https://reviews.llvm.org/D79246 introduces alignment propagation for vector transfer operations. Unfortunately, the alignment calculation is incorrect and can result in crashes.
This revision fixes the calculation by using the natural alignment of the memref elemental type, instead of the resulting vector type.
If more alignment is desired, it can be done in 2 ways:
1. use a proper vector.type_cast to transform a memref<axbxcxdxf32> into a memref<axbxvector<cxdxf32>> giving a natural alignment of vector<cxdxf32>
2. add an alignment attribute to vector transfer operations and propagate it.
With this change the alignment in the relevant tests goes down from 128 to 4.
Lastly, a few minor cleanups are performed and the custom `isMinorIdentityMap` is deprecated.
Differential Revision: https://reviews.llvm.org/D80734
Make ConvertKernelFuncToCubin pass to be generic:
- Rename to ConvertKernelFuncToBlob.
- Allow specifying triple, target chip, target features.
- Initializing LLVM backend is supplied by a callback function.
- Lowering process from MLIR module to LLVM module is via another callback.
- Change mlir-cuda-runner to adopt the revised pass.
- Add new tests for lowering to ROCm HSA code object (HSACO).
- Tests for CUDA and ROCm are kept in separate directories.
Differential Revision: https://reviews.llvm.org/D80142
This allocation of a workgroup memory is lowered to a
spv.globalVariable. Only static size allocation with element type
being int or float is handled. The lowering does account for the
element type that are not supported in the lowered spv.module based on
the extensions/capabilities and adjusts the number of elements to get
the same byte length.
Differential Revision: https://reviews.llvm.org/D80411
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167
This reverts commit cdb6f05e2d.
The build is broken with:
You have called ADD_LIBRARY for library obj.MLIRGPUtoCUDATransforms without any source files. This typically indicates a problem with your CMakeLists.txt file
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167