Non-32-bit scalar types require special hardware support that may not
exist on all GPUs. This is reflected in SPIR-V as that non-32-bit scalar
types require special capabilities or extensions.
Previously when there is a non-32-bit type and no native support, we
unconditionally emulate it with 32-bit ones. This isn't good given that
it can have implications over ABI and data layout consistency.
This commit introduces an option to control whether to use 32-bit
types to emulate.
Differential Revision: https://reviews.llvm.org/D100059
The patch enables the use of index type in vectors. It is a prerequisite to support vectorization for indexed Linalg operations. This refactoring became possible due to the newly introduced data layout infrastructure. The data layout of a module defines the bitwidth of the index type needed to verify bitcasts and similar vector operations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D99948
Also factors out out-of-bounds mask generation from vector.transfer_read/write into a new MaterializeTransferMask pattern.
Differential Revision: https://reviews.llvm.org/D100001
Table op lowering to linalg.generic for both i8 (behaves like a gather) and a
pair of gathers with a quantized interpolation.
Differential Revision: https://reviews.llvm.org/D99756
The existing implementation was always creating 32-bit constants for
floating-point and integer reductions regardless of the actual type, which
resulted in invalid IR being generated for any types other than f32 and i32
when lowering affine.parallel to SCF. Use the actual type instead.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D99942
Added lowerings for Tosa's reduce boolean operations. This includes a fix to
maintain the output rank of reduce operations.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D99228
This commit add utility functions for creating push constant
storage variable and loading values from it.
Along the way, performs some clean up:
* Deleted `setABIAttrs`, which is just a 4-liner function
with one user.
* Moved `SPIRVConverstionTarget` into `mlir` namespace,
to be consistent with `SPIRVTypeConverter` and
`LLVMConversionTarget`.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D99725
This is in preparation for adding a new "mask" operand. The existing "masked" attribute was used to specify dimensions that may be out-of-bounds. Such transfers can be lowered to masked load/stores. The new "in_bounds" attribute is used to specify dimensions that are guaranteed to be within bounds. (Semantics is inverted.)
Differential Revision: https://reviews.llvm.org/D99639
In particular for Graph Regions, the terminator needs is just a
historical artifact of the generalization of MLIR from CFG region.
Operations like Module don't need a terminator, and before Module
migrated to be an operation with region there wasn't any needed.
To validate the feature, the ModuleOp is migrated to use this trait and
the ModuleTerminator operation is deleted.
This patch is likely to break clients, if you're in this case:
- you may iterate on a ModuleOp with `getBody()->without_terminator()`,
the solution is simple: just remove the ->without_terminator!
- you created a builder with `Builder::atBlockTerminator(module_body)`,
just use `Builder::atBlockEnd(module_body)` instead.
- you were handling ModuleTerminator: it isn't needed anymore.
- for generic code, a `Block::mayNotHaveTerminator()` may be used.
Differential Revision: https://reviews.llvm.org/D98468
Lowering of bitwise_not to linalg dialect using a xor operation with a constant
of all-bits-one.
Differential Revision: https://reviews.llvm.org/D99221
Index type is an integer type of target-specific bitwidth present in many MLIR
operations (loops, memory accesses). Converting values of this type to
fixed-size integers has always been problematic. Introduce a data layout entry
to specify the bitwidth of `index` in a given layout scope, defaulting to 64
bits, which is a commonly used assumption, e.g., in constants.
Port builtin-to-LLVM type conversion to use this data layout entry when
converting `index` type and untie it from pointer size. This is particularly
relevant for GPU targets. Keep a possibility to forcibly override the index
type in lowerings.
Depends On D98525
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98937
Use new `MemRefType::getMemorySpace` method with generic Attribute
in cases, where there is no specific logic around the memory space.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D99154
Tosa's argmax lowering is representable as a linalg.indexed_generic
operation. Include the lowering to this type for both integer and
floating point types.
Differential Revision: https://reviews.llvm.org/D99137
To match an interface or trait, users currently have to use the `MatchAny` tag. This tag can be quite problematic for compile time for things like the canonicalizer, as the `MatchAny` patterns may get applied to *every* operation. This revision adds better support by bucketing interface/trait patterns based on which registered operations have them registered. This means that moving forward we will only attempt to match these patterns to operations that have this interface registered. Two simplify defining patterns that match traits and interfaces, two new utility classes have been added: OpTraitRewritePattern and OpInterfaceRewritePattern.
Differential Revision: https://reviews.llvm.org/D98986
Tiling operations are generic operations with modified indexing. Updated to to
linalg lowerings to perform this lowering.
Differential Revision: https://reviews.llvm.org/D99113
Adds lowerings for matmul and fully_connected. Only supports 2D tensors for inputs and weights, and 1D tensors for bias.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D99211
This nicely aligns the naming with RewritePatternSet. This type isn't
as widely used, but we keep a using declaration in to help with
downstream consumption of this change.
Differential Revision: https://reviews.llvm.org/D99131
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
Multiply-shift requires wider compute types or CPU specific code to avoid
premature truncation, apply_shift fixes this issue
Also, Tosa's mul op supports different input / output types. Added path that
sign-extends input values to int-32 values before multiplying.
Differential Revision: https://reviews.llvm.org/D99011
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
Handles lowering from the tosa CastOp to the equivalent linalg lowering. It
includes support for interchange between bool, int, and floating point.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98828
Adds lowerings for logical_* boolean operations. Each of these ops only operate
on booleans allowing simple lowerings.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D98910
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operatons. This scalar operation is lowered
in TosaToStandard.
We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D98753
Includes lowering for tosa.concat with indice computation with subtensor insert
operations. Includes tests along two different indices.
Differential Revision: https://reviews.llvm.org/D98813
Add a feature to `EnumAttr` definition to generate
specialized Attribute class for the particular enumeration.
This class will inherit `StringAttr` or `IntegerAttr` and
will override `classof` and `getValue` methods.
With this class the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`) and it will return the typed enumeration
directly instead of raw string/integer.
Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97836
Returning structs directly in LLVM does not necessarily align with the C ABI of
the platform. This might happen to work on Linux but for small structs this
breaks on Windows. With this change, the wrappers work platform independently.
Differential Revision: https://reviews.llvm.org/D98725
This commit fixes the lowering of `Affine.IfOp` to `SCF.IfOp` in the
presence of yield values. These changes have been made as a part of
`-lower-affine` pass.
Differential Revision: https://reviews.llvm.org/D98760
This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl_interp.check_types : Compare a range of types with a known range.
* pdl_interp.create_types : Create a constant range of types.
* pdl_interp.get_operands : Get a range of operands from an operation.
* pdl_interp.get_results : Get a range of results from an operation.
* pdl_interp.switch_types : Switch on a range of types.
This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision.
Differential Revision: https://reviews.llvm.org/D95722
This has a numerous amount of benefits, given the overly clunky nature of CreateNativeOp:
* Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling for more of the pattern to be in PDL.
* Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users.
This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue.
Differential Revision: https://reviews.llvm.org/D95720
Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs(e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.:
```
%root = pdl.operation ...
%result = pdl.result 0 of %root
```
Differential Revision: https://reviews.llvm.org/D95719
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203
The dialect separation was introduced to demarkate ops operating in different
type systems. This is no longer the case after the LLVM dialect has migrated to
using built-in vector types, so the original reason for separation is no longer
valid. Squash the two dialects into one.
The code size decrease isn't quite large: the ops originally in LLVM_AVX512 are
preserved because they match LLVM IR intrinsics specialized for vector element
bitwidth. However, it is still conceptually beneficial to have only one
dialect. I originally considered to use Tablegen multiclasses to define both
the type-polymorphic op and its two intrinsic-related instantiations, but
decided against it given both the complexity of the required Tablegen input and
its dissimilarity with the rest of ODS-defined ops, both potentially resulting
in very poor maintainability.
Depends On D98327
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D98328
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.
Put the base class in GPU/Transforms, according to the discussion in D98203.
The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.
Depends On D98168
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98279
Provide default for gpuBinaryAnnotation so that we don't need to specify it in tests.
The annotation likely only needs to be target specific if we want to lower to e.g. both CUDA and ROCDL.
Reviewed By: herhut, bondhugula
Differential Revision: https://reviews.llvm.org/D98168
This is using the new Attribute storage generation support in
TableGen to define the LLVM FastMathFlags.
Differential Revision: https://reviews.llvm.org/D98007
Lowerings for min, max, prod, and sum reduction operations on int and float
values. This includes reduction tests for both cases.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D97893
split_at can return an error if the split index is out of bounds. If the
user knows that the index can never be out of bounds it's safe to use
extent tensors. This has a straight-forward lowering to std.subtensor.
Differential Revision: https://reviews.llvm.org/D98177
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere. For ops that
don't have a SPIR-V spec counterpart, we use spv.mlir.snake_case.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98014
Normally tensors will be stored in buffers before converting to SPIR-V,
given that is how a large amount of data is sent to the GPU. However,
SPIR-V supports converting from tensors directly too. This is for the
cases where the tensor just contains a small amount of elements and it
makes sense to directly inline them as a small data array in the shader.
To handle this, internally the conversion might create new local
variables. SPIR-V consumers in GPU drivers may or may not optimize that
away. So this has implications over register pressure. Therefore, a
threshold is used to control when the patterns should kick in.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D98052
The two dialects are largely redundant. The former was introduced as a mirror
of the latter operating on LLVM dialect types. This is no longer necessary
since the LLVM dialect operates on built-in types. Combine the two dialects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98060
To unify the naming scheme across all ops in the SPIR-V dialect,
we are moving from spv.camelCase to spv.CamelCase everywhere.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97918
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97919
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from `spv.camelCase` to `spv.CamelCase` everywhere.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97917
There is no need for the interface implementations to be exposed, opaque
registration functions are sufficient for all users, similarly to passes.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97852
This better matches the actual IR concept that is being modeled, and is consistent with how the rest of PDL is structured.
Differential Revision: https://reviews.llvm.org/D95718
The current implementation of Value involves a pointer int pair with several different kinds of owners, i.e. BlockArgumentImpl*, Operation *, TrailingOpResult*. This design arose from the desire to save memory overhead for operations that have a very small number of results (generally 0-2). There are, unfortunately, many problematic aspects of the current implementation that make Values difficult to work with or just inefficient.
Operation result types are stored as a separate array on the Operation. This is very inefficient for many reasons: we use TupleType for multiple results, which can lead to huge amounts of memory usage if multi-result operations change types frequently(they do). It also means that simple methods like Value::getType/Value::setType now require complex logic to get to the desired type.
Value only has one pointer bit free, severely limiting the ability to use it in things like PointerUnion/PointerIntPair. Given that we store the kind of a Value along with the "owner" pointer, we only leave one bit free for users of Value. This creates situations where we end up nesting PointerUnions to be able to use Value in one.
As noted above, most of the methods in Value need to branch on at least 3 different cases which is both inefficient, possibly error prone, and verbose. The current storage of results also creates problems for utilities like ValueRange/TypeRange, which want to efficiently store base pointers to ranges (of which Operation* isn't really useful as one).
This revision greatly simplifies the implementation of Value by the introduction of a new ValueImpl class. This class contains all of the state shared between all of the various derived value classes; i.e. the use list, the type, and the kind. This shared implementation class provides several large benefits:
* Most of the methods on value are now branchless, and often one-liners.
* The "kind" of the value is now stored in ValueImpl instead of Value
This frees up all of Value's pointer bits, allowing for users to take full advantage of PointerUnion/PointerIntPair/etc. It also allows for storing more operation results as "inline", 6 now instead of 2, freeing up 1 word per new inline result.
* Operation result types are now stored in the result, instead of a side array
This drops the size of zero-result operations by 1 word. It also removes the memory crushing use of TupleType for operations results (which could lead up to hundreds of megabytes of "dead" TupleTypes in the context). This also allowed restructured ValueRange, making it simpler and one word smaller.
This revision does come with two conceptual downsides:
* Operation::getResultTypes no longer returns an ArrayRef<Type>
This conceptually makes some usages slower, as the iterator increment is slightly more complex.
* OpResult::getOwner is slightly more expensive, as it now requires a little bit of arithmetic
From profiling, neither of the conceptual downsides have resulted in any perceivable hit to performance. Given the advantages of the new design, most compiles are slightly faster.
Differential Revision: https://reviews.llvm.org/D97804
This gets rid of a dubious shape_eq %a, %a fold, that folds shape_eq
even if %a is not an Attribute.
Differential Revision: https://reviews.llvm.org/D97728
Just a pure method renaming.
It is a preparation step for replacing "memory space as raw integer"
with more generic "memory space as attribute", which will be done in
separate commit.
The `MemRefType::getMemorySpace` method will return `Attribute` and
become the main API, while `getMemorySpaceAsInt` will be declared as
deprecated and will be replaced in all in-tree dialects (also in separate
commits).
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476
These warnings are raised when compiling with gcc due to either having too few or too many commas, or in the case of lldb, the possibility of a nullptr.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97586
Lowers the transpose operation to a generic linalg op when permutations
is a constant value.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D97508
Includes a lowering for tosa.const, tosa.if, and tosa.while to Standard/SCF dialects. TosaToStandard is
used for constant lowerings and TosaToSCF handles the if/while ops.
Resubmission of https://reviews.llvm.org/D97518 with ASAN fixes.
Differential Revision: https://reviews.llvm.org/D97529
Both identity ops can be loweried by replacing their results with their
inputs. We keep this as a linalg lowering as other backends may choose to
create copies.
Differential Revision: https://reviews.llvm.org/D97517
Similar to mask-load/store and compress/expand, the gather and
scatter operation now allow for higher dimension uses. Note that
to support the mixed-type index, the new syntax is:
vector.gather %base [%i,%j] [%kvector] ....
The first client of this generalization is the sparse compiler,
which needs to define scatter and gathers on dense operands
of higher dimensions too.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97422
Lowering from the tosa.reshape op to linalg.reshape. For same-rank or
non-collapsed/expanded cases two linalg.reshapes are inserted.
Differential Revision: https://reviews.llvm.org/D97439
'getAttrs' has been explicitly marked deprecated. This patch refactors
to use Operation::getAttrs().
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D97546
Includes a lowering for tosa.const, tosa.if, and tosa.while to Standard/SCF dialects. TosaToStandard is
used for constant lowerings and TosaToSCF handles the if/while ops.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97352
Lower !gpu.async.tokens returned from async.execute regions to events instead of streams.
Make !gpu.async.token returned from !async.execute single-use.
This allows creating one event per use and destroying them without leaking or ref-counting.
Technically we only need this for stream/event-based lowering. I kept the code separate
from the rest of the gpu-async-region pass so that we can make this optional or move
to a separate pass as needed.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96965
The cuda-runner registers two pass pipelines for nested passes,
so that we don't have to use verbose textual pass pipeline specification.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D97091
We should be ordering predicates with higher primary/secondary sums first, but we are currently ordering them last. This allows for predicates more frequently encountered to be checked first.
Differential Revision: https://reviews.llvm.org/D95715
It's not necessarily the case on all architectures that all memory is
addressable in addrspace 0, so casting the pointer to addrspace 0 is
liable to cause problems.
Reviewed By: aartbik, ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96380
This patch adds lowering to Linalg for the following TOSA ops: negate, rsqrt, mul, select, clamp and reluN and includes support for signless integer and floating point types
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D96924
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
Add a pattern to converting some value to a boolean. spirv.S/UConvert does not
work on i1 types. Thus, the pattern is lowered to cmpi + select.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D96851
This corresponds with the previous work to make shape.broadcast nary.
Additionally, simplify the ConvertShapeConstraints pass. It now doesn't
lower an implicit shape.is_broadcastable. This is still the same in
combination with shape-to-standard when the 2 passes are used in either
order.
Differential Revision: https://reviews.llvm.org/D96401
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
This reverts commit 3f22547fd1 and reland 973e133b76 with a workaround for
a gcc bug that does not accept lambda default parameters:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59949
Differential Revision: https://reviews.llvm.org/D96598
Align the vector gather/scatter/expand/compress API with
the vector load/store/maskedload/maskedstore API.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D96396
This patch adds the 'vector.load' and 'vector.store' ops to the Vector
dialect [1]. These operations model *contiguous* vector loads and stores
from/to memory. Their semantics are similar to the 'affine.vector_load' and
'affine.vector_store' counterparts but without the affine constraints. The
most relevant feature is that these new vector operations may perform a vector
load/store on memrefs with a non-vector element type, unlike 'std.load' and
'std.store' ops. This opens the representation to model more generic vector
load/store scenarios: unaligned vector loads/stores, perform scalar and vector
memory access on the same memref, decouple memory allocation constraints from
memory accesses, etc [1]. These operations will also facilitate the progressive
lowering of both Affine vector loads/stores and Vector transfer reads/writes
for those that read/write contiguous slices from/to memory.
In particular, this patch adds the 'vector.load' and 'vector.store' ops to the
Vector dialect, implements their lowering to the LLVM dialect, and changes the
lowering of 'affine.vector_load' and 'affine.vector_store' ops to the new vector
ops. The lowering of Vector transfer reads/writes will be implemented in the
future, probably as an independent pass. The API of 'vector.maskedload' and
'vector.maskedstore' has also been changed slightly to align it with the
transfer read/write ops and the vector new ops. This will improve reusability
among all these operations. For example, the lowering of 'vector.load',
'vector.store', 'vector.maskedload' and 'vector.maskedstore' to the LLVM dialect
is implemented with a single template conversion pattern.
[1] https://llvm.discourse.group/t/memref-type-and-data-layout/
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96185
This reverts commit 973e133b76.
It triggers an issue in gcc5 that require investigation, the build is
broken with:
/tmp/ccdpj3B9.s: Assembler messages:
/tmp/ccdpj3B9.s:5821: Error: symbol `_ZNSt17_Function_handlerIFvjjEUljjE2_E9_M_invokeERKSt9_Any_dataOjS6_' is already defined
/tmp/ccdpj3B9.s:5860: Error: symbol `_ZNSt14_Function_base13_Base_managerIUljjE2_E10_M_managerERSt9_Any_dataRKS3_St18_Manager_operation' is already defined
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
Differential Revision: https://reviews.llvm.org/D96598
Until now, the GPU translation to NVVM or ROCDL intrinsics relied on the
presence of the generic `gpu.kernel` attribute to attach additional LLVM IR
metadata to the relevant functions. This would be problematic if each dialect
were to handle the conversion of its own options, which is the intended
direction for the translation infrastructure. Introduce `nvvm.kernel` and
`rocdl.kernel` in addition to `gpu.kernel` and base translation on these new
attributes instead.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96591
The AffineMap in the MemRef inferred by SubViewOp may have uncompressed symbols which result in type mismatch on otherwise unused symbols. Make the computation of the AffineMap compress those unused symbols which results in better canonical types.
Additionally, improve the error message to report which inferred type was expected.
Differential Revision: https://reviews.llvm.org/D96551
With the standard dialect being split up, the set of dialects that are
used when converting to GPU is growing. This change modifies the
SCFToGpu pass to allow all operations inside launch bodies.
Differential Revision: https://reviews.llvm.org/D96480
Previously broadcast was a binary op. Now it can support more inputs.
This has been changed in such a way that for now, this is an NFC for
all broadcast operations that were previously legal.
Differential Revision: https://reviews.llvm.org/D95777
This reverts commit 511dd4f438 along with
a couple fixes.
Original message:
Now the context is the first, rather than the last input.
This better matches the rest of the infrastructure and makes
it easier to move these types to being declaratively specified.
Phabricator: https://reviews.llvm.org/D96111
Now the context is the first, rather than the last input.
This better matches the rest of the infrastructure and makes
it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D96111
This patch introduces a few more straightforward patterns
to convert vector ops operating on 1-4 element vectors
to their corresponding SPIR-V counterparts.
This patch also enables converting vector<1xT> to T.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D96042
Historically, Linalg To LLVM conversion subsumed numerous other conversions,
including (affine) loop lowerings to CFG and conversions from the Standard and
Vector dialects to the LLVM dialect. This was due to the insufficient support
for partial conversions in the infrastructure that essentially required
conversions that involve type change (in this case, !linalg.range to
!llvm.struct) to be performed in a single conversion sweep. This is no longer
the case so remove the subsumed conversions and run them as separate passes
when necessary.
Depends On D95317
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96008
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
Historically, the Vector to LLVM dialect conversion subsumed the Standard to
LLVM dialect conversion patterns. This was necessary because the conversion
infrastructure did not have sufficient support for reconciling type
conversions. This support is now available. Only keep the patterns related to
the Vector dialect in the Vector to LLVM conversion and require type casts
operations to be inserted if necessary. These casts will be removed by
following conversions if possible. Update integration tests to also run the
Standard to LLVM conversion.
There is a significant amount of test churn, which is due to (a) unnecessarily
strict tests in VectorToLLVM and (b) many patterns actually targeting Standard
dialect ops instead of LLVM dialect ops leading to tests actually exercising a
Vector->Standard->LLVM conversion. This churn is a good illustration of the
reason to make the conversion partial: now the tests only check the code in the
Vector to LLVM conversion and will not be randomly broken by changes in
Standard to LLVM conversion.
Arguably, it may be possible to extract Vector to Standard patterns into a
separate pass, but given the ongoing splitting of the Standard dialect, such
pass will be short-lived and will require further refactoring.
Depends On D95626
Reviewed By: nicolasvasilache, aartbik
Differential Revision: https://reviews.llvm.org/D95685
Add the conversion pattern for vector.bitcast to lower it to
the LLVM Dialect.
Reviewed By: ThomasRaoux, aartbik
Differential Revision: https://reviews.llvm.org/D95579
The __resume function trips up LLVM's 'X86 DAG->DAG Instruction Selection' unless optimizations are disabled.
Only adding the __resume function when it's needed allows lowering through AsyncToLLVM and LLVM without '-O0' as long as the coroutine functionality is not used.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D95868
[s|z]exti ops do not have the same operand and result type.
As a consequence, the lowering of the n-D vector form needs to be relaxed a bit.
This revision additionally performs a few NFC renamings of variables to make them more intuitive.
Differential Revision: https://reviews.llvm.org/D95760
Comitted log, exp, maximum, minimum, comparison, ceil and floor conversions from TOSA to LinAlg. Support for signless integer and floating point.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D95839
It is no longer necessary to also convert other "standard" ops along with the
complex dialect: the element types are now built-in integers or floating point
types, and the top-level cast between complex and struct is automatically
inserted and removed in progressive lowering.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D95625
The subview verifier in the rank-reduced case is plainly skipping verification
when the resulting type is a memref with empty affine map. This is generally incorrect.
Instead, form the actual expected rank-reduced MemRefType that takes into account the projections of 1's dimensions. Then, check the canonicalized expected rank-reduced type against the canonicalized candidate type.
Differential Revision: https://reviews.llvm.org/D95316
OffsetSizeAndStrideOpInterface now have the ability to specify only a leading subset of
offset, sizes, strides operands/attributes.
The size of that leading subset must be limited by the corresponding entry in `getArrayAttrMaxRanks` to avoid overflows.
Missing trailing dimensions are assumed to span the whole range (i.e. [0 .. dim)).
This brings more natural semantics to slice-like op on top of subview and is a simplifies to removing all uses of SliceOp in dependent projects.
Differential revision: https://reviews.llvm.org/D95441
Depends On D95000
Move async.execute outlining and async -> async.runtime lowering into the separate Async transformation pass
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D95311
Adds vp2intersect to the AVX512 dialect and defines a lowering to the
LLVM dialect.
Author: Matthias Springer <springerm@google.com>
Differential Revision: https://reviews.llvm.org/D95301
Instead of using llvm.call operations to call LLVM coro intrinsics use Coro operations from the LLVM dialect.
(This was reviewed as a part of https://reviews.llvm.org/D94923 but was lost in arc land from local branch)
Differential Revision: https://reviews.llvm.org/D95405
[NFC] No new functionality, mostly a cleanup and one more abstraction level between Async and LLVM IR.
Instead of lowering from Async to LLVM coroutines and Async Runtime API in one shot, do it progressively via async.coro and async.runtime operations.
1. Lower from async to async.runtime/coro (e.g. async.execute to function with coro setup and runtime calls)
2. Lower from async.runtime/coro to LLVM intrinsics and runtime API calls
Intermediate coro/runtime operations will allow to run transformations on a higher level IR and do not try to match IR based on the LLVM::CallOp properties.
Although async.coro is very close to LLVM coroutines, it is not exactly the same API, instead it is optimized for usability in async lowering, and misses a lot of details that are present in @llvm.coro intrinsic.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94923
spv.Ordered/spv.Unordered are meant for OpenCL Kernel capability.
For Vulkan Shader capability, we should use spv.IsNan to check
whether a number is NaN.
Add a new pattern for converting `std.cmpf ord|uno` to spv.IsNan
and bumped the pattern converting to spv.Ordered/spv.Unordered
to a higher benefit. The SPIR-V target environment will properly
select between these two patterns.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D95237
- Extend spirv::ConstantOp::getZero/One to handle float, vector of int, and vector of float.
- Refactor ZeroExtendI1Pattern to use getZero/One methods.
- Add one more test for lowering std.zexti which extends vector<4xi1> to vector<4xi64>.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D95120
Define OrderedOp and UnorderedOp instructions in SPIR-V and convert
cmpf operations with `ord` and `uno` tag to these instructions
respectively.
Differential Revision: https://reviews.llvm.org/D95098
TosaToLinalg was depending on its header file indirectly through
Passes.h rather than directly. This removes that indirection.
Differential Revision: https://reviews.llvm.org/D94706
Initial commit to add support for lowering from TOSA to Linalg. The focus is on
the essential infrastructure for these lowerings and integration with existing
passes.
Includes lowerings for a subset of operations including:
abs, add, sub, pow, and, or, xor, left shift, right shift, tanh
Lit tests are used to validate correctness.
Differential Revision: https://reviews.llvm.org/D94247
Continue the convergence between LLVM dialect and built-in types by using the
built-in vector type whenever possible, that is for fixed vectors of built-in
integers and built-in floats. LLVM dialect vector type is still in use for
pointers, less frequent floating point types that do not have a built-in
equivalent, and scalable vectors. However, the top-level `LLVMVectorType` class
has been removed in favor of free functions capable of inspecting both built-in
and LLVM dialect vector types: `LLVM::getVectorElementType`,
`LLVM::getNumVectorElements` and `LLVM::getFixedVectorType`. Additional work is
necessary to design an implemented the extensions to built-in types so as to
remove the `LLVMFixedVectorType` entirely.
Note that the default output format for the built-in vectors does not have
whitespace around the `x` separator, e.g., `vector<4xf32>` as opposed to the
LLVM dialect vector type format that does, e.g., `!llvm.vec<4 x fp128>`. This
required changing the FileCheck patterns in several tests.
Reviewed By: mehdi_amini, silvas
Differential Revision: https://reviews.llvm.org/D94405
Lowering of async dialect uses a fixed type converter and therefore does not support lowering non-standard types.
This revision adds a structural conversion so that non-standard types in `!async.value`s can be lowered to LLVM before lowering the async dialect itself.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D94404
The dialect conversion framework was enhanced to handle type
conversion automatically. OpConversionPattern already contains
a pointer to the TypeConverter. There is no need to duplicate it
in a separate subclass. This removes the only reason for a
SPIRVOpLowering subclass. It adapts to use core infrastructure
and simplifies the code.
Also added a utility function to OpConversionPattern for getting
TypeConverter as a certain subclass.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D94080
Adding the ability to index the base address brings these operations closer
to the transfer read and write semantics (with lowering advantages), ensures
more consistent use in vector MLIR code (easier to read), and reduces the
amount of code duplication to lower memrefs into base addresses considerably
(making codegen less error-prone).
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D94278
Continue the convergence between LLVM dialect and built-in types by replacing
the bfloat, half, float and double LLVM dialect types with their built-in
counterparts. At the API level, this is a direct replacement. At the syntax
level, we change the keywords to `bf16`, `f16`, `f32` and `f64`, respectively,
to be compatible with the built-in type syntax. The old keywords can still be
parsed but produce a deprecation warning and will be eventually removed.
Depends On D94178
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94179
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical with the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.
Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94178
This commit adds a new trait that can be attached to ops that have
unsigned semantics.
TODO:
- Check if other places in code can use the new attribute (possibly in this patch).
- Add a similar `SignedOp` attribute (in a new patch).
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D94068
BEGIN_PUBLIC
[mlir] Remove LLVMType, LLVM dialect types now derive Type directly
This class has become a simple `isa` hook with no proper functionality.
Removing will allow us to eventually make the LLVM dialect type infrastructure
open, i.e., support non-LLVM types inside container types, which itself will
make the type conversion more progressive.
Introduce a call `LLVM::isCompatibleType` to be used instead of
`isa<LLVMType>`. For now, this is strictly equivalent.
END_PUBLIC
Depends On D93681
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93713
Implement Bug 46698, making ODS synthesize a getType() method that returns a
specific C++ class for OneResult methods where we know that class. This eliminates
a common source of casts in things like:
myOp.getType().cast<FIRRTLType>().getPassive()
because we know that myOp always returns a FIRRTLType. This also encourages
op authors to type their results more tightly (which is also good for
verification).
I chose to implement this by splitting the OneResult trait into itself plus a
OneTypedResult trait, given that many things are using `hasTrait<OneResult>`
to conditionalize various logic.
While this changes makes many many ops get more specific getType() results, it
is generally drop-in compatible with the previous behavior because 'x.cast<T>()'
is allowed when x is already known to be a T. The one exception to this is that
we need declarations of the types used by ops, which is why a couple headers
needed additional #includes.
I updated a few things in tree to remove the now-redundant `.cast<>`'s, but there
are probably many more than can be removed.
Differential Revision: https://reviews.llvm.org/D93790
1. Add new methods to Async runtime API to support yielding async values
2. Add lowering from `async.yield` with value payload to the new runtime API calls
`async.value` lowering requires that payload type is convertible to LLVM and supported by `llvm.mlir.cast` (DialectCast) operation.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D93592
This commit renames various SPIR-V related conversion files for
consistency. It drops the "Convert" prefix to various files and
fixes various comment headers.
Reviewed By: hanchung, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D93489
Previously all SCF to SPIR-V conversion patterns were tested as
the -convert-gpu-to-spirv pass. That obscured the structure we
want. This commit fixed it.
Reviewed By: ThomasRaoux, hanchung
Differential Revision: https://reviews.llvm.org/D93488
LLVMType contains numerous static constructors that were initially introduced
for API compatibility with LLVM. Most of these merely forward to arguments to
`SpecificType::get` (MLIR defines classes for all types, unlike LLVM IR), while
some introduce subtle semantics differences due to different modeling of MLIR
types (e.g., structs are not auto-renamed in case of conflicts). Furthermore,
these constructors don't match MLIR idioms and actively prevent us from making
the LLVM dialect type system more open. Remove them and use `SpecificType::get`
instead.
Depends On D93680
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93681
One common situation is to create a lot of IR at a well known location,
e.g. when doing a big rewrite from one dialect to another where you're expanding
ops out into lots of other ops.
For these sorts of situations, it is annoying to pass the location into
every create call. As we discused in a few threads on the forum, a way to help
with this is to produce a new sort of builder that holds a location and provides
it to each of the create<> calls automatically.
This patch implements an ImplicitLocOpBuilder class that does this. We've had
good experience with this in the CIRCT project, and it makes sense to upstream to
MLIR.
I picked a random pass to adopt it to show the impact, but I don't think there is
any particular need to force adopt it in the codebase.
Differential Revision: https://reviews.llvm.org/D93717
LLVMType contains multiple instance methods that were introduced initially for
compatibility with LLVM API. These methods boil down to `cast` followed by
type-specific call. Arguably, they are mostly used in an LLVM cast-follows-isa
anti-pattern. This doesn't connect nicely to the rest of the MLIR
infrastructure and actively prevents it from making the LLVM dialect type
system more open, e.g., reusing built-in types when appropriate. Remove such
instance methods and replaces their uses with apporpriate casts and methods on
derived classes. In some cases, the result may look slightly more verbose, but
most cases should actually use a stricter subtype of LLVMType anyway and avoid
the isa/cast.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93680
This commit addresses the issue of lowering affine.for and
affine.parallel having return values. Relevant test cases are also
added.
Signed-off-by: Prateek Gupta <prateek@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D93090
Transfer_ops can now work on both buffers and tensor. Right now, lowering of
the tensor case is not supported yet.
Differential Revision: https://reviews.llvm.org/D93500
This class used to serve a few useful purposes:
* Allowed containing a null DictionaryAttr
* Provided some simple mutable API around a DictionaryAttr
The first of which is no longer an issue now that there is much better caching support for attributes in general, and a cache in the context for empty dictionaries. The second results in more trouble than it's worth because it mutates the internal dictionary on every action, leading to a potentially large number of dictionary copies. NamedAttrList is a much better alternative for the second use case, and should be modified as needed to better fit it's usage as a DictionaryAttrBuilder.
Differential Revision: https://reviews.llvm.org/D93442
This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D93432
This commit shuffles SPIR-V code around to better follow MLIR
convention. Specifically,
* Created IR/, Transforms/, Linking/, and Utils/ subdirectories and
moved suitable code inside.
* Created SPIRVEnums.{h|cpp} for SPIR-V C/C++ enums generated from
SPIR-V spec. Previously they are cluttered inside SPIRVTypes.{h|cpp}.
* Fixed include guards in various header files (both .h and .td).
* Moved serialization tests under test/Target/SPIRV.
* Renamed TableGen backend -gen-spirv-op-utils into -gen-spirv-attr-utils
as it is only generating utility functions for attributes.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D93407
This commit splits SPIR-V's serialization and deserialization code
into separate libraries. The motiviation being that the serializer
is used more often the deserializer and therefore lumping them
together unnecessarily increases binary size for the most common
case.
This commit also moves these libraries into the Target/ directory
to follow MLIR convention.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D91548
This reverts commit 0d48d265db.
This reapplies the following commit, with a fix for CAPI/ir.c:
[mlir] Start splitting the `tensor` dialect out of `std`.
This starts by moving `std.extract_element` to `tensor.extract` (this
mirrors the naming of `vector.extract`).
Curiously, `std.extract_element` supposedly works on vectors as well,
and this patch removes that functionality. I would tend to do that in
separate patch, but I couldn't find any downstream users relying on
this, and the fact that we have `vector.extract` made it seem safe
enough to lump in here.
This also sets up the `tensor` dialect as a dependency of the `std`
dialect, as some ops that currently live in `std` depend on
`tensor.extract` via their canonicalization patterns.
Part of RFC: https://llvm.discourse.group/t/rfc-split-the-tensor-dialect-from-std/2347/2
Differential Revision: https://reviews.llvm.org/D92991
This starts by moving `std.extract_element` to `tensor.extract` (this
mirrors the naming of `vector.extract`).
Curiously, `std.extract_element` supposedly works on vectors as well,
and this patch removes that functionality. I would tend to do that in
separate patch, but I couldn't find any downstream users relying on
this, and the fact that we have `vector.extract` made it seem safe
enough to lump in here.
This also sets up the `tensor` dialect as a dependency of the `std`
dialect, as some ops that currently live in `std` depend on
`tensor.extract` via their canonicalization patterns.
Part of RFC: https://llvm.discourse.group/t/rfc-split-the-tensor-dialect-from-std/2347/2
Differential Revision: https://reviews.llvm.org/D92991
- use ConvertOpToLLVMPattern to avoid explicit casting and in most cases the
constructor can be reused to save a few lines of code.
Differential Revision: https://reviews.llvm.org/D92989
This is needed so a listener hears all changes during the dialect
conversion to allow correct rollbacks upon failure.
Differential Revision: https://reviews.llvm.org/D92685
This is part of a larger refactoring the better congregates the builtin structures under the BuiltinDialect. This also removes the problematic "standard" naming that clashes with the "standard" dialect, which is not defined within IR/. A temporary forward is placed in StandardTypes.h to allow time for downstream users to replaced references.
Differential Revision: https://reviews.llvm.org/D92435
A separate AVX512 lowering pass does not compose well with the regular
vector lowering pass. As such, it is at risk of code duplication and
lowering inconsistencies. This change removes the separate AVX512 lowering
pass and makes it an "option" in the regular vector lowering pass
(viz. vector dialect "augmented" with AVX512 dialect).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D92614
There isn't a good reason for anything within IR to specifically reference any of the builtin operations. The only place that had a good reason in the past was AsmPrinter, but the behavior there doesn't need to hardcode ModuleOp anymore.
Differential Revision: https://reviews.llvm.org/D92448
Given that OpState already implicit converts to Operator*, this seems reasonable.
The alternative would be to add more functions to OpState which forward to Operation.
Reviewed By: rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D92266
The ops are very similar to the std variants, but support async GPU execution.
gpu.alloc does not currently support an alignment attribute, and the new ops do not have
canonicalizers/folders like their std siblings do.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91698
Introduce a conversion pass from SCF parallel loops to OpenMP dialect
constructs - parallel region and workshare loop. Loops with reductions are not
supported because the OpenMP dialect cannot model them yet.
The conversion currently targets only one level of parallelism, i.e. only
one top-level `omp.parallel` operation is produced even if there are nested
`scf.parallel` operations that could be mapped to `omp.wsloop`. Nested
parallelism support is left for future work.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91982
This revision will make it easier to create new ops base on the strided memref abstraction outside of the std dialect.
OffsetSizeAndStrideOpInterface is an interface for ops that allow specifying mixed dynamic and static offsets, sizes and strides variadic operands.
Ops that implement this interface need to expose the following methods:
1. `getArrayAttrRanks` to specify the length of static integer
attributes.
2. `offsets`, `sizes` and `strides` variadic operands.
3. `static_offsets`, resp. `static_sizes` and `static_strides` integer
array attributes.
The invariants of this interface are:
1. `static_offsets`, `static_sizes` and `static_strides` have length
exactly `getArrayAttrRanks()`[0] (resp. [1], [2]).
2. `offsets`, `sizes` and `strides` have each length at most
`getArrayAttrRanks()`[0] (resp. [1], [2]).
3. if an entry of `static_offsets` (resp. `static_sizes`,
`static_strides`) is equal to a special sentinel value, namely
`ShapedType::kDynamicStrideOrOffset` (resp. `ShapedType::kDynamicSize`,
`ShapedType::kDynamicStrideOrOffset`), then the corresponding entry is
a dynamic offset (resp. size, stride).
4. a variadic `offset` (resp. `sizes`, `strides`) operand must be present
for each dynamic offset (resp. size, stride).
This interface is useful to factor out common behavior and provide support
for carrying or injecting static behavior through the use of the static
attributes.
Differential Revision: https://reviews.llvm.org/D92011
It is a simple conversion that only requires to change the region argument
types, generalize it from ParallelOp.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91989
The existing implementation of the conversion from SCF Parallel operation to
SCF "for" loops in order to further convert those loops to branch-based CFG has
been cloning the loop and reduction body operations into the new loop because
ConversionPatternRewriter was missing support for moving blocks while replacing
their arguments. This functionality now available, use it to implement the
conversion and avoid cloning operations, which may lead to doubling of the IR
size during the conversion.
In addition, this fixes an issue with converting nested SCF "if" conditionals
present in "parallel" operations that would cause the conversion infrastructure
to stop because of the repeated application of the pattern converting "newly"
created "if"s (which were in fact just moved). Arguably, this should be fixed
at the infrastructure level and this fix is a workaround.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91955
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.
The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.
Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.
Reviewed By: tejohnson, MaskRay, jpienaar
Differential Revision: https://reviews.llvm.org/D91410
Depends On D89963
**Automatic reference counting algorithm outline:**
1. `ReturnLike` operations forward the reference counted values without
modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of
reference counted values ends, and insert `drop_ref` operations after
the last use of the value.
3. Insert `add_ref` before the `async.execute` operation capturing the
value, and pairing `drop_ref` before the async body region terminator,
to release the captured reference counted value when execution
completes.
4. If the reference counted value is passed only to some of the block
successors, insert `drop_ref` operations in the beginning of the blocks
that do not have reference coutned value uses.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D90716
Null types are commonly used as an error marker. Catch them in the constructor
of Operation if they are present in the result type list, as otherwise this
could lead to further surprising behavior when querying op result types.
Fix AsyncToLLVM and StandardToLLVM that were using null types when constructing
operations.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D91770
Additionally, clear a data structure to ensure a proper state if multiple conversion attempts are needed.
Differential Revision: https://reviews.llvm.org/D91791
Make the interface match the one of ConvertToLLVMPattern::getDataPtr() (to be removed in a separate change).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D91599
std.alloc only supports memrefs with identity layout, which means we can simplify the lowering to LLVM and compute strides only from (static and dynamic) sizes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D91549
This commit does the renaming mentioned in the title in order to bring
`spv` dialect closer to the MLIR naming conventions.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D91609
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
The current code allows strided layouts, but the number of elements allocated is ambiguous. It could be either the number of elements in the shape (the current implementation), or the amount of elements required to not index out-of-bounds with the given maps (which would require evaluating the layout map).
If we require the canonical layouts, the two will be the same.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D91523
The logic of vector on boolean was missed. This patch adds the logic and test on
it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D91403
Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
This exposes a hook to configure legality of operations such that only
`scf.parallel` operations that have mapping attributes are marked as
illegal. Consequently, the transformation can now also be applied to
mixed forms.
Differential Revision: https://reviews.llvm.org/D91340
- Move isSupportedMemRefType() to ConvertToLLVMPatterns and check if the
memref element type is supported there.
Differential Revision: https://reviews.llvm.org/D91374
This patch introduces a new conversion pattern for `spv.ExecutionMode`.
`spv.ExecutionMode` may contain important information about the entry
point, which we want to preserve. For example, `LocalSize` provides
information about the work-group size that can be reused. Hence, the
pattern for entry-point ops changes to the following:
- `spv.EntryPoint` is still simply removed
- Info from `spv.ExecutionMode` is used to create a global struct variable,
which looks like:
```
struct {
int32_t executionMode;
int32_t values[]; // optional values
};
```
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D89989
VectorInsertDynamicOp in SPIRV dialect
conversion from vector.insertelement to spirv VectorInsertDynamicOp
Differential Revision: https://reviews.llvm.org/D90927
The pass combines patterns of ExpandAtomic, ExpandMemRefReshape,
StdExpandDivs passes. The pass is meant to legalize STD for conversion to LLVM.
Differential Revision: https://reviews.llvm.org/D91082
- Convert `global_memref` to LLVM::GlobalOp.
- Convert `get_global_memref` to a memref descriptor with a pointer to the first element
of the global stashed in it.
- Extend unit test and a mlir-cpu-runner test to validate the generated LLVM IR.
Differential Revision: https://reviews.llvm.org/D90803
Since SPIR-V module has an optional name, this patch
makes a change to pass it to `ModuleOp` during conversion.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D90904
VectorExtractDynamicOp in SPIRV dialect
conversion from vector.extractelement to spirv VectorExtractDynamicOp
Differential Revision: https://reviews.llvm.org/D90679
- Eliminate duplicated information about mapping from memref -> its descriptor fields
by consolidating that mapping in two functions: getMemRefDescriptorFields and
getUnrankedMemRefDescriptorFields.
- Change convertMemRefType() and convertUnrankedMemRefType() to use these
functions.
- Remove convertMemrefSignature and convertUnrankedMemrefSignature.
Differential Revision: https://reviews.llvm.org/D90707
When the "after" region of a WhileOp is merely forwarding its arguments back to
the "before" region, i.e. WhileOp is a canonical do-while loop, a simpler CFG
subgraph that omits the "after" region with its extra branch operation can be
produced. Loop rotation from general "while" to "if { do-while }" is left for a
future canonicalization pattern when it becomes necessary.
Differential Revision: https://reviews.llvm.org/D90604
The lowering is a straightforward inlining of the "before" and "after" regions
connected by (conditional) branches. This plugs the WhileOp into the
progressive lowering scheme. Future commits may choose to target WhileOp
instead of CFG when lowering ForOp.
Differential Revision: https://reviews.llvm.org/D90603
Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts can happen specifically with the user's desire.
Differential Revision: https://reviews.llvm.org/D89325
For the synchronous case, destroy the stream after synchronization.
Sneak in a unrelated change to report why the gpu.wait conversion pattern didn't match.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89933
This class represents a rewrite pattern list that has been frozen, and thus immutable. This replaces the uses of OwningRewritePatternList in pattern driver related API, such as dialect conversion. When PDL becomes more prevalent, this API will allow for optimizing a set of patterns once without the need to do this per run of a pass.
Differential Revision: https://reviews.llvm.org/D89104
There are several pieces of pattern rewriting infra in IR/ that really shouldn't be there. This revision moves those pieces to a better location such that they are easier to evolve in the future(e.g. with PDL). More concretely this revision does the following:
* Create a Transforms/GreedyPatternRewriteDriver.h and move the apply*andFold methods there.
The definitions for these methods are already in Transforms/ so it doesn't make sense for the declarations to be in IR.
* Create a new lib/Rewrite library and move PatternApplicator there.
This new library will be focused on applying rewrites, and will also include compiling rewrites with PDL.
Differential Revision: https://reviews.llvm.org/D89103
The Pattern class was originally intended to be used for solely matching operations, but that use never materialized. All of the pattern infrastructure uses RewritePattern, and the infrastructure for pure matching(Matchers.h) is implemented inline. This means that this class isn't a useful abstraction at the moment, so this revision refactors it to solely encapsulate the "metadata" of a pattern. The metadata includes the various state describing a pattern; benefit, root operation, etc. The API on PatternApplicator is updated to now operate on `Pattern`s as nothing special from `RewritePattern` is necessary.
This refactoring is also necessary for the upcoming use of PDL patterns alongside C++ rewrite patterns.
Differential Revision: https://reviews.llvm.org/D86258
The conversion between PDL and the interpreter is split into several different parts.
** The Matcher:
The matching section of all incoming pdl.pattern operations is converted into a predicate tree and merged. Each pattern is first converted into an ordered list of predicates starting from the root operation. A predicate is composed of three distinct parts:
* Position
- A position refers to a specific location on the input DAG, i.e. an
existing MLIR entity being matched. These can be attributes, operands,
operations, results, and types. Each position also defines a relation to
its parent. For example, the operand `[0] -> 1` has a parent operation
position `[0]` (the root).
* Question
- A question refers to a query on a specific positional value. For
example, an operation name question checks the name of an operation
position.
* Answer
- An answer is the expected result of a question. For example, when
matching an operation with the name "foo.op". The question would be an
operation name question, with an expected answer of "foo.op".
After the predicate lists have been created and ordered(based on occurrence of common predicates and other factors), they are formed into a tree of nodes that represent the branching flow of a pattern match. This structure allows for efficient construction and merging of the input patterns. There are currently only 4 simple nodes in the tree:
* ExitNode: Represents the termination of a match
* SuccessNode: Represents a successful match of a specific pattern
* BoolNode/SwitchNode: Branch to a specific child node based on the expected answer to a predicate question.
Once the matcher tree has been generated, this tree is walked to generate the corresponding interpreter operations.
** The Rewriter:
The rewriter portion of a pattern is generated in a very straightforward manor, similarly to lowerings in other dialects. Each PDL operation that may exist within a rewrite has a mapping into the interpreter dialect. The code for the rewriter is generated within a FuncOp, that is invoked by the interpreter on a successful pattern match. Referenced values defined in the matcher become inputs the generated rewriter function.
An example lowering is shown below:
```mlir
// The following high level PDL pattern:
pdl.pattern : benefit(1) {
%resultType = pdl.type
%inputOperand = pdl.input
%root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
pdl.rewrite %root {
pdl.replace %root with (%inputOperand)
}
}
// is lowered to the following:
module {
// The matcher function takes the root operation as an input.
func @matcher(%arg0: !pdl.operation) {
pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
^bb1:
pdl_interp.return
^bb2:
pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
^bb3:
pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
^bb4:
%0 = pdl_interp.get_operand 0 of %arg0
pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
^bb5:
%1 = pdl_interp.get_result 0 of %arg0
pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
^bb6:
// This operation corresponds to a successful pattern match.
pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
}
module @rewriters {
// The inputs to the rewriter from the matcher are passed as arguments.
func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
pdl_interp.replace %arg1 with(%arg0)
pdl_interp.return
}
}
}
```
Differential Revision: https://reviews.llvm.org/D84580
This patch introduces a pass for running
`mlir-spirv-cpu-runner` - LowerHostCodeToLLVMPass.
This pass emulates `gpu.launch_func` call in LLVM dialect and lowers
the host module code to LLVM. It removes the `gpu.module`, creates a
sequence of global variables that are later linked to the varables
in the kernel module, as well as a series of copies to/from
them to emulate the memory transfer to/from the host or to/from the
device sides. It also converts the remaining Standard dialect into
LLVM dialect, emitting C wrappers.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86112
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions converted to coroutine suspension points. Await operation outside of a coroutine converted to the blocking wait operations.
Implement simple runtime to support concurrent execution of coroutines.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292
Forward missing attributes when creating the new transfer op otherwise the
builder would use default values.
Differential Revision: https://reviews.llvm.org/D89907
This still satisfies the constraints required by the affine dialect and
gives more flexibility in what iteration bounds can be used when
loewring to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D89782
Now, convert-shape-to-std doesn't internally create memrefs, which was
previously a bit of a layering violation. The conversion to memrefs
should logically happen as part of bufferization.
Differential Revision: https://reviews.llvm.org/D89669
This revision adds a programmable codegen strategy from linalg based on staged rewrite patterns. Testing is exercised on a simple linalg.matmul op.
Differential Revision: https://reviews.llvm.org/D89374
For some reason the variable `cumulativeSizeInBytes` in
`getCumulativeSizeInBytes` was actually storing number of elements. I decided
to fix it and refactor the function a bit.
Differential Revision: https://reviews.llvm.org/D89336
This revision introduces support for buffer allocation for any named linalg op.
To avoid template instantiating many ops, a new ConversionPattern is created to capture the LinalgOp interface.
Some APIs are updated to remain consistent with MLIR style:
`OwningRewritePatternList * -> OwningRewritePatternList &`
`BufferAssignmentTypeConverter * -> BufferAssignmentTypeConverter &`
Differential revision: https://reviews.llvm.org/D89226
This is required or broadcasting with operands of different ranks will lead to
failures as the select op requires both possible outputs and its output type to
be the same.
Differential Revision: https://reviews.llvm.org/D89134
This revision belongs to a series of patches that reduce reliance of Linalg transformations on templated rewrite and conversion patterns.
Instead, this uses a MatchAnyTag pattern for the vast majority of cases and dispatches internally.
Differential Revision: https://reviews.llvm.org/D89133
This revision also inserts an end-to-end test that lowers tensors to buffers all the way to executable code on CPU.
Differential revision: https://reviews.llvm.org/D88998
The patch fixes the types used to access the elements of the kernel parameter structure from a pointer to the structure to a pointer to the actual parameter type.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D88959
Add conversion pass for Vector dialect to SPIR-V dialect and add some simple
conversion pattern for vector.broadcast, vector.insert, vector.extract.
Differential Revision: https://reviews.llvm.org/D88761
A pattern to convert `spv.CompositeInsert` and `spv.CompositeExtract`.
In LLVM, there are 2 ops that correspond to each instruction depending
on the container type. If the container type is a vector type, then
the result of conversion is `llvm.insertelement` or `llvm.extractelement`.
If the container type is an aggregate type (i.e. struct, array), the
result of conversion is `llvm.insertvalue` or `llvm.extractvalue`.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D88205
The previous code did the lowering to alloca, malloc, and aligned_malloc
in a single class with different code paths that are somewhat difficult to
follow.
This change moves the common code to a base class and has a separte
derived class per lowering target that contains the specifics.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D88696
While affine maps are part of the builtin memref type, there is very
limited support for manipulating them in the standard dialect. Add
transpose to the set of ops to complement the existing view/subview ops.
This is a metadata transformation that encodes the transpose into the
strides of a memref.
I'm planning to use this when lowering operations on strided memrefs,
using the transpose to remove the stride without adding a dependency on
linalg dialect.
Differential Revision: https://reviews.llvm.org/D88651
We hit an llvm_unreachable related to unranked memrefs for call ops
with scalar types. Removing the llvm_unreachable since the conversion
should gracefully bail out in the presence of unranked memrefs. Adding
tests to verify that.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D88709
This patch adds support for the 'return' and 'call' ops to the bare-ptr
calling convention. These changes also align the bare-ptr calling
convention code with the latest changes in the default calling convention
and reduce the amount of customization code needed.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D87724
- use select-ops to make the lowering simpler
- change style of FileCheck variables names to be consistent
- change some variable names in the code to be more explicit
Differential Revision: https://reviews.llvm.org/D88258
Recently, restrictions on vector reductions were made more relaxed by
accepting any width signless integer and floating-point. This CL relaxes
the restriction even more by including unsigned and signed integers.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D88442
(1) simplify integer printing logic by always using 64-bit print
(2) add index support (since vector<16xindex> is planned to be added)
(3) adjust naming convention print_x -> printX
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D88436
This generalizes printing beyond just i1,i32,i64 and also accounts
for signed and unsigned interpretation in the output.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D88290
Allow propagating optional user defined attributes during SCF to GPU conversion. Gives opportunity to use user defined attributes in the further lowering. For example setting subgroup size, or other options for GPU dispatch. This does not break backward compatibility and does not require new attributes, just allow passing optional ones.
Differential Revision: https://reviews.llvm.org/D88203
This pass converts shape.cstr_* ops to eager (side-effecting)
error-handling code. After that conversion is done, the witnesses are
trivially satisfied and are replaced with `shape.const_witness true`.
Differential Revision: https://reviews.llvm.org/D87941
- Use TypeRange instead of ArrayRef<Type> where possible.
- Change some of the custom builders to also use TypeRange
Differential Revision: https://reviews.llvm.org/D87944
When packing function results into a structure during the standard-to-llvm
dialect conversion, do not assume the conversion was successful and propagate
nullptr as error state.
Fixes PR45184.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D87605
Type converter may fail and return nullptr on unconvertible types. The function
conversion did not include a check and was attempting to use a nullptr type to
construct an LLVM function, leading to a crash. Add a check and return early.
The rest of the call stack propagates errors properly.
Fixes PR47403.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D87075
This introduces a builder for the more general case that supports zero
elements (where the element type can't be inferred from the ValueRange,
since it might be empty).
Also, fix up some cases in ShapeToStandard lowering that hit this. It
happens very easily when dealing with shapes of 0-D tensors.
The SameOperandsAndResultElementType is redundant with the new
TypesMatchWith and prevented having zero elements.
Differential Revision: https://reviews.llvm.org/D87492
This patch adds a new named structured op to accompany linalg.matmul and
linalg.matvec. We needed it for our codegen, so I figured it would be useful
to add it to Linalg.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D87292
Take advantage of the new `dynamic_tensor_from_elements` operation in `std`.
Instead of stack-allocated memory, we can now lower directly to a single `std`
operation.
Differential Revision: https://reviews.llvm.org/D86935
VectorToSCF.cpp:515:47: error: specialization of 'template<class TransferOpTy> mlir::LogicalResult mlir::VectorTransferRewriter<TransferOpTy>::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const' in different namespace [-fpermissive]
Put back anonymous namespace to work around GCC5 bug.
VectorToSCF.cpp:241:61: error: specialization of 'template<class ConcreteOp> mlir::LogicalResult {anonymous}::NDTransferOpHelper<ConcreteOp>::doReplace()' in different namespace [-fpermissive]
While there
- De-templatify code that can use function_ref
- Make BoundCaptures usable when they're const
- Address post-submit review comment (static function into global namespace)
This replaces the select chain for edge-padding with an scf.if that
performs the memory operation when the index is in bounds and uses the
pad value when it's not. For transfer_write the same mechanism is used,
skipping the store when the index is out of bounds.
The integration test has a bunch of cases of how I believe this should
work.
Differential Revision: https://reviews.llvm.org/D87241
With `dynamic_tensor_from_elements` tensor values of dynamic size can be
created. The body of the operation essentially maps the index space to tensor
elements.
Declare SCF operations in the `scf` namespace to avoid name clash with the new
`std.yield` operation. Resolve ambiguities between `linalg/shape/std/scf.yield`
operations.
Differential Revision: https://reviews.llvm.org/D86276
Vector to SCF conversion still had issues due to the interaction with the natural alignment derived by the LLVM data layout. One traditional workaround is to allocate aligned. However, this does not always work for vector sizes that are non-powers of 2.
This revision implements a more portable mechanism where the intermediate allocation is always a memref of elemental vector type. AllocOp is extended to use the natural LLVM DataLayout alignment for non-scalar types, when the alignment is not specified in the first place.
An integration test is added that exercises the transfer to scf.for + scalar lowering with a 5x5 transposition.
Differential Revision: https://reviews.llvm.org/D87150
When allowed, use 32-bit indices rather than 64-bit indices in the
SIMD computation of masks. This runs up to 2x and 4x faster on
a number of AVX2 and AVX512 microbenchmarks.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D87116
Added 128 byte alignment to alloc ops created in VectorToSCF pass.
128b alignment was already introduced to this pass but not to all alloc
ops. This commit changes that by adding 128b alignment to the remaining ops.
The point of specifying alignment is to prevent possible memory alignment errors
on weakly tested architectures.
Differential Revision: https://reviews.llvm.org/D86454
Unsigned and Signless attributes use uintN_t and signed attributes use intN_t, where N is the fixed width. The 1-bit variants use bool.
Differential Revision: https://reviews.llvm.org/D86739
Adding a conversion pattern for the parallel Operation. This will
help the conversion of parallel operation with standard dialect to
parallel operation with llvm dialect. The type conversion of the block
arguments in a parallel region are controlled by the pattern for the
parallel Operation. Without this pattern, a parallel Operation with
block arguments cannot be converted from standard to LLVM dialect.
Other OpenMP operations without regions are marked as legal. When
translation of OpenMP operations with regions are added then patterns
for these operations can also be added.
Also uses all the standard to llvm patterns. Patterns of other dialects
can be added later if needed.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D86273
This patch allows to pass the gpu module name to SPIR-V
module during conversion. This has many benefits as we can lookup
converted to SPIR-V kernel in the symbol table.
In order to avoid symbol conflicts, `"__spv__"` is added to the
gpu module name to form the new one.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86384
This patch introduces a hook to encode descriptor set
and binding number into `spv.globalVariable`'s symbolic name. This
allows to preserve this information, and at the same time legalize
the global variable for the conversion to LLVM dialect.
This is required for `mlir-spirv-cpu-runner` to convert kernel
arguments into LLVM.
Also, a couple of some nits added:
- removed unused comment
- changed to a capital letter in the comment
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86515
Instead of using the TypeConverter infer the value of the alloca created based
on the init value. This will allow some ambiguous types like multidimensional
vectors to be converted correctly.
Differential Revision: https://reviews.llvm.org/D86582
Binding MemRefs of f16 needs special handling as the type is not supported on
CPU. There was a bug in the type used.
Differential Revision: https://reviews.llvm.org/D86328
Removed the Standard to LLVM conversion patterns that were previously
pulled in for testing purposes. This helps to separate the conversion
to LLVM dialect of the MLIR module with both SPIR-V and Standard
dialects in it (particularly helpful for SPIR-V cpu runner). Also,
tests were changed accordingly.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86285
Add the unsigned complements to the existing FPToSI and SIToFP operations in the
standard dialect, with one-to-one lowerings to the corresponding LLVM operations.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85557
If Memref has rank > 1 this pass emits N-1 loops around
TransferRead op and transforms the op itself to 1D read. Since vectors
must have static shape while memrefs don't the pass emits if condition
to prevent out of bounds accesses in case some memref dimension is smaller
than the corresponding dimension of targeted vector. This logic is fine
but authors forgot to apply `permutation_map` on loops upper bounds and
thus if condition compares induction variable to incorrect loop upper bound
(dimension of the memref) in case `permutation_map` is not identity map.
This commit aims to fix that.
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
mlir::registerDialect<mlir::standalone::StandaloneDialect>();
mlir::registerDialect<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
There should be an equivalent std.floor op to std.ceil. This includes
matching lowerings for SPIRV, NVVM, ROCDL, and LLVM.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85940
This patch adds more op/type conversion support
necessary for `spirv-runner`:
- EntryPoint/ExecutionMode: currently removed since we assume
having only one kernel function in the kernel module.
- StorageBuffer storage class is now supported. We are not
concerned with multithreading so this is fine for now.
- Type conversion enhanced, now regular offsets and strides
for structs and arrays are supported (based on
`VulkanLayoutUtils`).
- Support of `spc.AccessChain` that is modelled with GEP op
in LLVM dialect.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86109
The function makes too strong assumption regarding parent FuncOp
which gets broken when FuncOp is first lowered to llvm function.
In this fix we generalize the assumption to allocation scope and
add assertion to produce user friendly message in case our assumption
is broken.
Differential Revision: https://reviews.llvm.org/D86086
Legacy implementation of the LLVM dialect in MLIR contained an instance of
llvm::Module as it was required to parse LLVM IR types. The access to the data
layout of this module was exposed to the users for convenience, but in practice
this layout has always been the default one obtained by parsing an empty layout
description string. Current implementation of the dialect no longer relies on
wrapping LLVM IR types, but it kept an instance of DataLayout for
compatibility. This effectively forces a single data layout to be used across
all modules in a given MLIR context, which is not desirable. Remove DataLayout
from the LLVM dialect and attach it as a module attribute instead. Since MLIR
does not yet have support for data layouts, use the LLVM DataLayout in string
form with verification inside MLIR. Introduce the layout when converting a
module to the LLVM dialect and keep the default "" description for
compatibility.
This approach should be replaced with a proper MLIR-based data layout when it
becomes available, but provides an immediate solution to compiling modules with
different layouts, e.g. for GPUs.
This removes the need for LLVMDialectImpl, which is also removed.
Depends On D85650
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D85652
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
The convresion of memref cast operaitons from the Standard dialect to the LLVM
dialect has been emitting bitcasts from a struct type to itself. Beyond being
useless, such casts are invalid as bitcast does not operate on aggregate types.
This kept working by accident because LLVM IR bitcast construction API skips
the construction if types are equal before it verifies that the types are
acceptable in a bitcast. Do not emit such bitcasts, the memref cast that only
adds/erases size information is in fact a noop on the current descriptor as it
always contains dynamic values for all sizes.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D85899
This revision removes all of the lingering usages of Type::getKind. A consequence of this is that FloatType is now split into 4 derived types that represent each of the possible float types(BFloat16Type, Float16Type, Float32Type, and Float64Type). Other than this split, this revision is NFC.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85566
Inital conversion of `spv._address_of` and `spv.globalVariable`.
In SPIR-V, the global returns a pointer, whereas in LLVM dialect
the global holds an actual value. This difference is handled by
`spv._address_of` and `llvm.mlir.addressof`ops that both return
a pointer. Moreover, only current invocation is in conversion's
scope.
Reviewed By: antiagainst, mravishankar
Differential Revision: https://reviews.llvm.org/D84626
This change adds initial support needed to generate OpenCL compliant SPIRV.
If Kernel capability is declared then memory model becomes OpenCL.
If Addresses capability is declared then addressing model becomes Physical64.
Additionally for Kernel capability interface variable ABI attributes are not
generated as entry point function is expected to have normal arguments.
Differential Revision: https://reviews.llvm.org/D85196
Using a shuffle for the last recursive step in progressive lowering not only
results in much more compact IR, but also more efficient code (since the
backend is no longer confused on subvector aliasing for longer vectors).
E.g. the following
%f = vector.shape_cast %v0: vector<1024xf32> to vector<32x32xf32>
yields much better x86-64 code that runs 3x faster than the original.
Reviewed By: bkramer, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D85482
Original modeling of LLVM IR types in the MLIR LLVM dialect had been wrapping
LLVM IR types and therefore required the LLVMContext in which they were created
to outlive them, which was solved by placing the LLVMContext inside the dialect
and thus having the lifetime of MLIRContext. This has led to numerous issues
caused by the lack of thread-safety of LLVMContext and the need to re-create
LLVM IR modules, obtained by translating from MLIR, in different LLVM contexts
to enable parallel compilation. Similarly, llvm::Module had been introduced to
keep track of identified structure types that could not be modeled properly.
A recent series of commits changed the modeling of LLVM IR types in the MLIR
LLVM dialect so that it no longer wraps LLVM IR types and has no dependence on
LLVMContext and changed the ownership model of the translated LLVM IR modules.
Remove LLVMContext and LLVM modules from the implementation of MLIR LLVM
dialect and clean up the remaining uses.
The only part of LLVM IR that remains necessary for the LLVM dialect is the
data layout. It should be moved from the dialect level to the module level and
replaced with an MLIR-based representation to remove the dependency of the
LLVMDialect on LLVM IR library.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D85445
Due to the original type system implementation, LLVMDialect in MLIR contains an
LLVMContext in which the relevant objects (types, metadata) are created. When
an MLIR module using the LLVM dialect (and related intrinsic-based dialects
NVVM, ROCDL, AVX512) is converted to LLVM IR, it could only live in the
LLVMContext owned by the dialect. The type system no longer relies on the
LLVMContext, so this limitation can be removed. Instead, translation functions
now take a reference to an LLVMContext in which the LLVM IR module should be
constructed. The caller of the translation functions is responsible for
ensuring the same LLVMContext is not used concurrently as the translation no
longer uses a dialect-wide context lock.
As an additional bonus, this change removes the need to recreate the LLVM IR
module in a different LLVMContext through printing and parsing back, decreasing
the compilation overhead in JIT and GPU-kernel-to-blob passes.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D85443
The RewritePattern will become one of several, and will be part of the LLVM conversion pass (instead of a separate pass following LLVM conversion).
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D84946
Historical modeling of the LLVM dialect types had been wrapping LLVM IR types
and therefore needed access to the instance of LLVMContext stored in the
LLVMDialect. The new modeling does not rely on that and only needs the
MLIRContext that is used for uniquing, similarly to other MLIR types. Change
LLVMType::get<Kind>Ty functions to take `MLIRContext *` instead of
`LLVMDialect *` as first argument. This brings the code base closer to
completely removing the dependence on LLVMContext from the LLVMDialect,
together with additional support for thread-safety of its use.
Depends On D85371
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D85372
This prepares for the removal of llvm::Module and LLVMContext from the
mlir::LLVMDialect.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D85371
The intrinsics were already supported and vector.transfer_read/write lowered
direclty into these operations. By providing them as individual ops, however,
clients can used them directly, and it opens up progressively lowering transfer
operations at higher levels (rather than direct lowering to LLVM IR as done now).
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D85357
Previous type model in the LLVM dialect did not support identified structure
types properly and therefore could use stateless translations implemented as
free functions. The new model supports identified structs and must keep track
of the identified structure types present in the target context (LLVMContext or
MLIRContext) to avoid creating duplicate structs due to LLVM's type
auto-renaming. Expose the stateful type translation classes and use them during
translation, storing the state as part of ModuleTranslation.
Drop the test type translation mechanism that is no longer necessary and update
the tests to exercise type translation as part of the main translation flow.
Update the code in vector-to-LLVM dialect conversion that relied on stateless
translation to use the new class in a stateless manner.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D85297
Per Vulkan's SPIR-V environment spec: "While the OpSRem and OpSMod
instructions are supported by the Vulkan environment, they require
non-negative values and thus do not enable additional functionality
beyond what OpUMod provides."
The `getOffsetForBitwidth` function is used for lowering std.load
and std.store, whose indices are of `index` type and cannot be
negative. So we should be okay to use spv.UMod directly here to
be exact. Also made the comment explicit about the assumption.
Differential Revision: https://reviews.llvm.org/D83714