Move the async copy operations to NVGPU, as they only exist on the NV target
and are designed to match the PTX semantics. This also allows us to add
finer-grained caching hint attributes to the op.
Add a hint to bypass L1 and hook it up to the NVVM op.
Differential Revision: https://reviews.llvm.org/D125244
Add an attribute to generate the intrinsic version of the async copy,
producing a copy that bypasses L1. This corresponds to
cp.async.cg.shared.global in PTX, which caches the data at the global (L2)
level only and requires 16-byte copies.
Differential Revision: https://reviews.llvm.org/D125241
This change adds three new operations to the GPU dialect: gpu.mma.sync,
gpu.mma.ldmatrix, and gpu.lane_id. The former two are meant to target
the lower-level nvvm.mma.sync and nvvm.ldmatrix instructions, respectively.
Lowerings are added for the new GPU operations for conversion to
NVVM.
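For instance, the new lane id op can be created from C++ roughly as follows
(a minimal sketch; the builder signature is an assumption, not taken from the
patch):

  // gpu.lane_id returns the lane position within the warp as an index.
  mlir::Value laneId =
      builder.create<mlir::gpu::LaneIdOp>(loc, builder.getIndexType());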
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D123647
This commit moves FuncOp out of the builtin dialect, and into the Func
dialect. This move has been planned in some capacity from the moment
we made FuncOp an operation (years ago). This commit handles the
functional aspects of the move, but various aspects are left untouched
to ease migration: func::FuncOp is re-exported into the mlir namespace to
reduce the actual API churn, and the assembly format still accepts the
unqualified `func`. These temporary measures will remain for a little while to
simplify migration before being removed.
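For example, downstream code can already move to the qualified name (a minimal
sketch; the walk itself is illustrative, not from the patch):

  #include "mlir/Dialect/Func/IR/FuncOps.h"
  #include "mlir/IR/BuiltinOps.h"
  #include "llvm/Support/raw_ostream.h"

  // Print the name of every function in a module using the new qualified
  // type; mlir::FuncOp keeps compiling via the temporary re-export.
  void printFunctionNames(mlir::ModuleOp module) {
    module.walk([](mlir::func::FuncOp fn) {
      llvm::outs() << fn.getName() << "\n";
    });
  }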
Differential Revision: https://reviews.llvm.org/D121266
OpBase.td has grown into a huge monolith of all ODS constructs. This
commit starts to rectify that by splitting out some constructs into their
own .td files.
Differential Revision: https://reviews.llvm.org/D118636
The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself also pulls in the ArithmeticToLLVM and
ControlFlowToLLVM patterns, but those should be/will be split out in a
followup. This commit focuses solely on being an NFC rename.
Aside from the directory change, the pattern and pass creation APIs have been renamed:
* populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
* populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
* createLowerToLLVMPass -> createConvertFuncToLLVMPass
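A minimal usage sketch of the renamed entry points (the surrounding
boilerplate is assumed; only the names changed, not the signatures):

  #include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVM.h"
  #include "mlir/Conversion/LLVMCommon/TypeConverter.h"
  #include "mlir/IR/PatternMatch.h"

  void buildFuncToLLVMPatterns(mlir::MLIRContext *ctx) {
    mlir::LLVMTypeConverter typeConverter(ctx);
    mlir::RewritePatternSet patterns(ctx);
    mlir::populateFuncToLLVMConversionPatterns(typeConverter, patterns);
  }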
Differential Revision: https://reviews.llvm.org/D120778
The Func dialect has a large number of legacy dependencies carried over from
the old Standard dialect, which was pervasive and contained a large number of
varied operations. With the split of the Standard dialect and its demise, a
lot of lingering dead dependencies have survived in the Func dialect. This
commit removes a large majority of them, greatly reducing the dependency
surface area of the
Func dialect.
insert is soft deprecated, so remove all references so it's less likely
to be used and can be easily removed in the future.
Differential Revision: https://reviews.llvm.org/D120021
Add new operations to the gpu dialect to represent device-side
asynchronous copies. This also adds the lowering of those operations to
the nvvm dialect.
These ops are meant to be low level and map directly to LLVM-level dialects
like nvvm or rocdl.
Higher levels of abstraction can later be added by building on top of
these operations.
This was discussed here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
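On the NVVM path the three ops map onto the cp.async family; a rough sketch
of the emitted ops (class names and builder arguments here are assumptions,
not copied from the patch):

  // gpu.device_async_copy lowers to a 16-byte shared<-global cp.async;
  // the group/wait ops map to their nvvm counterparts.
  rewriter.create<mlir::NVVM::CpAsyncOp>(loc, dstPtr, srcPtr,
                                         rewriter.getI32IntegerAttr(16));
  rewriter.create<mlir::NVVM::CpAsyncCommitGroupOp>(loc);
  rewriter.create<mlir::NVVM::CpAsyncWaitGroupOp>(
      loc, rewriter.getI32IntegerAttr(0));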
Differential Revision: https://reviews.llvm.org/D119191
The current lowering from GPU to NVVM does
not correctly handle the following cases when
lowering the gpu shuffle op:
1. When the active width is set to 32 (all lanes),
the current approach computes (1 << 32) - 1, which
results in poison values in the LLVM IR. We fix this by
defining the active mask as (-1) >> (32 - width).
2. In the case of shuffle up, the computation of the third
operand c has to differ from the other three modes due to
the op definition in the ISA reference
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html).
Specifically, the predicate value is computed as j >= maxLane
for up and j <= maxLane for all other modes. We fix this by
computing maskAndClamp as 32 - width for this mode.
TEST: We modify the existing test and add more checks for the up mode.
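In plain C++ terms, the fixed computations look like this (a worked sketch of
the arithmetic, not code from the patch):

  #include <cstdint>

  // Active mask: defined for any width in [1, 32]. The old form
  // (1 << width) - 1 is poison for width == 32; this one yields 0xFFFFFFFF.
  uint32_t activeMask(uint32_t width) {
    return UINT32_C(0xFFFFFFFF) >> (32u - width);
  }

  // Third shfl operand for the "up" mode: 32 - width, instead of the
  // width - 1 clamp used by the other modes.
  uint32_t maskAndClampUp(uint32_t width) { return 32u - width; }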
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D118086
Support load with broadcast and the elementwise divf op, and remove the
hardcoded restriction on the vector size. Picking the right size is the
user's responsibility, and conversion to llvm/spirv will fail if the size is
not supported.
Differential Revision: https://reviews.llvm.org/D113618
Use the existing helper instead of handling only a subset of index lowering
arithmetic. Also relax the restriction on the memref rank for the GPU mma ops,
as we can now support any rank.
Differential Revision: https://reviews.llvm.org/D113383
In order to support fusion with the mma matrix type we need to be able to
execute elementwise operations on it. This adds an op to support
some basic elementwise operations. It is not a full
solution, as it only supports a limited set of operations. Ideally we would
want to be able to fuse with more kinds of operations.
Differential Revision: https://reviews.llvm.org/D112857
wmma intrinsics have a large number of combinations, and ideally we want to be
able to target all the different variants. To avoid a combinatorial explosion
in the number of MLIR ops, we use attributes to represent the different
variations of load/store/mma ops. We also generate tablegen helpers to know
which combinations are available. Using this we can avoid hardcoding a path
for specific shapes and can support more types.
This patch also adds boilerplate for tf32 op support.
Differential Revision: https://reviews.llvm.org/D112689
Allow lowering of wmma ops with 64-bit indices. Change the default
version of the test to use the default layout.
Differential Revision: https://reviews.llvm.org/D112479
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
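At the C++ level the rename is mechanical, e.g. (an illustrative builder call;
lhs/rhs are assumed values):

  // std's AddIOp now lives in the arith namespace:
  mlir::Value sum = builder.create<mlir::arith::AddIOp>(loc, lhs, rhs);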
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
This commit updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.
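The migrated signature looks like this (MyOp is a placeholder op used purely
for illustration):

  #include "mlir/Transforms/DialectConversion.h"

  struct MyOpLowering : public mlir::OpConversionPattern<MyOp> {
    using OpConversionPattern<MyOp>::OpConversionPattern;

    mlir::LogicalResult
    matchAndRewrite(MyOp op, OpAdaptor adaptor,
                    mlir::ConversionPatternRewriter &rewriter) const override {
      // The adaptor exposes the converted operands by name instead of a
      // raw ArrayRef<Value>.
      rewriter.replaceOp(op, adaptor.getOperands());
      return mlir::success();
    }
  };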
Differential Revision: https://reviews.llvm.org/D110360
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
"Standard-to-LLVM" conversion is one of the oldest passes in existence. It has
become quite large due to the size of the Standard dialect itself, which is
being split into multiple smaller dialects. Furthermore, several conversion
features are useful for any dialect that is being converted to the LLVM
dialect, which, without this refactoring, creates a dependency from those
conversions to the "standard-to-llvm" one.
Move several of the reusable utilities from this conversion into a separate
library, namely:
- type converter from builtin to LLVM dialect types;
- utility for building and accessing values of LLVM structure type;
- utility for building and accessing values that represent memref in the LLVM
dialect;
- lowering options applicable everywhere.
Additionally, remove the type wrapping/unwrapping notion from the type
converter, which is no longer relevant since LLVM types have been reimplemented
as first-class MLIR types.
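For example, the extracted memref descriptor helper can now be used from any
conversion (a minimal sketch; convertedMemRef, rewriter, and loc are assumed):

  // Wrap an LLVM-struct-typed value that represents a memref and read one
  // of its fields through the helper API.
  mlir::MemRefDescriptor desc(convertedMemRef);
  mlir::Value alignedPtr = desc.alignedPtr(rewriter, loc);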
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D105534
Split out GPU ops library from GPU transforms. This allows libraries to
depend on GPU Ops without needing/building its transforms.
Differential Revision: https://reviews.llvm.org/D105472
* Remove dependency: Standard --> MemRef
* Add dependencies: GPUToNVVMTransforms --> MemRef, Linalg --> MemRef, MemRef --> Tensor
* Note: The `subtensor_insert_propagate_dest_cast` test case in MemRef/canonicalize.mlir will be moved to Tensor/canonicalize.mlir in a subsequent commit, which moves over the remaining Tensor ops from the Standard dialect to the Tensor dialect.
Differential Revision: https://reviews.llvm.org/D104506
This allows creating a matrix with all elements set to a given value. This is
needed to be able to implement a simple dot op.
Differential Revision: https://reviews.llvm.org/D103870
Consolidate the type conversion in a single function to make it simpler
to use. This allows reusing the type conversion for upcoming ops.
Differential Revision: https://reviews.llvm.org/D103868
In order to allow large matmul operations using the MMA ops we need to chain
operations, which is not possible unless the "DOp" and "COp" types have
matching layouts, so remove the "DOp" layout and force the accumulator and
result types to match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023
Splitting the memref dialect led to the introduction of several dependencies
to avoid compilation issues. The canonicalize pass also depends on the
memref dialect, but it shouldn't. This patch resolves the dependencies,
and the unintuitive includes are removed. However, the dependency moves
to the constructor of the std dialect.
Differential Revision: https://reviews.llvm.org/D102060
Index type is an integer type of target-specific bitwidth present in many MLIR
operations (loops, memory accesses). Converting values of this type to
fixed-size integers has always been problematic. Introduce a data layout entry
to specify the bitwidth of `index` in a given layout scope, defaulting to 64
bits, which is a commonly used assumption, e.g., in constants.
Port builtin-to-LLVM type conversion to use this data layout entry when
converting `index` type and untie it from pointer size. This is particularly
relevant for GPU targets. Keep a possibility to forcibly override the index
type in lowerings.
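The override hook mentioned above can be used roughly as follows (a sketch;
treat the exact spellings of the LowerToLLVMOptions API as assumptions):

  mlir::LowerToLLVMOptions options(ctx);
  options.overrideIndexBitwidth(32); // force 32-bit index regardless of DL
  mlir::LLVMTypeConverter typeConverter(ctx, options);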
Depends On D98525
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98937
This doesn't change APIs; it just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple of weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many, many more to be removed.
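The new construction pattern (MyPattern is a placeholder pattern class):

  // The context is now captured when the list is constructed.
  mlir::OwningRewritePatternList patterns(context);
  patterns.insert<MyPattern>(context);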
Differential Revision: https://reviews.llvm.org/D99028
Just a pure method renaming.
It is a preparation step for replacing "memory space as raw integer"
with the more generic "memory space as attribute", which will be done in a
separate commit.
The `MemRefType::getMemorySpace` method will return `Attribute` and
become the main API, while `getMemorySpaceAsInt` will be declared as
deprecated and will be replaced in all in-tree dialects (also in separate
commits).
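Call sites change mechanically:

  // Before: unsigned space = memrefType.getMemorySpace();
  // After the rename, the integer accessor is explicit:
  unsigned space = memrefType.getMemorySpaceAsInt();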
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476
Until now, the GPU translation to NVVM or ROCDL intrinsics relied on the
presence of the generic `gpu.kernel` attribute to attach additional LLVM IR
metadata to the relevant functions. This would be problematic if each dialect
were to handle the conversion of its own options, which is the intended
direction for the translation infrastructure. Introduce `nvvm.kernel` and
`rocdl.kernel` in addition to `gpu.kernel` and base translation on these new
attributes instead.
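A hedged sketch of what translation can now key on (attribute spellings per
this commit; the check itself is illustrative):

  // During NVVM translation, only functions carrying the dialect-specific
  // unit attribute receive the kernel annotation.
  if (funcOp->getAttr("nvvm.kernel")) {
    // attach the nvvm.annotations kernel metadata to the LLVM function
  }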
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96591
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as nodiscard was always the intention from the beginning, but got lost along the way.
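What this means at call sites (process is a placeholder function returning
LogicalResult):

  if (mlir::failed(process(op)))
    return mlir::failure();

  // A deliberate discard now has to be spelled out:
  (void)process(op);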
Differential Revision: https://reviews.llvm.org/D95841
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical to the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.
Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.
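A quick sketch of the signless-integer check that replaces the old
isa<LLVMType> test (the check is isolated in a utility function per the
commit; this inline spelling, with `type` an arbitrary mlir::Type, is
illustrative):

  // Builtin integers are usable in the LLVM dialect only when signless.
  auto intTy = type.dyn_cast<mlir::IntegerType>();
  bool usableInLLVMDialect = intTy && intTy.isSignless();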
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94178
BEGIN_PUBLIC
[mlir] Remove LLVMType, LLVM dialect types now derive Type directly
This class has become a simple `isa` hook with no proper functionality.
Removing it will allow us to eventually make the LLVM dialect type infrastructure
open, i.e., support non-LLVM types inside container types, which itself will
make the type conversion more progressive.
Introduce a helper, `LLVM::isCompatibleType`, to be used instead of
`isa<LLVMType>`. For now, the two are strictly equivalent.
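Migration is a drop-in replacement:

  // Before: if (type.isa<LLVM::LLVMType>()) ...
  if (mlir::LLVM::isCompatibleType(type)) {
    // the type may appear inside LLVM dialect containers
  }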
END_PUBLIC
Depends On D93681
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93713