While the default value for the amdgpu-flat-work-group-size attribute,
"1, 256", matches the defaults from Clang, some users of the ROCDL dialect,
namely Tensorflow, use larger workgroups, such as 1024. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default value.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D115741
Make the reduction handling in OpenMPIRBuilder compatible with
opaque pointers by explicitly storing the element type in ReductionInfo,
and also passing it to the atomic reduction callback, as at least
the ones in the test need the type there.
This doesn't make things fully compatible yet, there are other
uses of element types in this class. I also left one
getPointerElementType() call in mlir, because I'm not familiar
with that area.
Differential Revison: https://reviews.llvm.org/D115638
This gives us better debugging print as it supports indent
levels and other nice features.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115583
The previous "optimization" that tries to reuse existing block for
selection header block can be problematic for deserialization
because it effectively pulls in previous ops in the selection op's
enclosing block into the selection op's header. When deserializing,
those ops will be placed in the selection op's region. If any of
the previous ops has usage after the section op, it will break. That
is, the following IR cannot round trip:
```mlir
^bb:
%def = ...
spv.mlir.selection { ... }
%use = spv.SomeOp %def
```
This commit removes the "optimization" to always create new blocks
for the selection header.
Along the way, also made error reporting better in deserialization
by turning asserts into proper errors and add check of uses outside
of sinked structured control flow region blocks.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115582
If we have a `spv.mlir.selection` op nested in a `spv.mlir.loop`
op, when serializing the loop's block, we might need to jump
from the selection op's merge block, which might be different
than the immediate MLIR IR predecessor block. But we still need
to get the block argument from the MLIR IR predecessor block.
Also, if the `spv.mlir.selection` is in the `spv.mlir.loop`'s
header block, we need to make sure `OpLoopMerge` is emitted
in the current block before start processing the nested selection
op. Otherwise we'll see the LoopMerge in the wrong SPIR-V
basic block.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115560
`(void)` was added when LogicalResult was marked as non
discard. This commit cleans them up to properly propagate
failures.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115541
It's legal per the Vulkan / SPIR-V spec; still it's better to avoid
such duplication to have cleaner blob and reduce the binary size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115532
In SPIR-V, symbol names are encoded as `OpName` instructions.
They are not semantic impacting and can be omitted, which can
reduce the binary size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115531
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.
And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.
In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110448
This patch adds lowering from omp.atomic.read to LLVM IR along with the
memory ordering clause. Tests for the same are also added.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115134
NamedAttribute is currently represented as an std::pair, but this
creates an extremely clunky .first/.second API. This commit
converts it to a class, with better accessors (getName/getValue)
and also opens the door for more convenient API in the future.
Differential Revision: https://reviews.llvm.org/D113956
Identifier and StringAttr essentially serve the same purpose, i.e. to hold a string value. Keeping these seemingly identical pieces of functionality separate has caused problems in certain situations:
* Identifier has nice accessors that StringAttr doesn't
* Identifier can't be used as an Attribute, meaning strings are often duplicated between Identifier/StringAttr (e.g. in PDL)
The only thing that Identifier has that StringAttr doesn't is support for caching a dialect that is referenced by the string (e.g. dialect.foo). This functionality is added to StringAttr, as this is useful for StringAttr in generally the same ways it was useful for Identifier.
Differential Revision: https://reviews.llvm.org/D113536
This predates the templated variant, and has been simply forwarding
to getSplatValue<Attribute> for some time. Removing this makes the
API a bit more uniform, and also helps prevent users from thinking
it is "cheap".
There are several aspects of the API that either aren't easy to use, or are
deceptively easy to do the wrong thing. The main change of this commit
is to remove all of the `getValue<T>`/`getFlatValue<T>` from ElementsAttr
and instead provide operator[] methods on the ranges returned by
`getValues<T>`. This provides a much more convenient API for the value
ranges. It also removes the easy-to-be-inefficient nature of
getValue/getFlatValue, which under the hood would construct a new range for
the type `T`. Constructing a range is not necessarily cheap in all cases, and
could lead to very poor performance if used within a loop; i.e. if you were to
naively write something like:
```
DenseElementsAttr attr = ...;
for (int i = 0; i < size; ++i) {
// We are internally rebuilding the APFloat value range on each iteration!!
APFloat it = attr.getFlatValue<APFloat>(i);
}
```
Differential Revision: https://reviews.llvm.org/D113229
wmma intrinsics have a large number of combinations, ideally we want to be able
to target all the different variants. To avoid a combinatorial explosion in the
number of mlir op we use attributes to represent the different variation of
load/store/mma ops. We also can generate with tablegen helpers to know which
combinations are available. Using this we can avoid having too hardcode a path
for specific shapes and can support more types.
This patch also adds boiler plates for tf32 op support.
Differential Revision: https://reviews.llvm.org/D112689
Add llvm.mlir.global_ctors and global_dtors ops and their translation
support to LLVM global_ctors/global_dtors global variables.
Differential Revision: https://reviews.llvm.org/D112524
Pass the modifiers from the Flang parser to FIR/MLIR workshare
loop operation.
Not yet supporting the SIMD modifier, which is a bit more work
than just adding it to the list of modifiers, so will go in a
separate patch.
This adds a new field to the WsLoopOp.
Also add test for dynamic WSLoop, checking that dynamic schedule calls
the init and next functions as expected.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111053
This patch supports the ordered construct in OpenMP dialect following
Section 2.19.9 of the OpenMP 5.1 standard. Also lowering to LLVM IR
using OpenMP IRBduiler. Lowering to LLVM IR for ordered simd directive
is not supported yet since LLVM optimization passes do not support it
for now.
Reviewed By: kiranchandramohan, clementval, ftynse, shraiysh
Differential Revision: https://reviews.llvm.org/D110015
This patch supports the ordered construct in OpenMP dialect following
Section 2.19.9 of the OpenMP 5.1 standard. Also lowering to LLVM IR
using OpenMP IRBduiler. Lowering to LLVM IR for ordered simd directive
is not supported yet since LLVM optimization passes do not support it
for now.
Reviewed By: kiranchandramohan, clementval, ftynse, shraiysh
Differential Revision: https://reviews.llvm.org/D110015
According to the OpenMP 5.0 standard, names and hints of critical operation are
closely related. The following are the restrictions on them:
- Unless the effect is as if `hint(omp_sync_hint_none)` was specified, the
critical construct must specify a name.
- If the hint clause is specified, each of the critical constructs with the
same name must have a hint clause for which the hint-expression evaluates to
the same value.
These restrictions will be enforced by design if the hint expression is a part
of the `omp.critical.declare` operation.
- Any operation with no "name" will be considered to have
`hint(omp_sync_hint_none)`.
- All the operations with the same "name" will have the same hint value.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D112134
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
`hint-expression` is an IntegerAttr, because it can be a combination of multiple values from the enum `omp_sync_hint_t` (Section 2.17.12 of OpenMP 5.0)
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D111360
This patch is mainly to propogate location attribute from spv.GlobalVariable to llvm.mlir.global.
It also contains three small changes.
1. Remove the restriction on UniformConstant In SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializering SPIR-V in Deserializer.cpp
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.
Co-authered: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>
Reviewed by:antiagainst
Differential revision: https://reviews.llvm.org/D110207
Previously, the translation to LLVM IR would emit IR that directly uses
a scope metadata node in case only one scope was in use in alias.scopes
or noalias metadata. It should always be a list of scopes. The verifier
change in 8700f2bd36 enforced this and
broke the test. Fix the translation to always create a list of scopes
using a new metadata node, update and reenable the respective test.
Fixes PR51919.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110140
Currently DenseElementsAttr only exposes the ability to get the full range of values for a given type T, but there are many situations where we just want the beginning/end iterator. This revision adds proper value_begin/value_end methods for all of the supported T types, and also cleans up a bit of the interface.
Differential Revision: https://reviews.llvm.org/D104173
Make use of runtime extension for the second reference counter used in
structured data region. This extension is implemented in D106510 and D106509.
Differential Revision: https://reviews.llvm.org/D106517
The translation to LLVM IR used to construct sequential constants by recurring
down to individual elements, creating constant values for them, and wrapping
them into aggregate constants in post-order. This is highly inefficient for
large constants with known data such as DenseElementsAttr. Use LLVM's
ConstantData for the innermost dimension instead. LLVM does seem to support
data constants for nested sequential constants so the outer dimensions are
still handled recursively. Nevertheless, this speeds up the translation of
large constants with equal dimensions by up to 30x.
Users are advised to rewrite large constants to use flat types before
translating to LLVM IR if more efficiency in translation is necessary. This is
not done automatically as the translation is not aware of the expectations of
the overall compilation flow about type changes and indexing, in particular for
global constants with external linkage.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D109152
Add an operation omp.critical.declare to declare names/symbols of
critical sections. Named omp.critical operations should use symbols
declared by omp.critical.declare. Having a declare operation ensures
that the names of critical sections are global and unique. In the
lowering flow to LLVM IR, the OpenMP IRBuilder creates unique names
for critical sections.
Reviewed By: ftynse, jeanPerier
Differential Revision: https://reviews.llvm.org/D108713
This upstreams the Cpp emitter, initially presented with [1], from [2]
to MLIR core. Together with the previously upstreamed EmitC dialect [3],
the target allows to translate MLIR to C/C++.
[1] https://reviews.llvm.org/D76571
[2] https://github.com/iml130/mlir-emitc
[3] https://reviews.llvm.org/D103969
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Co-authored-by: Oliver Scherf <oliver.scherf@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D104632
Use the recently introduced OpenMPIRBuilder facility to transate OpenMP
workshare loops with reductions to LLVM IR calling OpenMP runtime. Most of the
heavy lifting is done at the OpenMPIRBuilder. When other OpenMP dialect
constructs grow support for reductions, the translation can be updated to
operate on, e.g., an operation interface for all reduction containers instead
of workshare loops specifically. Designing such a generic translation for the
single operation that currently supports reductions is premature since we don't
know how the reduction modeling itself will be generalized.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107343
The StringAttr version doesn't need a context, so we can just use the
existing `SymbolRefAttr::get` form. The StringRef version isn't preferred
so we want to encourage people to use StringAttr.
There is an additional form of getSymbolRefAttr that takes a (SymbolTrait
implementing) operation. This should also be moved, but I'll do that as
a separate patch.
Differential Revision: https://reviews.llvm.org/D108922
SymbolRefAttr is fundamentally a base string plus a sequence
of nested references. Instead of storing the string data as
a copies StringRef, store it as an already-uniqued StringAttr.
This makes a lot of things simpler and more efficient because:
1) references to the symbol are already stored as StringAttr's:
there is no need to copy the string data into MLIRContext
multiple times.
2) This allows pointer comparisons instead of string
comparisons (or redundant uniquing) within SymbolTable.cpp.
3) This allows SymbolTable to hold a DenseMap instead of a
StringMap (which again copies the string data and slows
lookup).
This is a moderately invasive patch, so I kept a lot of
compatibility APIs around. It would be nice to explore changing
getName() to return a StringAttr for example (right now you have
to use getNameAttr()), and eliminate things like the StringRef
version of getSymbol.
Differential Revision: https://reviews.llvm.org/D108899
Introduces new Ops to represent 1. alias.scope metadata in LLVM, and 2. domains for these scopes. These correspond to the metadata described in https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata. Lists of scopes are modeled the same way as access groups - as an ArrayAttr on the Op (added in https://reviews.llvm.org/D97944).
Lowering 'noalias' attributes on function parameters is already supported. However, lowering `noalias` metadata on individual Ops is not, which is added in this change. LLVM uses the same keyword for these, but this change introduces a separate attribute name 'noalias_scopes' to represent this distinct concept.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107870
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.
CanonicalLoopInfo are now invalidated after it does not describe canonical loop anymore and asserts when trying to use it afterwards.
In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid that the impression that they insert something from scratch at that location where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0
Also see discussion in D105706.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107540
This patch adds the critical construct to the OpenMP dialect. The
implementation models the definition in 2.17.1 of the OpenMP 5 standard.
A name and hint can be specified. The name is a global entity or has
external linkage, it is modelled as a FlatSymbolRefAttr. Hint is
modelled as an integer enum attribute.
Also lowering to LLVM IR using the OpenMP IRBuilder.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107135