This patch is a follow-up on https://reviews.llvm.org/D81127
BF16 constants were represented as 64-bit floating point values due to the lack
of support for BF16 in APFloat. APFloat was recently extended to support
BF16 so this patch is fixing the BF16 constant representation to be 16-bit.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D81218
Summary:
This revision adds a common folding pattern that starts appearing on
vector_transfer ops.
Differential Revision: https://reviews.llvm.org/D81281
This allows verifying op-indepent attributes (e.g., attributes that do not require the op to have been created) before constructing an operation. These include checking whether required attributes are defined or constraints on attributes (such as I32 attribute). This is not perfect (e.g., if one had a disjunctive constraint where one part relied on the op and the other doesn't, then this would not try and extract the op independent from the op dependent).
The next step is to move these out to a trait that could be verified earlier than in the generated method. The first use case is for inferring the return type while constructing the op. At that point you don't have an Operation yet and that ends up in one having to duplicate the same checks, e.g., verify that attribute A is defined before querying A in shape function which requires that duplication. Instead this allows one to invoke a method to verify all the traits and, if this is checked first during verification, then all other traits could use attributes knowing they have been verified.
It is a little bit funny to have these on the adaptor, but I see the adaptor as a place to collect information about the op before the op is constructed (e.g., avoiding stringly typed accessors, verifying what is possible to verify before the op is constructed) while being cheap to use even with constructed op (so layer of indirection between the op constructed/being constructed). And from that point of view it made sense to me.
Differential Revision: https://reviews.llvm.org/D80842
Improve consistency when printing test results:
Previously we were using different labels for group names (the header
for the list of, e.g., failing tests) and summary count lines. For
example, "Failing Tests"/"Unexpected Failures". This commit changes lit
to label things consistently.
Improve wording of labels:
When talking about individual test results, the first word in
"Unexpected Failures", "Expected Passes", and "Individual Timeouts" is
superfluous. Some labels contain the word "Tests" and some don't.
Let's simplify the names.
Before:
```
Failing Tests (1):
...
Expected Passes : 3
Unexpected Failures: 1
```
After:
```
Failed Tests (1):
...
Passed: 3
Failed: 1
```
Reviewed By: ldionne
Differential Revision: https://reviews.llvm.org/D77708
Summary:
`mlir-rocm-runner` is introduced in this commit to execute GPU modules on ROCm
platform. A small wrapper to encapsulate ROCm's HIP runtime API is also inside
the commit.
Due to behavior of ROCm, raw pointers inside memrefs passed to `gpu.launch`
must be modified on the host side to properly capture the pointer values
addressable on the GPU.
LLVM MC is used to assemble AMD GCN ISA coming out from
`ConvertGPUKernelToBlobPass` to binary form, and LLD is used to produce a shared
ELF object which could be loaded by ROCm HIP runtime.
gfx900 is the default target be used right now, although it could be altered via
an option in `mlir-rocm-runner`. Future revisions may consider using ROCm Agent
Enumerator to detect the right target on the system.
Notice AMDGPU Code Object V2 is used in this revision. Future enhancements may
upgrade to AMDGPU Code Object V3.
Bitcode libraries in ROCm-Device-Libs, which implements math routines exposed in
`rocdl` dialect are not yet linked, and is left as a TODO in the logic.
Reviewers: herhut
Subscribers: mgorny, tpr, dexonsmith, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits
Tags: #mlir, #llvm
Differential Revision: https://reviews.llvm.org/D80676
Add support for flat, location, and noperspective decorations in the
serializer and deserializer to be able to process basic shader files
for graphics applications.
Differential Revision: https://reviews.llvm.org/D80837
Recently introduced allocation hoisting is quite conservative on the cases when it triggers.
This revision makes it such that the allocations for vector transfer lowerings are hoisted
to the top of the function.
This should be revisited in the context of parallelism and is a temporary workaround.
Differential Revision: https://reviews.llvm.org/D81253
This revision adds a helper function to hoist vector.transfer_read /
vector.transfer_write pairs out of immediately enclosing scf::ForOp
iteratively, if the following conditions are true:
1. The 2 ops access the same memref with the same indices.
2. All operands are invariant under the enclosing scf::ForOp.
3. No uses of the memref either dominate the transfer_read or are
dominated by the transfer_write (i.e. no aliasing between the write and
the read across the loop)
To improve hoisting opportunities, call the `moveLoopInvariantCode` helper
function on the candidate loop above which to hoist. Hoisting the transfers
results in scf::ForOp yielding the value that originally transited through
memory.
This revision additionally exposes `moveLoopInvariantCode` as a helper in
LoopUtils.h and updates SliceAnalysis to support return scf::For values and
allow hoisting across multiple scf::ForOps.
Differential Revision: https://reviews.llvm.org/D81199
Forward declaring llvm::errs is not enough, as it is used as a default
parameter with a type that references the base class. So the class
hierarchy must be visible.
Summary:
This will inline the region to a shape.assuming in the case that the
input witness is found to be statically true.
Differential Revision: https://reviews.llvm.org/D80302
In the case of all inputs being constant and equal, cstr_eq will be
replaced with a true_witness.
Differential Revision: https://reviews.llvm.org/D80303
This allows replacing of this op with a true witness in the case of both
inputs being const_shapes and being found to be broadcastable.
Differential Revision: https://reviews.llvm.org/D80304
This allows assuming_all to be replaced when all inputs are known to be
statically passing witnesses.
Differential Revision: https://reviews.llvm.org/D80306
This will later be used during canonicalization and folding steps to replace
statically known passing constraints.
Differential Revision: https://reviews.llvm.org/D80307
Update linalg to affine lowering for convop to use affine load for input
whenever there is no padding. It had always been using std.loads because
max in index functions (needed for non-zero padding if not materializing
zeros) couldn't be represented in the non-zero padding cases.
In the future, the non-zero padding case could also be made to use
affine - either by materializing or using affine.execute_region. The
latter approach will not impact the scf/std output obtained after
lowering out affine.
Differential Revision: https://reviews.llvm.org/D81191
This simplifies a lot of handling of BoolAttr/IntegerAttr. For example, a lot of places currently have to handle both IntegerAttr and BoolAttr. In other places, a decision is made to pick one which can lead to surprising results for users. For example, DenseElementsAttr currently uses BoolAttr for i1 even if the user initialized it with an Array of i1 IntegerAttrs.
Differential Revision: https://reviews.llvm.org/D81047
This revision adds a helper function to hoist alloc/dealloc pairs and
alloca op out of immediately enclosing scf::ForOp if both conditions are true:
1. all operands are defined outside the loop.
2. all uses are ViewLikeOp or DeallocOp.
This is now considered Linalg-specific and will be generalized on a per-need basis.
Differential Revision: https://reviews.llvm.org/D81152
Add SubgroupId, SubgroupSize and NumSubgroups to GPU dialect ops and add the
lowering of those ops to SPIRV.
Differential Revision: https://reviews.llvm.org/D81042
Summary:
If the output filename was specified as "-", the ToolOutputFile class
would create a brand new raw_ostream object referring to the stdout.
This patch changes it to reuse the llvm::outs() singleton.
At the moment, this change should be "NFC", but it does enable other
enhancements, like the automatic stdout/stderr synchronization as
discussed on D80803.
I've checked the history, and I did not find any indication that this
class *has* to use a brand new stream object instead of outs() --
indeed, it is special-casing "-" in a number of places already, so this
change fits the pattern pretty well. I suspect the main reason for the
current state of affairs is that the class was originally introduced
(r111595, in 2010) as a raw_fd_ostream subclass, which made any other
solution impossible.
Another potential benefit of this patch is that it makes it possible to
move the raw_ostream class out of the business of special-casing "-" for
stdout handling. That state of affairs does not seem appropriate because
"-" is a valid filename (albeit hard to access with a lot of command
line tools) on most systems. Handling "-" in ToolOutputFile seems more
appropriate.
To make this possible, this patch changes the return type of
llvm::outs() and errs() to raw_fd_ostream&. Previously the functions
were constructing objects of that type, but returning a generic
raw_ostream reference. This makes it possible for new ToolOutputFile and
other code to use raw_fd_ostream methods like error() on the outs()
object. This does not seem like a bad thing (since stdout is a file
descriptor which can be redirected to anywhere, it makes sense to ask it
whether the writing was successful or if it supports seeking), and
indeed a lot of code was already depending on this fact via the
ToolOutputFile "back door".
Reviewers: dblaikie, JDevlieghere, MaskRay, jhenderson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81078
Summary:
The fusion for tensor_reshape is embedding the information to indexing maps,
thus the exising pattenr also works for indexed_generic ops.
Depends On D80347
Differential Revision: https://reviews.llvm.org/D80348
Summary:
Different from the fusion between generic ops, indices are involved. In this
context, we need to re-map the indices for producer since the fused op is built
on consumer's perspective. This patch supports all combination of the fusion
between indexed_generic ops and generic ops, which includes tests case:
1) generic op as producer and indexed_generic op as consumer.
2) indexed_generic op as producer and generic op as consumer.
3) indexed_generic op as producer and indexed_generic op as consumer.
Differential Revision: https://reviews.llvm.org/D80347
Summary:
Progressive lowering of vector.transpose into an operation that
is closer to an intrinsic, and thus the hardware ISA. Currently
under the common vector transform testing flag, as we prepare
deploying this transformation in the LLVM lowering pipeline.
Reviewers: nicolasvasilache, reidtatge, andydavis1, ftynse
Reviewed By: nicolasvasilache, ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits
Tags: #llvm, #mlir
Differential Revision: https://reviews.llvm.org/D80772
Add a new pass to lower operations from the `shape` to the `std` dialect.
The conversion applies only to the `size_to_index` and `index_to_size`
operations and affected types.
Other patterns will be added as needed.
Differential Revision: https://reviews.llvm.org/D81091
The original design of TypeConverter expected specific converters to derive the
class and override virtual functions for conversions and materializations. This
did not scale well to multi-dialect conversions, so the design was changed to
register a list of converter and materializer functions, removing the need for
virtual functions. The only remaining virtual function, `convertSignatureArg`
is never overridden in-tree. Make it non-virtual, drop the virtual destructor
and thus remove vtable from TypeConverter.
If there exist TypeConverter users that need custom `convertSignatureArg`
behavior, it should be implemented using the callback registration mechanism
similar to that of conversions and materializations.
Differential Revision: https://reviews.llvm.org/D80993
This patch enables affine loop fusion for loops with affine vector loads
and stores. For that, we only had to use affine memory op interfaces in
LoopFusionUtils.cpp and Utils.cpp so that vector loads and stores are
also taken into account.
Reviewed By: andydavis1, ftynse
Differential Revision: https://reviews.llvm.org/D80971