The real(10) is supported on x86_64. On aarch64, the value of
selected_real_kind(16) should be 16 rather than 10 since real(10)
is not supported on x86_64. Previously, the real type support check
is not target dependent. Support it now through the target triple
information.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D134021
One of the test cases matched IR from a subsequent test case. For this reason, the test case appeared to pass while it is actually broken.
This change does not fix the test case itself. It will be fixed when we overhaul the buffer deallocation implementation. (The memory leak in this test case is an edge case.)
Differential Revision: https://reviews.llvm.org/D135046
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.
This change fixes it to instead check an explicit bool that passes to signal
whether the pass will be run before or after legalization.
Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.
Differential Revision: https://reviews.llvm.org/D135044
This interface is implemented by memref.dim and tensor.dim. This change makes it possible to remove a build dependency of the Affine dialect on the Tensor dialect (and maybe also the MemRef dialect in the future).
Differential Revision: https://reviews.llvm.org/D133595
Add outline-shape-computation pass. This pass his pass outlines the
shape computation part in high level IR by adding shape.func and
populate corresponding mapping information into ShapeMappingAnalysis.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D131810
This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.
This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf
As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.
Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:
* `F8M<N>` : For FP8 types that can be conceived of as following the
same rules as FP16 but with a smaller number of mantissa/exponent
bits. Including the number of mantissa bits in the type name is enough
to fully specify the type. This naming scheme is used to represent
the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
values.
The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).
Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.
MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.
(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)
Differential Revision: https://reviews.llvm.org/D133823
This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).
InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.
I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR
Differential Revision: https://reviews.llvm.org/D133667
Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp
constant. The code may not be optimal, but this allows us to legalize
bf16 constants under Arm without errors.
The previous resolve only creates the host associated varaibles for
common block members, but does not replace the original objects with
the new created ones. Fix it and also compute the sizes and offsets
for the host common block members if they are host associated.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127214
The real(10) is supported on x86_64. On aarch64, the value of
selected_real_kind(16) should be 16 rather than 10 since real(10)
is not supported on x86_64. Previously, the real type support check
is not target dependent. Support it now through the target triple
information.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D134021
Instead of using `reverse_iterator`, share the optimization between the 4 algorithms. The key observation here that `memmove` applies to both `copy` and `move` identically, and to their `_backward` versions very similarly. All algorithms now follow the same pattern along the lines of:
```
if constexpr (can_memmove<InIter, OutIter>) {
memmove(first, last, out);
} else {
naive_implementation(first, last, out);
}
```
A follow-up will delete `unconstrained_reverse_iterator`.
This patch removes duplication and divergence between `std::copy`, `std::move` and `std::move_backward`. It also improves testing:
- the test for whether the optimization is used only applied to `std::copy` and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
- there were no tests to make sure the optimization is not used when the effect would be visible.
Differential Revision: https://reviews.llvm.org/D130695
The current generation is unsafe as it is evaluated during verify
invocation rather than during verifySymbolUses. Remove until this is
safely generated.
Differential Revision: https://reviews.llvm.org/D134558
One of the sources is the same size as the destination so that source
doesn't have an overlap with the destination register. By using the _TIED
form we avoid an early clobber contraint for that source.
This matches what was already done for instrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.