MVE does not have a single sext/zext or trunc instruction that takes the
bottom half of a vector and extends to a full width, like NEON has with
MOVL. Instead it is expected that this happens through top/bottom
instructions. So the MVE equivalent VMOVLT/B instructions take either
the even or odd elements of the input and extend them to the larger
type, producing a vector with half the number of elements each of double
the bitwidth. As there is no simple instruction for a normal extend, we
often have to expand sext/zext/trunc into a series of lane moves (or
stack loads/stores, which we do not do yet).
This pass takes vector code that starts at truncs, looks for
interconnected blobs of operations that end with sext/zext and
transforms them by adding shuffles so that the lanes are interleaved and
the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so
that it can work across basic blocks.
This initial version of the pass just handles a limited set of
instructions, not handling constants or splats or FP, which can all come
as extensions to this base.
Differential Revision: https://reviews.llvm.org/D95804
This follows GCC. Having libstdc++/libc++ include paths is not useful
anyway because libstdc++/libc++ header files cannot find features.h.
While here, suppress -stdlib++-isystem with -nostdlibinc.
RVV intrinsics has new overloading rule, please see
82aac7dad4
Changed:
1. Rename `generic` to `overloaded` because the new rule is not using C11 generic.
2. Change HasGeneric to HasNoMaskedOverloaded because all masked operations
support overloading api.
3. Add more overloaded tests due to overloading rule changed.
Differential Revision: https://reviews.llvm.org/D99189
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
Breaking a string literal or a function calls arguments with
AlignConsecutiveDeclarations or AlignConsecutiveAssignments did misalign
the continued line. E.g.:
void foo() {
int myVar = 5;
double x = 3.14;
auto str = "Hello"
"World";
}
or
void foo() {
int myVar = 5;
double x = 3.14;
auto str = "Hello"
"World";
}
Differential Revision: https://reviews.llvm.org/D98214
Adds more information about automated diagnostic reporting for statement
attributes and adds a bit more documentation about statement attributes
in general.
If the sizes of both memory locations are unknown, we can only
perform a check on the underlying objects. There's no point in
going through GEP decomposition in this case.
This patch adds a new isIntOrFPConstant helper function to check if a
SDValue is a integer of FP constant. This pattern is used in various
places.
There also are places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D99428
`get_or_create_type_array` was used on a non-type MDNode.
Add interface for `get_or_create_array` and use that instead.
Differential Revision: https://reviews.llvm.org/D99450
We should have a test verifying / \ for Windows but have such a long
test specifically for Linux cross compilation suffer from Windows \
is too troublesome.
We have a special pattern for
(mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the
ANDs to shift. But if a sext_inreg coms first, we'll form a MULW
and limit the effectiveness of the special match. So this patch
adds a larger pattern to suppress the MULW formation by emitting
a sext.w and then the same output we use for the
(mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all
get CSEd.
This is the issue I was trying to fix with D99029, but that affected
many more tests.
The current linear expression decomposition handles zext/sext by
decomposing the casted operand, and then checking NUW/NSW flags
to determine whether the extension can be distributed. This has
some disadvantages:
First, it is not possible to perform a partial decomposition. If
we have zext((x + C1) +<nuw> C2) then we will fail to decompose
the expression entirely, even though it would be safe and
profitable to decompose it to zext(x + C1) +<nuw> zext(C2)
Second, we may end up performing unnecessary decompositions,
which will later be discarded because they lack nowrap flags
necessary for extensions.
Third, correctness of the code is not entirely obvious: At a high
level, we encounter zext(x -<nuw> C) in the form of a zext on the
linear expression x + (-C) with nuw flag set. Notably, this case
must be treated as zext(x) + -zext(C) rather than zext(x) + zext(-C).
The code handles this correctly by speculatively zexting constants
to the final bitwidth, and performing additional fixup if the
actual extension turns out to be an sext. This was not immediately
obvious to me.
This patch inverts the approach: An ExtendedValue represents a
zext(sext(V)), and linear expression decomposition will try to
decompose V further, either by absorbing another sext/zext into the
ExtendedValue, or by distributing zext(sext(x op C)) over a binary
operator with appropriate nsw/nuw flags. At each step we can
determine whether distribution is legal and abort with a partial
decomposition if not. We also know which extensions we need to
apply to constants, and don't need to speculate or fixup.
moves tests into directories matching their stable names so that the
tests can reflect the concept name
Differential Revision: https://reviews.llvm.org/D99104
Currently we want to allow calling non-const methods even when only a
shared lock is held, because -Wthread-safety-reference is already quite
sensitive and not all code is const-correct. Even if it is, this might
require users to add std::as_const around the implicit object argument.
See D52395 for a discussion.
Fixes PR46963.
Use the CMake 3.13 features of CMakeConfigPackageHelpers to generate
LLVMConfigVersion.cmake with proper architecture detection, major+minor
version matching, etc.
Differential Revision: https://reviews.llvm.org/D99451
Remove VBROADCAST/MOVDDUP/splat-shuffle handling from foldShuffleOfHorizOp
This can all be handled by canonicalizeShuffleMaskWithHorizOp along as we check that the HADD/SUB are only used once (to prevent infinite loops on slow-horizop targets which will try to reuse the nodes again followed by a post-hop shuffle).
For example,
<https://lab.llvm.org/buildbot/#/builders/132/builds/3929>
has this diagnostic:
```
/opt/gcc/9.3.0/snos/include/g++/bits/stl_tree.h:780:8: error: static assertion failed: comparison object must be invocable as const
780 | is_invocable_v<const _Compare&, const _Key&, const _Key&>,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
In input dump annotations, `check:2'1` indicates diagnostic 1 for the
`CHECK` directive on check file line 2. Without this patch,
`-dump-input` computes the diagnostic index with the assumption that
FileCheck *consecutively* produces all diagnostics for the same
pattern. Already, that can be a false assumption, as in the examples
below. Moreover, it seems like a brittle assumption as FileCheck
evolves. Finally, it actually complicates the implementation even if
it makes it slightly more efficient.
This patch avoids that assumption. Examples below show results after
applying this patch. Before applying this patch, `'N` is omitted
throughout these examples because the implementation doesn't notice
there's more than one diagnostic per pattern.
First, `CHECK-LABEL` violates the assumption because `CHECK-LABEL`
tries to match twice, and other directives can match in between:
```
$ cat check
CHECK: foobar
CHECK-LABEL: foobar
$ FileCheck -vv check < input |& tail -8
<<<<<<
1: text
2: foobar
label:2'0 ^~~~~~
check:1 ^~~~~~
label:2'1 X error: no match found
3: text
>>>>>>
```
Second, `--implicit-check-not` is obviously processed many times among
other directives:
```
$ cat check
CHECK: foo
CHECK: foo
$ FileCheck -vv -dump-input=always -implicit-check-not=foo \
check < input |& tail -16
<<<<<<
1: text
not:imp1'0 X~~~~
2: foo
check:1 ^~~
not:imp1'1 X
3: text
not:imp1'1 ~~~~~
4: foo
check:2 ^~~
not:imp1'2 X
5: text
not:imp1'2 ~~~~~
6:
eof:2 ^
>>>>>>
```
Reviewed By: thopre, jhenderson
Differential Revision: https://reviews.llvm.org/D97813
While explicit sext instructions were handled correctly, the
implicit sext that occurs if the offset is smaller than the
pointer size blindly assumed that sext(X * Scale + Offset) is the
same as sext(X) * Scale + Offset, which is obviously not correct.
Fix this by extracting the code that handles linear expression
extension and reusing it for the implicit sext as well.
A number of variables need to be correctly initialized on entry
to GetLinearExpression() for the implementation to behave reasonably.
The fact that SExtBits can currenlty be non-zero on entry is a bug,
as demonstrated by the added test: For implicit sexts by the GEP,
we do currently skip legality checks.
Currently, we'd produce an incorrect decomposition, because we
already recursively called GetLinearExpression(), so the Scale=1,
Offset=0 will not necessarily be relative to the shl itself.
Now, this doesn't actually matter for functional correctness,
because such a shift is poison anyway, so its okay to return
an incorrect decomposition. It's still unnecessarily confusing
though, and we can easily avoid this by checking the bitwidth
earlier.
Nowrap flags between mul and shl differ in that mul nsw allows
multiplication of 1 * INT_MIN, while shl nsw does not. This means
that it is always fine to transfer shl nowrap flags to muls, but
not necessarily the other way around. In this case the NUW/NSW
results refer to mul/add operations, so it's fine to retain the
flags from the shl.
See if the combined shuffle mask is equivalent to an identity shuffle, typically this is due to repeated LHS/RHS ops in horiz-ops, but isTargetShuffleEquivalent might see other patterns as well.
This is another small step towards getting rid of foldShuffleOfHorizOp and relying on canonicalizeShuffleMaskWithHorizOp and generic shuffle combining.
This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99452
Provide a registration mechanism for Linalg dialect-specific passes in C
API and Python bindings. These are being built into the dialect library
but exposed in separate headers (C) or modules (Python).
Differential Revision: https://reviews.llvm.org/D99431