As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
This fixes the concern in single element store scalarization that the
alignment of new store may be larger than it should be. It calculates
the largest alignment if index is constant, and a safe one if not.
Reviewed By: lebedev.ri, spatel
Differential Revision: https://reviews.llvm.org/D103419
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.
The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D102515
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff55.
This patch updates the code to freeze the index, unless it is proven to
not be poison.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D103378
We can only scalarize memory accesses if we know the index is valid.
This patch adjusts canScalarizeAcceess to fall back to
computeConstantRange to check if the index is known to be valid.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D102476
By llvm-mca analysis, Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of the AVX2 targets at 2 for both 128 and 256-bit vectors - XOP capable targets have better 128-bit vector shifts so improve the fallback in those cases.
The input IR for @load_extract_idx_var_i64_known_valid_by_assume
and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load
has been swapped.
This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume
has the assume before the load and the other test has it after.
This reverts commit 94d54155e2.
This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.
At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.
This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4
This should complement D98240.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D100273
This was reverted to mitigate mitigate miscompiles caused by
the logical and/or to bitwise and/or fold. Reapply it now that
the underlying issue has been fixed by D101191.
-----
This patch folds more operations to poison.
Alive2 proof: https://alive2.llvm.org/ce/z/mxcb9G (it does not contain tests about div/rem because they fold to poison when raising UB)
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D92270
Vector single element update optimization is landed in 2db4979. But the
scope needs restriction. This patch restricts the index to inbounds and
vector must be fixed sized. In future, we may use value tracking to
relax constant restrictions.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D102146
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93793
This commit copies existing tests at llvm/Transforms containing
'shufflevector X, undef' and replaces them with 'shufflevector X, poison'.
The new copied tests have *-inseltpoison.ll suffix at its file name
(as db7a2f347f did)
See https://reviews.llvm.org/D93793
Test files listed using
grep -R -E "^[^;]*shufflevector <.*> .*, <.*> undef" | cut -d":" -f1 | uniq
Test files copied & updated using
file_org=llvm/test/Transforms/$1
if [[ "$file_org" = *-inseltpoison.ll ]]; then
file=$file_org
else
file=${file_org%.ll}-inseltpoison.ll
if [ ! -f $file ]; then
cp $file_org $file
fi
fi
sed -i -E 's/^([^;]*)shufflevector <(.*)> (.*), <(.*)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
echo "$file : should be manually updated"
# The test is manually updated
exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
This commit copies existing tests at llvm/Transforms and replaces
'insertelement undef' in those files with 'insertelement poison'.
(see https://reviews.llvm.org/D93586)
Tests listed using this script:
grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq |
wc -l
Tests updated:
file_org=llvm/test/Transforms/$1
file=${file_org%.ll}-inseltpoison.ll
cp $file_org $file
sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
echo "$file : should be manually updated"
# I manually updated the script
exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).
We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.
Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh
The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).
Differential Revision: https://reviews.llvm.org/D93229
Here's another minimal step suggested by D93229 / D93397 .
(I'm trying to be extra careful in these changes because
load transforms are easy to get wrong.)
We can optimistically choose the greater alignment of a
load and its pointer operand. As the test diffs show, this
can improve what would have been unaligned vector loads
into aligned loads.
When we enhance with gep offsets, we will need to adjust
the alignment calculation to include that offset.
Differential Revision: https://reviews.llvm.org/D93406
As discussed in D93229, we only need a minimal alignment constraint
when querying whether a hypothetical vector load is safe. We still
pass/use the potentially stronger alignment attribute when checking
costs and creating the new load.
There's already a test that changes with the minimum code change,
so splitting this off as a preliminary commit independent of any
gep/offset enhancements.
Differential Revision: https://reviews.llvm.org/D93397
As noted in D93229, the transform from scalar load to vector load
potentially leaks poison from the extra vector elements that are
being loaded.
We could use freeze here (and x86 codegen at least appears to be
the same either way), but we already have a shuffle in this logic
to optionally change the vector size, so let's allow that
instruction to serve both purposes.
Differential Revision: https://reviews.llvm.org/D93238
This is an enhancement to load vectorization that is motivated by
a pattern in https://llvm.org/PR16739.
Unfortunately, it's still not enough to make a difference there.
We will have to handle multi-use cases in some better way to avoid
creating multiple overlapping loads.
Differential Revision: https://reviews.llvm.org/D92858
We can not bitcast pointers across different address spaces, and VectorCombine
should be careful when it attempts to find the original source of the loaded
data.
Differential Revision: https://reviews.llvm.org/D89577
As discussed in:
https://llvm.org/PR47558
...there are several potential fixes/follow-ups visible
in the test case, but this is the quickest and safest
fix of the perf regression.
Similar to the tsan suppression in
`Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:
struct A {
int i;
const short s; // the load cannot be vectorized because
int modify; // it overlaps with bytes being concurrently modified
long pad1, pad2;
};
// __tsan_read16 does not know that some bytes are undef and accessing is safe
Similarly, under asan, users can mark memory regions with
`__asan_poison_memory_region`. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.
`mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in `vectorizeLoadInsert`.
Note, memtag suppression can be relaxed if the load is aligned to the
its granule (usually 16), but that is out of scope of this patch.
Reviewed By: spatel, vitalybuka
Differential Revision: https://reviews.llvm.org/D87538
First, shuffle cost for scalable type is not known for scalable type;
Second, we cannot reason if the narrowed shuffle mask for scalable type
is a splat or not.
E.g., Bitcast splat vector from type <vscale x 4 x i32> to <vscale x 8 x i16>
will involve narrowing shuffle mask <vscale x 4 x i32> zeroinitializer to
<vscale x 8 x i32> with element sequence of <0, 1, 0, 1, ...>, which cannot be
reasoned if it's a valid splat or not.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86995
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.
In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.
For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.
Differential Revision: https://reviews.llvm.org/D86160
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.
Resolves problem with
Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85912