This broke the Chromium build. Consider the following code:
float ScaleSumSamples_C(const float* src, float* dst, float scale, int width) {
float fsum = 0.f;
int i;
#if defined(__clang__)
#pragma clang loop vectorize_width(4)
#endif
for (i = 0; i < width; ++i) {
float v = *src++;
fsum += v * v;
*dst++ = v * scale;
}
return fsum;
}
Compiling at -Oz, Clang now warns:
$ clang++ -target x86_64 -Oz -c /tmp/a.cc
/tmp/a.cc:1:7: warning: loop not vectorized: the optimizer was unable to
perform the requested transformation; the transformation might be disabled or
specified as part of an unsupported transformation ordering
[-Wpass-failed=transform-warning]
this suggests it's not actually enabling vectorization hard enough.
At -Os it asserts instead:
$ build.release/bin/clang++ -target x86_64 -Os -c /tmp/a.cc
clang-10: /work/llvm.monorepo/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2734: void
llvm::InnerLoopVectorizer::emitMemRuntimeChecks(llvm::Loop*, llvm::BasicBlock*): Assertion `
!BB->getParent()->hasOptSize() && "Cannot emit memory checks when optimizing for size"' failed.
Of course neither of these are what the developer expected from the pragma.
> Specifying the vectorization width was supposed to implicitly enable
> vectorization, except that it wasn't really doing this. It was only
> setting the vectorize.width metadata, but not vectorize.enable.
>
> This should fix PR27643.
>
> Differential Revision: https://reviews.llvm.org/D66290
llvm-svn: 372225
Specifying the vectorization width was supposed to implicitly enable
vectorization, except that it wasn't really doing this. It was only
setting the vectorize.width metadata, but not vectorize.enable.
This should fix PR27643.
Differential Revision: https://reviews.llvm.org/D66290
llvm-svn: 372082
New pragma "vectorize_predicate(enable)" now implies "vectorize(enable)",
and it is ignored when vectorization is disabled with e.g.
"vectorize(disable) vectorize_predicate(enable)".
Differential Revision: https://reviews.llvm.org/D65776
llvm-svn: 368970
This adds a new vectorize predication loop hint:
#pragma clang loop vectorize_predicate(enable)
that can be used to indicate to the vectoriser that all (load/store)
instructions should be predicated (masked). This allows, for example, folding
of the remainder loop into the main loop.
This patch will be followed up with D64916 and D65197. The former is a
refactoring in the loopvectorizer and the groundwork to make tail loop folding
a more general concept, and in the latter the actual tail loop folding
transformation will be implemented.
Differential Revision: https://reviews.llvm.org/D64744
llvm-svn: 366989