We already expand select and select_cc in codegenprepare, but they can
still be generated under some situations. Explicitly mark them as expand
to ensure they are not produced, leading to a failure to select the
nodes.
Differential Revision: https://reviews.llvm.org/D92373
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.
It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
sub w8, w0, w1
lsr w0, w8, #31
vs:
cmp w0, w1
cset w0, lt
If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.
A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.
Alive proof:
https://rise4fun.com/Alive/o4E
Name: sign-bit splat
Pre: C1 == (width(%x) - 1)
%s = sub nsw %x, %y
%r = ashr %s, C1
=>
%c = icmp slt %x, %y
%r = sext %c
Name: sign-bit LSB
Pre: C1 == (width(%x) - 1)
%s = sub nsw %x, %y
%r = lshr %s, C1
=>
%c = icmp slt %x, %y
%r = zext %c
* Un-inline the test.
* Use expect_expr everywhere and also check all involved types.
* Clang-format the test sources.
* Explain what we're actually testing with the 'C' and 'D' templates.
* Split out the non-template-parameter-pack part of the test into its own small test.
All three use readFile() for their argument so their argument file is
already copied to the tar, but we weren't rewriting the argument to
point to the path used in the tar file.
No test because the change is trivial (several other flags in
createResponseFile() also aren't tested, likely for the same reason.)
Differential Revision: https://reviews.llvm.org/D92356
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
Added UNLIKELY hint to one-time or rarely executed branches.
This improves performance of the library on some tasking benchmarks.
Differential Revision: https://reviews.llvm.org/D92322
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
There is no correctness need for that, and since we allow live-out
uses, this could theoretically happen, because currently nothing
will move the cond to right before the branch in those tests.
But regardless, lifting that restriction even makes the transform
easier to understand.
This makes the transform happen in 81 more cases (+0.55%)
)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.
The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.
Differential Revision: https://reviews.llvm.org/D91876
The restriction on pointer-to-pointer kernel arguments has been
relaxed in OpenCL 2.0. Apply the same address space restrictions for
pointer argument types to the inner pointer types.
Differential Revision: https://reviews.llvm.org/D92091
In the following loop the dependence distance is 2 and can only be
vectorized if the vector length is no larger than this.
void foo(int *a, int *b, int N) {
#pragma clang loop vectorize(enable) vectorize_width(4)
for (int i=0; i<N; ++i) {
a[i + 2] = a[i] + b[i];
}
}
However, when specifying a VF of 4 via a loop hint this loop is
vectorized. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.
This patch introduces a check to bail of vectorization if the user
specified VF is greater than the maximum feasible VF, unless explicitly
forced with '-force-vector-width=X'.
[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
Reviewed By: sdesmalen, fhahn, Meinersbur
Differential Revision: https://reviews.llvm.org/D90687
This patch replaces the attribute `unsigned VF` in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.
Differential Revision: https://reviews.llvm.org/D91532
PreferedType were not set when parsing compound literals, hence
designated initializers were not available as code completion suggestions.
This patch sets the preferedtype to parsed type for the following initializer
list.
Fixes https://github.com/clangd/clangd/issues/142.
Differential Revision: https://reviews.llvm.org/D92370
Instruction ExtractValue wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul which is not really accurate. Since it is free (most of the times),
this now gets a cost of 0 using getInstructionCost.
This is a follow-up of D92208, that required changing this regression test.
In a follow up I will look at InsertValue which also isn't handled yet.
Differential Revision: https://reviews.llvm.org/D92317
The PREDICATE_CAST node is used to model moves between MVE predicate
registers and gpr's, and eventually become a VMSR p0, rn. When moving to
a predicate only the bottom 16 bits of the sources register are
demanded. This adds a simple fold for that, allowing it to potentially
remove instructions like uxth.
Differential Revision: https://reviews.llvm.org/D92213
Currently when we dump sections, we dump them in the order,
which is specified in the sections header table.
With that the order in the output might not match the order in the file.
This patch starts sorting them by by file offsets when dumping.
When the order in the section header table doesn't match the order
in the file, we should emit the "SectionHeaderTable" key. This patch does it.
Differential revision: https://reviews.llvm.org/D91249
This merges `invalid-attr-section-size.test` and `invalid-attr-version.test`
into `invalid-attributes-sec.test`.
This allows to have a single place where other related test cases can be added.
Differential revision: https://reviews.llvm.org/D92316
Methods synthesized from declared properties were being added to the
method lists twice. This came from the change to list them in the
class's method list, which missed removing the place in CGObjCGNU that
added them again.
Reviewed By: lanza
Differential Revision: https://reviews.llvm.org/D91874
This introduces the overload for `reportUniqueWarning` which allows
to avoid using `createError` in many places.
Differential revision: https://reviews.llvm.org/D92371
ExecutionEngine/LLJIT do not run globals destructors in loaded dynamic libraries when destroyed, and threads managed by ThreadPool can race with program termination, and it leads to segfaults.
TODO: Re-enable threading after fixing a problem with destructors, or removing static globals from dynamic library.
Differential Revision: https://reviews.llvm.org/D92368
This makes the options API composable, allows boolean flags to imply non-boolean values and makes the code more logical (IMO).
Differential Revision: https://reviews.llvm.org/D91861
This is a part of the plan we had previously to convert all calls to
`reportUniqueWarning` and then rename it to just `reportWarning`.
I was a bit unsure about this particular change at first, because it doesn't add a
new functionality: seems it is impossible to trigger a warning duplication currently.
At the same time I find the idea of the plan mentioned very reasonable.
And with that we will be sure that `DynRegionInfo` can't report duplicate
warnings, what looks like a nice feature for possible refactorings and further tool development.
Differential revision: https://reviews.llvm.org/D92224