The commit rL308422 introduces a restriction for folding unconditional
branches. Specifically if empty block with unconditional branch leads to
header of the loop then elimination of this basic block is prohibited.
However it seems this condition is redundantly strict.
If elimination of this basic block does not introduce more back edges
then we can eliminate this block.
The patch implements this relax of restriction.
The test profile/Linux/counter_promo_nest.c in compiler-rt project
is updated to meet this change.
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324572
Summary:
Loops with inequality comparers, such as:
// unsigned bound
for (unsigned i = 1; i < bound; ++i) {...}
have getSmallConstantMaxTripCount report a large maximum static
trip count - in this case, 0xffff fffe. However, profiling info
may show that the trip count is much smaller, and thus
counter-recommend vectorization.
This change:
- flips loop-vectorize-with-block-frequency on by default.
- validates profiled loop frequency data supports vectorization,
when static info appears to not counter-recommend it. Absence
of profile data means we rely on static data, just as we've
done so far.
Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal
Reviewed By: davidxl
Subscribers: bkramer, llvm-commits
Differential Revision: https://reviews.llvm.org/D42946
llvm-svn: 324543
The failures happened because of assert which was overconfident about
SCEV's proving capabilities and is generally not valid.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324473
Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly.
But it is a common situation when `a >= b` is known from ranges and `a != b` is
known from a dominating condition. Thia patch teaches SCEV to sum these facts
together and prove strict comparison via non-strict one.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324453
Before r324429 we essentially didn't have a verification of LCSSA, so
no wonder that it has been broken: currently loop-sink breaks it (the
attached test illustrates the failure).
It was detected during a stage2 RA build, so to unbreak it I'm disabling
the check for now.
llvm-svn: 324445
Generalize existing constant matching to work with non-uniform constant vectors as well.
Differential Revision: https://reviews.llvm.org/D42818
llvm-svn: 324369
- Fix condition for detecting that a complex basic block was the first in
the chain.
- Add tests.
This was caught by buildbots when submitting rL324319.
llvm-svn: 324341
This patch moves ThinLTOBitcodeWriter/module-asm.ll test case into x86 directory to avoid a test failure when x86 backend is not enabled.
llvm-svn: 324316
If the inline asm provides the definition of a symbol, this can result
in duplicate symbol errors.
Differential Revision: https://reviews.llvm.org/D42944
llvm-svn: 324313
Summary:
Removing the dropped symbols will prevent indirect call promotion in the
ThinLTO Backend from adding a new reference to a symbol, which can
result in linker unsats. This can happen when we compile with a sample
profile collected from one binary by used for another, which may have
profiled targets that aren't used in the new binary.
Note that until dropDeadSymbols handles variables and aliases (in
progress), we may not be able to remove the declaration and can still
have an issue.
Reviewers: grimar, davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D42816
llvm-svn: 324299
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
As PR36225 shows, we definitely don't want to enable the
canEvaluate* logic with phis.
There's still a question of whether we should just revert
r324014 completely because it exposes a compile-time sinkhole
(although that problem might exist independently).
llvm-svn: 324266
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already commited:
https://reviews.llvm.org/D37510https://reviews.llvm.org/D37534
It converts unsigned saturated subtraction patterns to forms recognized
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)
((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
Patch by Yulia Koval!
Differential Revision: https://reviews.llvm.org/D41480
llvm-svn: 324255
This broke the Chromium build; see PR36238.
> This patch is an enhancement to propagate dbg.value information when
> Phis are created on behalf of LCSSA. I noticed a case where a value
> carried across a loop was reported as <optimized out>.
>
> Specifically this case:
>
> int bar(int x, int y) {
> return x + y;
> }
>
> int foo(int size) {
> int val = 0;
> for (int i = 0; i < size; ++i) {
> val = bar(val, i); // Both val and i are correct
> }
> return val; // <optimized out>
> }
>
> In the above case, after all of the interesting computation completes
> our value is reported as "optimized out." This change will add a
> dbg.value to correct this.
>
> This patch also moves the dbg.value insertion routine from
> LoopRotation.cpp into Local.cpp, so that we can share it in both places
> (LoopRotation and LCSSA).
>
> Patch by Matt Davis!
>
> Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 324247
The patch causes the failure of the test
compiler-rt/test/profile/Linux/counter_promo_nest.c
To unblock buildbot, revert the patch while investigation is in progress.
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324214
The commit rL308422 introduces a restriction for folding unconditional
branches. Specifically if empty block with unconditional branch leads to
header of the loop then elimination of this basic block is prohibited.
However it seems this condition is redundantly strict.
If elimination of this basic block does not introduce more back edges
then we can eliminate this block.
The patch implements this relax of restriction.
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324208
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
To bugs additionally fixed:
assert is moved after the check whether loop is not a nullptr.
Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow
is guarded by isAvailableAtLoopEntry.
Reviewers: sanjoy, mkazantsev, anna, dorit, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42417
llvm-svn: 324204
When using the partial inliner, we might have attributes for forwarded
varargs, but the CodeExtractor does not create an empty argument
attribute set for regular arguments in that case, because it does not know
of the additional arguments. So in case we have attributes for VarArgs, we
also have to make sure we create (empty) attributes for all regular arguments.
This fixes PR36210.
llvm-svn: 324197
The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.
PR35734
Differential Revision: https://reviews.llvm.org/D42309
llvm-svn: 324195
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 324174
Without extra instructions and uses, swapMayExposeCSEOpportunities() would change
the icmp (as seen in the check lines), so we were not actually testing patterns
that should be handled by D41480.
llvm-svn: 324143
This is the enhancement suggested in D42536 to fix a shortcoming in
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can
allow that expression to be narrowed or widened for the same cost as a single-use
value.
AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified
away if the operands are the same value; add becomes shl; shifts with a variable shift
amount aren't handled.
Differential Revision: https://reviews.llvm.org/D42739
llvm-svn: 324014
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 323951
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.
For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.
Patch by: Bevin Hansson
Reviewers: atrick, qcolombet, sanjoy
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42103
llvm-svn: 323946
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.
Differential Revision: https://reviews.llvm.org/D42622
llvm-svn: 323926
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
CodeGenPrepare pass to be more aggressive in improving the source and destination alignments
of memcpy/memmove/memset by exploiting our new ability to record independent alignments
for each argument.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323891
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.
Differential Revision: https://reviews.llvm.org/D42683
llvm-svn: 323862
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.
Differential Revision: https://reviews.llvm.org/D42526
llvm-svn: 323797
When removing return value Dead Argument Elimination pass clobbers first
llvm.dbg.value’s argument for live arguments of that function by replacing
it with nullptr. In the next pass it will be deleted, so debug location
about those arguments are lost. This change fixes it.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D42541
llvm-svn: 323784