There are no users that pass in LazyValueInfo, so we can simplify the
function a bit.
Reviewers: brzycki, asbirlea, davide
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D68297
llvm-svn: 373488
Two small changes in llvm::removeUnreachableBlocks() to avoid unnecessary (re-)computation.
First, replace the use of count() with find(), which has better time complexity.
Second, because we have already computed the set of dead blocks, replace the second loop over all basic blocks to a loop only over the already computed dead blocks. This simplifies the loop and avoids recomputation.
Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>
Reviewers: efriedma, spatel, fhahn, xbolva00
Reviewed By: fhahn, xbolva00
Differential Revision: https://reviews.llvm.org/D68191
llvm-svn: 373429
Expand the simplification of special cases of `log()` to include `log2()`
and `log10()` as well as intrinsics and more types.
Differential revision: https://reviews.llvm.org/D67199
llvm-svn: 373261
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 373099
The static analyzer is warning about a potential null dereference, but we should be able to use cast<FunctionSummary> directly and if not assert will fire for us.
llvm-svn: 373097
Summary:
FlattenCFG merges two 'if' basicblocks by inserting one basicblock
to another basicblock. The inserted basicblock can have a successor
that contains a PHI node whoes incoming basicblock is the inserted
basicblock. Since the existing code does not handle it, it becomes
a badref.
if (cond1)
statement
if (cond2)
statement
successor - contains PHI node whose predecessor is cond2
-->
if (cond1 || cond2)
statement
(BB for cond2 was deleted)
successor - contains PHI node whose predecessor is cond2 --> bad ref!
Author: Jaebaek Seo
Reviewers: asbirlea, kuhar, tstellar, chandlerc, davide, dexonsmith
Reviewed By: kuhar
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68032
llvm-svn: 372989
The static analyzer is warning about a potential null dereferences, but we should be able to use cast<BranchInst> directly and if not assert will fire for us.
llvm-svn: 372977
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly and if not assert will fire for us.
llvm-svn: 372727
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly and if not assert will fire for us.
llvm-svn: 372726
Summary:
Motivation:
- If we can fold it to strdup, we should (strndup does more things than strdup).
- Annotation mechanism. (Works for strdup well).
strdup and strndup are part of C 20 (currently posix fns), so we should optimize them.
Reviewers: efriedma, jdoerfert
Reviewed By: jdoerfert
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67679
llvm-svn: 372636
MSAN bot complains that there is use-of-uninitialized-value
of this FreeStores later in IsWorthwhile().
Perhaps FreeStores needs to be stored in a vector?
llvm-svn: 372262
Summary:
As it can be see in the changed test, while `div` is really costly,
we were speculating it. This does not seem correct.
Also, the old code would run for every single insturuction in BB,
instead of eagerly bailing out as soon as there are too many instructions.
This function still has a problem that `PHINodeFoldingThreshold` is
per-basic-block, while it should be for all the basic blocks.
Reviewers: efriedma, craig.topper, dmgreen, jmolloy
Reviewed By: jmolloy
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67315
llvm-svn: 372255
Summary:
Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in both basic blocks (thus we were willing
to speculatively execute cost = 4), but weren't willing to speculate
when one BB had 3 instructions and other one had no instructions,
even thought that would have total cost of 3.
This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of it's inputs are available: https://godbolt.org/z/zgHePf
So i don't see why the existing behavior is the correct one.
Also, let's add it's own `cl::opt` for this threshold,
with default=4, so it is not stricter than the previous threshold:
will allow to fold when there are 2 BB's each with cost=2.
And since the logic has changed, it will also allow to fold when
one BB has cost=3 and other cost=1, or there is only one BB with cost=4.
This is an alternative solution to D65148:
This fix is mainly motivated by `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that that pattern ends up being stuck with a branch
instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.
This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240)
| metric | old | new | delta | % |
| x86-mi-counting.NumMachineFunctions | 37720 | 37721 | 1 | 0.00% |
| x86-mi-counting.NumMachineBasicBlocks | 773545 | 771181 | -2364 | -0.31% |
| x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
| x86-mi-counting.NumUncondBR | 135770 | 135543 | -227 | -0.17% |
| x86-mi-counting.NumCondBR | 423753 | 422187 | -1566 | -0.37% |
| x86-mi-counting.NumCMOV | 24815 | 25731 | 916 | 3.69% |
| x86-mi-counting.NumVecBlend | 17 | 17 | 0 | 0.00% |
We significantly decrease basic block count, notably decrease instruction count,
significantly decrease branch count and very significantly increase `cmov` count.
Performance-wise, unsurprisingly, this has great effect on
target RawSpeed benchmark. I'm seeing 5 **major** improvements:
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681
```
That being said, it's always best-effort, so there will likely
be cases where this worsens things.
Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc
Reviewed By: jmolloy
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67318
llvm-svn: 372009
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300
We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.
We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.
A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.
In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect
Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324
llvm-svn: 371635
This reverts commit r371584. It introduced a dependency from compiler-rt
to llvm/include/ADT, which is problematic for multiple reasons.
One is that it is a novel dependency edge, which needs cross-compliation
machinery for llvm/include/ADT (yes, it is true that right now
compiler-rt included only header-only libraries, however, if we allow
compiler-rt to depend on anything from ADT, other libraries will
eventually get used).
Secondly, depending on ADT from compiler-rt exposes ADT symbols from
compiler-rt, which would cause ODR violations when Clang is built with
the profile library.
llvm-svn: 371598
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300
We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.
We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.
A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.
In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect
Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324
llvm-svn: 371584
Reverts the change in r371084, but keeps the test.
After r371565, debuginfo cannot be modelled in MemorySSA, even with a
non-standard AA pipeline.
llvm-svn: 371573
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300
We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.
We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.
A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.
In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect
Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324
llvm-svn: 371484
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
Add the new method `LibCallSimplifier::substituteInParent()` that calls
`LibCallSimplifier::replaceAllUsesWith()' and
`LibCallSimplifier::eraseFromParent()` back to back, simplifying the
resulting code.
llvm-svn: 371264
Summary:
Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain
which can check on undef. Msan by design reports branches on uninitialized
memory and undefs, so we have false report here.
In general msan does not like when we convert
```
// If at least one of them is true we can MSAN is ok if another is undefs
if (a || b)
return;
```
into
```
// If 'a' is undef MSAN will complain even if 'b' is true
if (a)
return;
if (b)
return;
```
Example
Before optimization we had something like this:
```
while (true) {
bool maybe_undef = doStuff();
while (true) {
char c = getChar();
if (c != 10 && c != 13)
continue
break;
}
// we know that c == 10 || c == 13 if we get here,
// so msan know that branch is not affected by maybe_undef
if (maybe_undef || c == 10 || c == 13)
continue;
return;
}
```
SimplifyBranchOnICmpChain will convert that into
```
while (true) {
bool maybe_undef = doStuff();
while (true) {
char c = getChar();
if (c != 10 && c != 13)
continue;
break;
}
// however msan will complain here:
if (maybe_undef)
continue;
// we know that c == 10 || c == 13, so either way we will get continue
switch(c) {
case 10: continue;
case 13: continue;
}
return;
}
```
Reviewers: eugenis, efriedma
Reviewed By: eugenis, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67205
llvm-svn: 371138
SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.
Differential Revision: https://reviews.llvm.org/D64595
llvm-svn: 370906
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.
GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96
Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert
Reviewed By: efriedma
Subscribers: hjl.tools, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65737
llvm-svn: 370593
We can also apply the earlier updates to the lazy DTU, instead of
applying them directly.
Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer
Reviewed By: brzycki, asbirlea, SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D66918
llvm-svn: 370391
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.
This patch enables the use of MemorySSA in the loop pass manager.
Passes that currently use MemorySSA:
- EarlyCSE
Passes that use MemorySSA after this patch:
- EarlyCSE
- LICM
- SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
- LoopInstSimplify
- LoopSimplifyCFG
- LoopUnswitch
- LoopRotate
- LoopSimplify
- LCSSA
Loop passes that do *not* update MemorySSA:
- IndVarSimplify
- LoopDelete
- LoopIdiom
- LoopSink
- LoopUnroll
- LoopInterchange
- LoopUnrollAndJam
- LoopVectorize
- LoopReroll
- IRCE
Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry
Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58311
llvm-svn: 370384
Summary:
- Similar to the workaround in fix of PR30188, skip sinking common
lifetime markers of `alloca`. They are mostly left there after
inlining functions in branches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66950
llvm-svn: 370376
Summary:
As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow'
and got rid of potential for division-by-zero, the control flow remains, we still have that branch.
We have this condition:
```
// Don't fold i1 branches on PHIs which contain binary operators
// These can often be turned into switches and other things.
if (PN->getType()->isIntegerTy(1) &&
(isa<BinaryOperator>(PN->getIncomingValue(0)) ||
isa<BinaryOperator>(PN->getIncomingValue(1)) ||
isa<BinaryOperator>(IfCond)))
return false;
```
which was added back in rL121764 to help with `select` formation i think?
That check prevents us to flatten the CFG here, even though we know
we no longer need that guard and will be able to drop everything
but the '@llvm.umul.with.overflow' + `not`.
As it can be seen from tests, we end here because the `not` is being
sinked into the PHI's incoming values by InstCombine,
so we can't workaround this by hoisting it to after PHI.
Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`.
Reviewers: craig.topper, spatel, fhahn, nikic
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65147
llvm-svn: 370349
We do not access the DT in the loop, so we do not have to apply updates
eagerly. We can apply them lazyly and flush them after we are done
merging blocks.
As follow-up work, we might be able to use the DTU above as well,
instead of manually updating the DT.
This brings the example from PR43134 from ~100s to ~4s for a relase +
assertions build on my machine.
Reviewers: efriedma, kuhar, asbirlea, brzycki
Reviewed By: kuhar, brzycki
Differential Revision: https://reviews.llvm.org/D66911
llvm-svn: 370292
...cloning a function from a different module
Currently when a function with debug info is cloned from a different module, the
cloned function may have hanging DICompileUnits, so that the module with the
cloned function fails debug info verification.
The proposed fix inserts all DICompileUnits reachable from the cloned function
to "llvm.dbg.cu" metadata operands of the cloned function module.
Reviewed By: aprantl, efriedma
Differential Revision: https://reviews.llvm.org/D66510
Patch by Oleg Pliss (Oleg.Pliss@azul.com)
llvm-svn: 370265
Summary:
This functionality was added when Mapper::mapMetadata was recursive. It
is no longer needed after r265456, which switched it to be iterative.
Reviewers: dexonsmith, srhines
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66860
llvm-svn: 370236
Summary:
When reconstructing the CFG of the loop after unrolling,
LoopUnroll could in some cases remove the phi operands of
loop-carried values instead of preserving them, resulting
in undef phi values after loop unrolling.
When doing this reconstruction, avoid removing incoming
phi values for phis in the successor blocks if the successor
is the block we are jumping to anyway.
Patch-by: ebevhan
Reviewers: fhahn, efriedma
Reviewed By: fhahn
Subscribers: bjope, lebedev.ri, zzheng, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66334
llvm-svn: 369886
Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of
the stack on ARM32.
Differential Revision: https://reviews.llvm.org/D65019
llvm-svn: 369147
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Refactor `LibCallSimplifier::optimizeExp2()` to use the new
`emitBinaryFloatFnCall()` version that fetches the function name from TLI.
llvm-svn: 368457
GlobalAlias and GlobalIFunc ought to be treated the same by the IR
linker, so we can generalize the code to be in terms of their common
base class GlobalIndirectSymbol.
Differential Revision: https://reviews.llvm.org/D55046
llvm-svn: 368357
For some targets the LICM pass can result in sub-optimal code in some
cases where it would be better not to run the pass, but it isn't
always possible to suppress the transformations heuristically.
Where the front-end has insight into such cases it is beneficial
to attach loop metadata to disable the pass - this change adds the
llvm.licm.disable metadata to enable that.
Differential Revision: https://reviews.llvm.org/D64557
llvm-svn: 368296
When we remove instructions cached references could still be live. This
patch avoids removing invoke instructions that are replaced by calls and
instead keeps them around but in a dead block.
llvm-svn: 367933
Currently, when a GVN or CSE optimization happens,
the llvm.preserve.access.index metadata is dropped.
This caused a problem for BPF AbstructMemberOffset phase
as it relies on the metadata (debuginfo types).
This patch added proper hooks in lib/Transforms to
preserve !preserve.access.index metadata. A test
case is added to ensure metadata is preserved under CSE.
Differential Revision: https://reviews.llvm.org/D65700
llvm-svn: 367769
Summary:
Since the for loop iterates over BB's predecessors, the branch conditions found must have BB as one of the successors.
For an unconditional branch the successor must be BB, added `assert`.
For a conditional branch, one of the two successors must be BB, simplify `else if` to `else` and `assert`.
Sink common instructions outside the if/else block.
Reviewers: sanjoy.google
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65596
llvm-svn: 367699
Current peeling cost model can decide to peel off not all iterations
but only some of them to eliminate conditions on phi. At the same time
if any peeling happens the door for further unroll/peel optimizations on that
loop closes because the part of the code thinks that if peeling happened
it is profile based peeling and all iterations are peeled off.
To resolve this inconsistency the patch provides the flag which states whether
the full peeling basing on profile is enabled or not and peeling cost model
is able to modify this field like it does not PeelCount.
In a separate patch I will introduce an option to allow/disallow peeling basing
on profile.
To avoid infinite loop peeling the patch tracks the total number of peeled iteration
through llvm.loop.peeled.count loop metadata.
Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D64972
llvm-svn: 367647
Summary:
DominatorTree is invalid after SimplifyCFG because of a missed `Changed = true` when simplifying a branch condition and removing an edge.
Resolves PR42272.
Reviewers: zhizhouy, manojgupta
Subscribers: jlebar, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65490
llvm-svn: 367596
Summary:
LoopSimplify is preserved in the legacy pass manager, but not in the new pass manager.
Update LoopSimplify to preserve MemorySSA conditionally when the analysis is available (same behavior as the legacy pass manager).
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65418
llvm-svn: 367594
To avoid duplicates in loop metadata, if the string to add is
already there, just update the value.
Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D65265
llvm-svn: 367087
Just move the utility function to LoopUtils.cpp to re-use it in loop peeling.
Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, asbirlea, llvm-commits
Differential Revision: https://reviews.llvm.org/D65264
llvm-svn: 367085
Currently there are a few pointer comparisons in ValueDFS_Compare, which
can cause non-deterministic ordering when materializing values. There
are 2 cases this patch fixes:
1. Order defs before uses used to compare pointers, which guarantees
defs before uses, but causes non-deterministic ordering between 2
uses or 2 defs, depending on the allocation order. By converting the
pointers to booleans, we can circumvent that problem.
2. comparePHIRelated was comparing the basic block pointers of edges,
which also results in a non-deterministic order and is also not
really meaningful for ordering. By ordering by their destination DFS
numbers we guarantee a deterministic order.
For the example below, we can end up with 2 different uselist orderings,
when running `opt -mem2reg -ipsccp` hundreds of times. Because the
non-determinism is caused by allocation ordering, we cannot reproduce it
with ipsccp alone.
declare i32 @hoge() local_unnamed_addr #0
define dso_local i32 @ham(i8* %arg, i8* %arg1) #0 {
bb:
%tmp = alloca i32
%tmp2 = alloca i32, align 4
br label %bb19
bb4: ; preds = %bb20
br label %bb6
bb6: ; preds = %bb4
%tmp7 = call i32 @hoge()
store i32 %tmp7, i32* %tmp
%tmp8 = load i32, i32* %tmp
%tmp9 = icmp eq i32 %tmp8, 912730082
%tmp10 = load i32, i32* %tmp
br i1 %tmp9, label %bb11, label %bb16
bb11: ; preds = %bb6
unreachable
bb13: ; preds = %bb20
br label %bb14
bb14: ; preds = %bb13
%tmp15 = load i32, i32* %tmp
br label %bb16
bb16: ; preds = %bb14, %bb6
%tmp17 = phi i32 [ %tmp10, %bb6 ], [ 0, %bb14 ]
br label %bb19
bb18: ; preds = %bb20
unreachable
bb19: ; preds = %bb16, %bb
br label %bb20
bb20: ; preds = %bb19
indirectbr i8* null, [label %bb4, label %bb13, label %bb18]
}
Reviewers: davide, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D64866
llvm-svn: 367049
We'd like to determine the idom of exit block after peeling one iteration.
Let Exit is exit block.
Let ExitingSet - is a set of predecessors of Exit block. They are exiting blocks.
Let Latch' and ExitingSet' are copies after a peeling.
We'd like to find an idom'(Exit) - idom of Exit after peeling.
It is an evident that idom'(Exit) will be the nearest common dominator of ExitingSet and ExitingSet'.
idom(Exit) is a nearest common dominator of ExitingSet.
idom(Exit)' is a nearest common dominator of ExitingSet'.
Taking into account that we have a single Latch, Latch' will dominate Header and idom(Exit).
So the idom'(Exit) is nearest common dominator of idom(Exit)' and Latch'.
All these basic blocks are in the same loop, so what we find is
(nearest common dominator of idom(Exit) and Latch)'.
Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D65292
llvm-svn: 367044
Later code in TryToSimplifyUncondBranchFromEmptyBlock() assumes that
we have cleaned up unreachable blocks, but that was not happening
with this switch transform.
llvm-svn: 367037
We do not need the SmallPtrSet to avoid adding duplicates to
OpsToRename, because we already keep a ValueInfo mapping. If we see an
op for the first time, Infos will be empty and we can also add it to
OpsToRename.
We process operands by visiting BBs depth-first and then iterate over
all instructions & users, so the order should be deterministic.
Therefore we can skip one round of sorting, which we purely needed for
guaranteeing a deterministic order when iterating over the SmallPtrSet.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D64816
llvm-svn: 367028
This is a follow up to D64971. While we need to insert the deref after
the offset, it needs to come before the remaining elements in the
original expression since the deref needs to happen before the LLVM
fragment if present.
Differential Revision: https://reviews.llvm.org/D65172
llvm-svn: 366865
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64162
llvm-svn: 366769
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366753
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366736
While debugging code that uses SafeStack, we've noticed that LLVM
produces an invalid DWARF. Concretely, in the following example:
int main(int argc, char* argv[]) {
std::string value = "";
printf("%s\n", value.c_str());
return 0;
}
DWARF would describe the value variable as being located at:
DW_OP_breg14 R14+0, DW_OP_deref, DW_OP_constu 0x20, DW_OP_minus
The assembly to get this variable is:
leaq -32(%r14), %rbx
The order of operations in the DWARF symbols is incorrect in this case.
Specifically, the deref is incorrect; this appears to be incorrectly
re-inserted in repalceOneDbgValueForAlloca.
With this change which inserts the deref after the offset instead of
before it, LLVM produces correct DWARF:
DW_OP_breg14 R14-32
Differential Revision: https://reviews.llvm.org/D64971
llvm-svn: 366726
Current algorithm to update branch weights of latch block and its copies is
based on the assumption that number of peeling iterations is approximately equal
to trip count.
However it is not correct. According to profitability check in one case we can decide to peel
in case it helps to reduce the number of phi nodes. In this case the number of peeled iteration
can be less then estimated trip count.
This patch introduces another way to set the branch weights to peeled of branches.
Let F is a weight of the edge from latch to header.
Let E is a weight of the edge from latch to exit.
F/(F+E) is a probability to go to loop and E/(F+E) is a probability to go to exit.
Then, Estimated TripCount = F / E.
For I-th (counting from 0) peeled off iteration we set the the weights for
the peeled latch as (TC - I, 1). It gives us reasonable distribution,
The probability to go to exit 1/(TC-I) increases. At the same time
the estimated trip count of remaining loop reduces by I.
As a result after peeling off N iteration the weights will be
(F - N * E, E) and trip count of loop becomes
F / E - N or TC - N.
The idea is taken from the review of the patch D63918 proposed by Philip.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64235
llvm-svn: 366665
If the blockaddress is not destoryed, the destination block will still
be marked as having its address taken, limiting further transformations.
I think there are other places where the dead blockaddress constants are kept
around, I'll look into that as follow up.
Reviewers: craig.topper, brzycki, davide
Reviewed By: brzycki, davide
Differential Revision: https://reviews.llvm.org/D64936
llvm-svn: 366633
Add "memtag" sanitizer that detects and mitigates stack memory issues
using armv8.5 Memory Tagging Extension.
It is similar in principle to HWASan, which is a software implementation
of the same idea, but there are enough differencies to warrant a new
sanitizer type IMHO. It is also expected to have very different
performance properties.
The new sanitizer does not have a runtime library (it may grow one
later, along with a "debugging" mode). Similar to SafeStack and
StackProtector, the instrumentation pass (in a follow up change) will be
inserted in all cases, but will only affect functions marked with the
new sanitize_memtag attribute.
Reviewers: pcc, hctim, vitalybuka, ostannard
Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64169
llvm-svn: 366123
It is possible that loop exit has two predecessors in a loop body.
In this case after the peeling the iDom of the exit should be a clone of
iDom of original exit but no a clone of a block coming to this exit.
Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64618
llvm-svn: 366050
This CL enables peeling of the loop with multiple exits where
one exit should be from latch and others are basic blocks with
call to deopt.
The peeling is enabled under the flag which is false by default.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D63923
llvm-svn: 366048
With this patch the getLoopEstimatedTripCount function will
accept also the loops where there are more than one exit but
all exits except latch block should ends up with a call to deopt.
This side exits should not impact the estimated trip count.
Reviewers: reames, mkuper, danielcdh
Reviewed By: reames
Subscribers: fhahn, lebedev.ri, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64553
llvm-svn: 366042
Extract the code from LoopUnrollRuntime into utility function to
re-use it in D63923.
Reviewers: reames, mkuper
Reviewed By: reames
Subscribers: fhahn, hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D64548
llvm-svn: 366040
Introduce and deduce "nosync" function attribute to indicate that a function
does not synchronize with another thread in a way that other thread might free memory.
Reviewers: jdoerfert, jfb, nhaehnle, arsenm
Subscribers: wdng, hfinkel, nhaenhle, mehdi_amini, steven_wu,
dexonsmith, arsenm, uenoku, hiraditya, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D62766
llvm-svn: 365830
Summary:
The map kept in loop rotate is used for instruction remapping, in order
to simplify the clones of instructions. Thus, if an instruction can be
simplified, its simplified value is placed in the map, even when the
clone is added to the IR. MemorySSA in contrast needs to know about that
clone, so it can add an access for it.
To resolve this: keep a different map for MemorySSA.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63680
llvm-svn: 365672
An alloca which can be sunk into the extraction region may have more
than one bitcast use. Move these uses along with the alloca to prevent
use-before-def.
Testing: check-llvm, stage2 build of clang
Fixes llvm.org/PR42451.
Differential Revision: https://reviews.llvm.org/D64463
llvm-svn: 365660
Summary:
Transform
pow(C,x)
To
exp2(log2(C)*x)
if C > 0, C != inf, C != NaN (and C is not power of 2, since we have some fold for such case already).
log(C) is folded by the compiler and exp2 is much faster to compute than pow.
Reviewers: spatel, efriedma, evandro
Reviewed By: evandro
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64099
llvm-svn: 365637
This patch modifies the loop peeling transformation so that
it does not expect that there is only one loop exit from latch.
It modifies only transformation. Update of branch weights remains
only for exit from latch.
The motivation is that in follow-up patch I plan to enable loop peeling for
loops with multiple exits but only if other exits then from latch one goes to
block with call to deopt.
For now this patch is NFC.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames, fhahn
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D63921
llvm-svn: 365441
loop
Summary:
Do the cloning in two steps, first allocate all the new loops, then
clone the basic blocks in the same order as the original loop.
Reviewer: Meinersbur, fhahn, kbarton, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, hiraditya, llvm-commits
Tag: https://reviews.llvm.org/D64224
Differential Revision:
llvm-svn: 365366
This patch adds a function attribute, nofree, to indicate that a function does
not, directly or indirectly, call a memory-deallocation function (e.g., free,
C++'s operator delete).
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D49165
llvm-svn: 365336
It's possible that some function can load and store the same
variable using the same constant expression:
store %Derived* @foo, %Derived** bitcast (%Base** @bar to %Derived**)
%42 = load %Derived*, %Derived** bitcast (%Base** @bar to %Derived**)
The bitcast expression was mistakenly cached while processing loads,
and never examined later when processing store. This caused @bar to
be mistakenly treated as read-only variable. See load-store-caching.ll.
llvm-svn: 365188
This reverts r365040 (git commit 5cacb91475)
Speculatively reverting, since this appears to have broken check-lld on
Linux. Partial analysis in https://crbug.com/981168.
llvm-svn: 365097
This transform came up in D62414, but we should deal with it first.
We have LLVM intrinsics that correspond exactly to libm calls (unlike
most libm calls, these libm calls never set errno).
This holds without any fast-math-flags, so we should always canonicalize
to those intrinsics directly for better optimization.
Currently, we convert to fcmp+select only when we have FMF (nnan) because
fcmp+select does not preserve the semantics of the call in the general case.
Differential Revision: https://reviews.llvm.org/D63214
llvm-svn: 364714
This patch introduces a new function attribute, willreturn, to indicate
that a call of this function will either exhibit undefined behavior or
comes back and continues execution at a point in the existing call stack
that includes the current invocation.
This attribute guarantees that the function does not have any endless
loops, endless recursion, or terminating functions like abort or exit.
Patch by Hideto Ueno (@uenoku)
Reviewers: jdoerfert
Subscribers: mehdi_amini, hiraditya, steven_wu, dexonsmith, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62801
llvm-svn: 364555
FunctionComparator attempts to produce a stable comparison of two Function
instances by looking at all available properties. Since ByVal attributes now
contain a Type pointer, they are not trivially ordered and FunctionComparator
should use its own Type comparison logic to sort them.
llvm-svn: 364523
This patch generalizes the UnrollLoop utility to support loops that exit
from the header instead of the latch. Usually, LoopRotate would take care
of must of those cases, but in some cases (e.g. -Oz), LoopRotate does
not kick in.
Codesize impact looks relatively neutral on ARM64 with -Oz + LTO.
Program master patch diff
External/S.../CFP2006/447.dealII/447.dealII 629060.00 627676.00 -0.2%
External/SPEC/CINT2000/176.gcc/176.gcc 1245916.00 1244932.00 -0.1%
MultiSourc...Prolangs-C/simulator/simulator 86100.00 86156.00 0.1%
MultiSourc...arks/Rodinia/backprop/backprop 66212.00 66252.00 0.1%
MultiSourc...chmarks/Prolangs-C++/life/life 67276.00 67312.00 0.1%
MultiSourc...s/Prolangs-C/compiler/compiler 69824.00 69788.00 -0.1%
MultiSourc...Prolangs-C/assembler/assembler 86672.00 86696.00 0.0%
Reviewers: efriedma, vsk, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D61962
llvm-svn: 364398
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
> llvm-svn: 363046
llvm-svn: 363786
Using the new SwitchInstProfUpdateWrapper this patch
simplifies 3 places of prof branch_weights handling.
Differential Revision: https://reviews.llvm.org/D62123
llvm-svn: 363652
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338
llvm-svn: 363566
Third time's the charm.
This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.
llvm-svn: 363529
SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.
Differential Revision: https://reviews.llvm.org/D62186
llvm-svn: 363527
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.
Differential Revision: https://reviews.llvm.org/D62792
llvm-svn: 363489
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
Differential Revision: https://reviews.llvm.org/D61151
llvm-svn: 363422
This reverts 363226 and 363227, both NFC intended
I swear I fixed the test case that is failing, and ran
the tests, but I will look into it again.
llvm-svn: 363229
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
Differential Revision: https://reviews.llvm.org/D61151
llvm-svn: 363227
We have observed some failures with internal builds with this revision.
- Performance regressions:
- llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings).
- Benchmarks for Abseil's SwissTable.
- Correctness:
- Failures for particular libicu tests when building the Google AppEngine SDK (for PHP).
hwennborg has already been notified, and is aware of reproducer failures.
llvm-svn: 363220
This changes the standalone pass only. Arguably the utility class
itself should assert there are no convergent calls. However, a target
pass with additional context may still be able to version a loop if
all of the dynamic conditions are sufficiently uniform.
llvm-svn: 363165
We were only matching RHS being a loop invariant value, not the inverse. Since there's nothing which appears to canonicalize loop invariant values to RHS, this means we missed cases.
Differential Revision: https://reviews.llvm.org/D63112
llvm-svn: 363108
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
llvm-svn: 363046