This reverts commit r342387 as it's showing significant performance
regressions in a number of benchmarks. Followed up with the
committer and original thread with an example and will get performance
numbers before recommitting.
llvm-svn: 343522
Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: mehdi_amini, modocache, llvm-commits
Differential Revision: https://reviews.llvm.org/D51642
llvm-svn: 343336
This patch turns LoopInterchange into a loop pass. It now only
considers top-level loops and tries to move the innermost loop to the
optimal position within the loop nest. By only looking at top-level
loops, we might miss a few opportunities the function pass would get
(e.g. if we have a loop nest of 3 loops, in the function pass
we might process loops at level 1 and 2 and move the inner most loop to
level 1, and then we process loops at levels 0, 1, 2 and interchange
again, because we now have a different inner loop). But I think it would
be better to handle such cases by picking the best inner loop from the
start and avoid re-visiting the same loops again.
The biggest advantage of it being a function pass is that it interacts
nicely with the other loop passes. Without this patch, there are some
performance regressions on AArch64 with loop interchanging enabled,
where no loops were interchanged, but we missed out on some other loop
optimizations.
It also removes the SimplifyCFG run. We are just changing branches, so
the CFG should not be more complicated, besides the additional 'unique'
preheaders this pass might create.
Reviewers: chandlerc, efriedma, mcrosier, javed.absar, xbolva00
Reviewed By: xbolva00
Differential Revision: https://reviews.llvm.org/D51702
llvm-svn: 343308
Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912
has been fixed by rL342055.
Precommit testing performed:
* Overnight runs of csmith comparing the output between programs
compiled with gvn-hoist enabled/disabled.
* Bootstrap builds of clang with UbSan/ASan configurations.
llvm-svn: 342387
This reverts rL341954.
The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been
failing with timeouts at stage2 clang/ubsan:
[3065/3073] Linking CXX executable bin/lld
command timed out: 1200 seconds without output running python
../sanitizer_buildbot/sanitizers/buildbot_selector.py,
attempting to kill
llvm-svn: 342001
Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.
Worked in collaboration with: Sebastian Pop <s.pop@samsung.com>
Differential Revision: https://reviews.llvm.org/D50658
llvm-svn: 341669
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.
if (hot_cond1) { // Likely true.
do_stg_hot1();
}
if (hot_cond2) { // Likely true.
do_stg_hot2();
}
->
if (hot_cond1 && hot_cond2) { // Hot path.
do_stg_hot1();
do_stg_hot2();
} else { // Cold path.
if (hot_cond1) {
do_stg_hot1();
}
if (hot_cond2) {
do_stg_hot2();
}
}
This speeds up some internal benchmarks up to ~30%.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D50591
llvm-svn: 341386
Rebase rL338240 since the excessive memory usage observed when using
GVNHoist with UBSan has been fixed by rL340818.
Differential Revision: https://reviews.llvm.org/D49858
llvm-svn: 340922
Summary:
Without this change, the WholeProgramDevirt pass, which requires the
TargetLibraryInfo, will construct one from the default triple.
Fixes PR38139.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49278
llvm-svn: 337750
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder Loop
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 336062
and diretory.
Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.
Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.
This is in preparation for doing some internal cleanups to the pass.
Differential Revision: https://reviews.llvm.org/D47352
llvm-svn: 336028
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.
llvm-svn: 335320
This patch adds support for generating a call graph profile from Branch Frequency Info.
The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
!llvm.module.flags = !{!0}
!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
Differential Revision: https://reviews.llvm.org/D48105
llvm-svn: 335306
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.
This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.
I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.
Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.
Differential Revision: https://reviews.llvm.org/D47408
llvm-svn: 333493
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 333358
Summary:
Follow-up to D43690, the EliminateAvailableExternally pass currently
runs under -O0 and -O2 and up. Under -O1 we would still want to drop
available_externally symbols to reduce space without inlining having
run.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D46093
llvm-svn: 330961
(notionally Scalar.h is part of libLLVMScalarOpts, so it shouldn't be
included by InstCombine which doesn't/shouldn't need to depend on
ScalarOpts)
llvm-svn: 330669
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.
Differential Revision: https://reviews.llvm.org/D38566
llvm-svn: 320749
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Summary:
In ThinLTO compilation, we exit populateModulePassManager early and
were not adding PM extension passes meant to run at the end of the
pipeline. This includes sanitizer passes. Add these passes before
the early exit.
A test will be added to projects/compiler-rt.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D39565
llvm-svn: 317714
This recommit r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide
Reviewed By: davidxl
Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D39137
llvm-svn: 317351
This is necessary because DCE is applied to full LTO modules. Without
this change, a reference from a dead ThinLTO global to a dead full
LTO global will result in an undefined reference at link time.
This problem is only observable when --gc-sections is disabled, or
when targeting COFF, as the COFF port of lld requires all symbols to
have a definition even if all references are dead (this is consistent
with link.exe).
This change also adds an EliminateAvailableExternally pass at -O0. This
is necessary to handle the situation on Windows where a non-prevailing
copy of a linkonce_odr function has an SEH filter function; any
such filters must be DCE'd because they will contain a call to the
llvm.localrecover intrinsic, passing as an argument the address of the
function that the filter belongs to, and llvm.localrecover requires
this function to be defined locally.
Fixes PR35142.
Differential Revision: https://reviews.llvm.org/D39484
llvm-svn: 317108
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.
Differential Revision: https://reviews.llvm.org/D37355
llvm-svn: 316576
This is a vestige from the GCC-3 days, which disables IPO passes
when set. I don't think anybody actually uses it as there are
several IPO passes which still run with this flag set and
nobody complained/noticed. This reduces the delta between
current and new pass manager and allows us to easily review
the difference when we decide to flip the switch (or audit
which passes should run, FWIW).
llvm-svn: 315043
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweight the cost, making the whole pipeline
faster.
This fixes PR34652.
Differential Revision: https://reviews.llvm.org/D38154
llvm-svn: 314997
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
Summary:
GlobalExtensions is dereferenced twice, once for iteration and then a check if it is empty.
As a ManagedStatic this dereference forces it's construction which is unnecessary.
Reviewers: efriedma, davide, mehdi_amini
Reviewed By: mehdi_amini
Subscribers: chapuni, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D33381
llvm-svn: 307229
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html
Time to actually let go and move forward. =]
I've updated the release notes both about the removal and the
deprecation of the corresponding C API.
llvm-svn: 306797
Summary: Fixes an issue using RegisterStandardPasses from a statically linked object before PassManagerBuilder::addGlobalExtension is called from a dynamic library.
Reviewers: efriedma, theraven
Reviewed By: efriedma
Subscribers: mehdi_amini, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D33515
llvm-svn: 305303
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline. We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.
This is turned off by default. A follow-up change will turn it on to
allow for easier reversion in case it breaks something.
llvm-svn: 305146
The whole-program-devirt pass needs to run at -O0 because only it
knows about the llvm.type.checked.load intrinsic: it needs to both
lower the intrinsic itself and handle it in the summary.
Differential Revision: https://reviews.llvm.org/D33571
llvm-svn: 304019
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.
This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:
[ %a1 = add i32 %b, 1 ] [ %c1 = add i32 %d, 1 ]
[ %a2 = xor i32 %a1, 1 ] [ %c2 = xor i32 %c1, 1 ]
\ /
[ %e = phi i32 %a2, %c2 ]
[ add i32 %e, 4 ]
GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.
What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.
This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.
Differential revision: https://reviews.llvm.org/D24805
llvm-svn: 303850
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
also a discussion about exactly what we should do prior to re-enabling
it.
The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.
llvm-svn: 301505
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.
This fixes PR32085
Reviewers: chandlerc, danielcdh
Subscribers: mehdi_amini, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D30468
llvm-svn: 301395
... in the per-TU -O0 pipeline.
The problem is that there could be passes registered using
`addExtensionsToPM()` introducing unnamed globals.
Asan is an example, but there may be others. Building cppcheck
with `-flto=thin` and `-fsanitize=address` triggers an assertion
while we're reading bitcode (in lib/LTO), as the BitcodeReader
assumes there are no unnamed globals (because the namer has run).
Unfortunately I wasn't able to find an easy way to test this.
I added a comment in the hope nobody moves this again.
llvm-svn: 301102
This patch optimizes two memory intrinsic operations: memset and memcpy based
on the profiled size of the operation. The high level transformation is like:
mem_op(..., size)
==>
switch (size) {
case s1:
mem_op(..., s1);
goto merge_bb;
case s2:
mem_op(..., s2);
goto merge_bb;
...
default:
mem_op(..., size);
goto merge_bb;
}
merge_bb:
Differential Revision: http://reviews.llvm.org/D28966
llvm-svn: 299446
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
Summary:
loop unrolling and icp will make the sample profile annotation much harder in the backend. So disable these 2 optimization in the ThinLTO compile phase.
Will add a test in cfe in a separate patch.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D31217
llvm-svn: 298646
Pass const qualified summaries into importers and unqualified summaries into
exporters. This lets us const-qualify the summary argument to thinBackend.
Differential Revision: https://reviews.llvm.org/D31230
llvm-svn: 298534
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.
Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.
Differential Revision: https://reviews.llvm.org/D30796
llvm-svn: 298104
This re-applies r289696, which caused TSan perf regression, which has
since been addressed in separate changes (see PR for details).
See PR31382.
llvm-svn: 296759
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.
Reviewers: davidxl, eraman, xur
Reviewed By: davidxl, xur
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30282
llvm-svn: 296028
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.
Differential Revision: https://reviews.llvm.org/D29782
llvm-svn: 294981
I intend to use the same type with the same semantics in the WholeProgramDevirt
pass.
Differential Revision: https://reviews.llvm.org/D29746
llvm-svn: 294629
Also move command line handling out of the pass constructor and into
a separate function.
Differential Revision: https://reviews.llvm.org/D28422
llvm-svn: 291323
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).
As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.
clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.
This changes opt's default to off as well to match clang. The tests are
modified to explicitly enable the transformation.
llvm-svn: 290235
No existing client is passing a non-null value here. This will come back
in a slightly different form as part of the type identifier summary work.
Differential Revision: https://reviews.llvm.org/D28006
llvm-svn: 290222
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.
Reviewers: tejohnson, davidxl, dnovillo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D27743
llvm-svn: 289714
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline. The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.
Performance results on AArch64 kryo:
Improvements:
Benchmarks/CoyoteBench/fftbench -24.952%
spec2006/bzip2 -4.071%
internal bmark -3.177%
Benchmarks/PAQ8p/paq8p -1.754%
spec2000/perlbmk -1.328%
spec2006/h264ref -1.140%
Regressions:
internal bmark +1.818%
Benchmarks/mafft/pairlocalalign +1.084%
Reviewers: sebpop, dberlin, hiraditya
Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27722
llvm-svn: 289696
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.
Reviewers: tejohnson, davidxl, dnovillo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D27743
llvm-svn: 289669
This pass splits globals into elements using inrange annotations on
getelementptr indices.
Differential Revision: https://reviews.llvm.org/D22295
llvm-svn: 287178
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
Reviewers: davidxl, hfinkel
Subscribers: mehdi_amini, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26155
llvm-svn: 286325
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
sqrt(val);
is transformed to
if (val < 0)
sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.
These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.
Reviewers: hfinkel, mehdi_amini, davidxl
Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
Differential Revision: https://reviews.llvm.org/D24414
llvm-svn: 284542
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.
Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.
Fixes PR30419.
Reviewers: mehdi_amini
Subscribers: probinson, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24701
llvm-svn: 281840