Commit Graph

461 Commits

Author SHA1 Message Date
Adam Nemet f31b1f310c Move verbosity check for remarks to the diag handler
Test needs some slight adjustment because we no longer check the existence of
BFI but rather that the actual hotness is set on the remark.  If entry_count
is not set getBlockProfileCount returns None.

llvm-svn: 314874
2017-10-04 04:26:23 +00:00
Haicheng Wu 3ec848bc50 [InlineCost] add visitSelectInst()
InlineCost can understand Select IR now.  This patch finds free Select IRs and
continue the propagation of SimplifiedValues, ConstantOffsetPtrs, and
SROAArgValues.

Differential Revision: https://reviews.llvm.org/D37198

llvm-svn: 314307
2017-09-27 14:44:56 +00:00
Reid Kleckner e33c94f1b0 Add llvm.codeview.annotation to implement MSVC __annotation
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

llvm-svn: 312569
2017-09-05 20:14:58 +00:00
Adrian Prantl 05782218ab Canonicalize the representation of empty an expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().

If someone needs to upgrade out-of-tree testcases:
  perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.

llvm-svn: 312144
2017-08-30 18:06:51 +00:00
Dehao Chen f0e27e63e7 Move accurate-sample-profile into the function attribute.
Summary: We need to have accurate-sample-profile in function attribute so that it works with LTO.

Reviewers: davidxl, rsmith

Reviewed By: davidxl

Subscribers: sanjoy, mehdi_amini, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D37113

llvm-svn: 311706
2017-08-24 21:37:04 +00:00
Dehao Chen b2d1de5a7c Add test to cover accurate-sample-profile.
Summary: This patch adds test to cover the logic guarded by "accurate-sample-profile" flag.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: sanjoy, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D37084

llvm-svn: 311618
2017-08-23 23:19:11 +00:00
Reid Kleckner 6d353348e5 Parse and print DIExpressions inline to ease IR and MIR testing
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.

This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.

See also PR22780, for making DIExpression not be an MDNode.

Reviewers: aprantl, dexonsmith, dblaikie

Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37075

llvm-svn: 311594
2017-08-23 20:31:27 +00:00
Sam Elliott e604b563ea Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311349
2017-08-21 16:45:47 +00:00
Sam Elliott 7fe0aaa140 Revert "Emit only A Single Opt Remark When Inlining"
Reverting due to clang build failure

llvm-svn: 311274
2017-08-20 06:55:10 +00:00
Sam Elliott 785dd75369 Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311273
2017-08-20 06:43:34 +00:00
Chandler Carruth 4f3aa29a46 [Inliner] Fix a nasty bug when inlining a non-recursive trace of
a function into itself.

We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.

This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.

The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.

Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, and to Kyle for the first attempt and initial analysis and David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]

llvm-svn: 311229
2017-08-19 06:56:11 +00:00
Chandler Carruth 2a80fddf67 [Inliner] Clean up a test case a bit to make it more clear what is being
tested and why.

llvm-svn: 311228
2017-08-19 06:06:44 +00:00
Chandler Carruth bba762a13f [InlineCost] Refactor the checks for different analyses to be a bit more
localized to the code that uses those analyses.

Technically, this can change behavior as we no longer require the
existence of the ProfileSummaryInfo analysis to use local profile
information via BFI. We didn't actually require the PSI to have an
interesting profile though, so this only really impacts the behavior in
non-default pass pipelines.

IMO, this makes it substantially less surprising how everything works --
before an analysis that wasn't actually used had to exist to trigger
*any* profile aware inlining. I think the new organization makes it more
obvious where various checks for profile signals happen.

Differential Revision: https://reviews.llvm.org/D36710

llvm-svn: 310888
2017-08-14 21:25:00 +00:00
Matt Arsenault aac47c1c00 AMDGPU: Use a custom areInlineCompatible
Fixes not inlining OpenCL library functions on AMDGPU,
which don't have an explicitly set target-cpu.

llvm-svn: 310269
2017-08-07 17:08:44 +00:00
Teresa Johnson ecd901314d [PM] Split LoopUnrollPass and make partial unroller a function pass
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.

*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.

Reviewers: chandlerc

Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36157

llvm-svn: 309886
2017-08-02 20:35:29 +00:00
Chad Rosier 2e1c050e52 [InlineCost] Simplify more 'and' and 'or' operations.
Differential Revision: https://reviews.llvm.org/D35856

llvm-svn: 309817
2017-08-02 14:40:42 +00:00
Chandler Carruth 95055d8f8b [PM] Fix a bug where through CGSCC iteration we can get
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.

This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.

The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).

I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.

A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.

The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.

I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.

We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.

Differential Revision: https://reviews.llvm.org/D36188

llvm-svn: 309784
2017-08-02 02:09:22 +00:00
Easwaran Raman 51b809bf2f [Inliner] Do not apply any bonus for cold callsites.
Summary:
Inlining threshold is increased by application of bonuses when the
callee has a single reachable basic block or is rich in vector
instructions. Similarly, inlining cost is reduced by applying a large
bonus when the last call to a static function is considered for
inlining. This patch disables the application of these bonuses when the
callsite or the callee is cold. The intention here is to prevent a large
cold callsite from being inlined to a non-cold caller that could prevent
the caller from being inlined. This is especially important when the
cold callsite is a last call to a static since the associated bonus is
very high.

Reviewers: chandlerc, davidxl

Subscribers: danielcdh, llvm-commits

Differential Revision: https://reviews.llvm.org/D35823

llvm-svn: 309441
2017-07-28 21:47:36 +00:00
Adrian Prantl abe04759a6 Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

llvm-svn: 309426
2017-07-28 20:21:02 +00:00
Haicheng Wu abdef9ee7e [TTI] Refine the cost of EXT in getUserCost()
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.

Differential Revision: https://reviews.llvm.org/D34458

llvm-svn: 308076
2017-07-15 02:12:16 +00:00
Florian Hahn 738a70d54d [ARM] Fix typo in test added in r307889
This fixes the following test failure:
    LLVM :: Transforms/Inline/ARM/inline-target-attr.ll

Sorry for any inconenience.

llvm-svn: 307892
2017-07-13 08:53:43 +00:00
Florian Hahn 4adcfcf1d6 [ARM] Inline callee if its target-features are a subset of the caller
Summary:
Similar to X86, it should be safe to inline callees if their
target-features are a subset of the caller. As some subtarget features
provide different instructions depending on whether they are set or
unset (e.g. ThumbMode and ModeSoftFloat), we use a whitelist of
target-features describing hardware capabilities only.

Reviewers: kristof.beyls, rengolin, t.p.northover, SjoerdMeijer, peter.smith, silviu.baranga, efriedma

Reviewed By: SjoerdMeijer, efriedma

Subscribers: dschuff, efriedma, aemerson, sdardis, javed.absar, arichardson, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D34697

llvm-svn: 307889
2017-07-13 08:26:17 +00:00
Chandler Carruth c213c67df8 [PM] Fix a nasty bug in the new PM where we failed to properly
invalidation of analyses when merging SCCs.

While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.

However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.

With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.

llvm-svn: 307498
2017-07-09 13:45:11 +00:00
Chandler Carruth bd9c29039e [PM] Finish implementing and fix a chain of bugs uncovered by testing
the invalidation propagation logic from an SCC to a Function.

I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.

However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.

Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.

Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.

llvm-svn: 307487
2017-07-09 03:59:31 +00:00
Davide Italiano 9282f1aece [Cloner] Re-map simplfied cloned instructions.
This commit pretty much rolls back the logic added in r306495
as in the testcase provided we simplify an `icmp` looking through
a PHI that hasn't been mapped yet.

I think instsimplify shouldn't do threading over select/phis or
just looking through phis in general, but this is what we have
now. Also, add a test to prevent this from happening in case somebody
wants to modify this code again.

Briefly discussed with Kyle Butt (thanks Kyle!).

llvm-svn: 306938
2017-07-01 03:29:33 +00:00
Brian Gesiak 4ef3daafef [ORE] Add diagnostics hotness threshold
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html

Reviewers: anemet, davidxl, hfinkel

Reviewed By: anemet

Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D34867

llvm-svn: 306912
2017-06-30 23:14:53 +00:00
Easwaran Raman c5fa6358ba [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case
Differential Revision: https://reviews.llvm.org/D34312

llvm-svn: 306484
2017-06-27 23:11:18 +00:00
Florian Hahn 2665febb54 [AArch64] Inline callee if its target-features are a subset of the caller
Summary:
Similar to X86, it should be safe to inline callees if their target-features
are a subset of the caller. This change matches GCC's inlining behavior
with respect to attributes [1].

[1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes

Reviewers: kristof.beyls, javed.absar, rengolin, t.p.northover

Reviewed By: t.p.northover

Subscribers: aemerson, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D34698

llvm-svn: 306478
2017-06-27 22:27:32 +00:00
Jun Bum Lim 506cfb7ab7 [InlineCost] Do not take INT_MAX when Cost is negative
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX , we also use a valid upperbound cost when overflow occurs in Cost.

Reviewers: hans, echristo, dmgreen

Reviewed By: dmgreen

Subscribers: mcrosier, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34436

llvm-svn: 306118
2017-06-23 16:12:37 +00:00
whitequark 08b20356c3 Define behavior of "stack-probe-size" attribute when inlining.
Also document the attribute, since "probe-stack" already is.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34528

llvm-svn: 306069
2017-06-22 23:22:36 +00:00
whitequark ed54b4a798 Add a "probe-stack" attribute
This attribute is used to ensure the guard page is triggered on stack
overflow. Stack frames larger than the guard page size will generate
a call to __probestack to touch each page so the guard page won't
be skipped.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34386

llvm-svn: 305939
2017-06-21 18:46:50 +00:00
David Blaikie ae8c4af4ac Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner
llvm-svn: 305245
2017-06-12 23:01:17 +00:00
David Blaikie cb9327b02d Inliner: Don't touch indirect calls
Other comments/implications are that this isn't intended behavior (nor
perserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.

llvm-svn: 305052
2017-06-09 03:29:20 +00:00
Jun Bum Lim 2960d41e68 [InlineCost] Enable the new switch cost heuristic
Summary:
This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiment for LLVM test suite and spec2000/2006, +17.82% performance and
8% code size reduce was observed in spec2000/vertex with O3 LTO in AArch64.
No significant code size / performance regression was found in O3/O2/Os. No
significant complain was reported from the llvm-dev thread.

Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo

Reviewed By: echristo

Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini

Differential Revision: https://reviews.llvm.org/D32653

llvm-svn: 304594
2017-06-02 20:42:54 +00:00
Haicheng Wu bf277f38ad [InlineCost] Add a test case for GEP cost
The added test case is to check whether the simplified value is passed to
getGEPCost().

Differential Revision: https://reviews.llvm.org/D33779

llvm-svn: 304454
2017-06-01 19:06:07 +00:00
Teresa Johnson 525dcb617b Fix update VP metadata after inlining for instrumentation PGO
Summary:
With instrumentation profiling, when updating the VP metadata after
an inline, VP metadata on the inlined copy was inadvertantly having
all counts zeroed out. This was causing indirect calls from code inlined
during the call step to be marked as cold in the ThinLTO summaries and
not imported.

The CallerBFI needs to be passed down so that the CallSiteCount can be
computed from the profile summary info. With Sample PGO this was working
since the count is extracted from the branch weight metadata on the
call being inlined (even before we stopped looking at metadata for
non-sample PGO in r302844 this largely wasn't working for instrumentation
PGO since only promoted indirect calls would be getting inlined and have
the metadata).

Added an instrumentation PGO test and renamed the sample PGO test.

Reviewers: danielcdh, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D33389

llvm-svn: 303574
2017-05-22 20:28:18 +00:00
Easwaran Raman 3cd1479c3f [Inliner] Do not mix callsite and callee hotness based updates.
Update threshold based on callee's hotness only when BFI is not available.
Otherwise use only callsite's hotness. This makes it easier to reason about
hotness related threshold updates.

Differential revision: https://reviews.llvm.org/D33157

llvm-svn: 303210
2017-05-16 21:18:09 +00:00
Xinliang David Li 90a9ef6ced Renable test that was disabled due to cost analysis
llvm-svn: 303000
2017-05-14 02:58:39 +00:00
Teresa Johnson 2a6b7991d4 Restrict call metadata based hotness detection to Sample PGO mode
Summary:
Don't use the metadata on call instructions for determining hotness
unless we are in sample PGO mode, where it is needed because profile
counts are not accurate. In instrumentation mode this is not necessary
and does more harm than good when calls have VP metadata that hasn't
been properly scaled after transformations or dropped after constant
prop based devirtualization (both should be fixed, but we don't need
to do this in the first place for instrumentation PGO).

This required adjusting a number of tests to distinguish between sample
and instrumentation PGO handling, and to add in profile summary metadata
so that getProfileCount can get the summary.

Reviewers: davidxl, danielcdh

Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D32877

llvm-svn: 302844
2017-05-11 23:18:05 +00:00
Easwaran Raman c103ef89ee Decrease inlinecold-threshold to 45
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold
thresholds of 225 and 45. Here are some stats on the text size:

Out of 904 tests that ran, 197 see a change in text size. The average
text size reduction (of all the 904 binaries) is 1.07%. Of the 197
binaries, 19 see a text size increase, as high as 18%, but most of them
are small single source benchmarks. There are 3 multisource benchmarks
with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On
the other side of the spectrum, 31 benchmarks see >10% size reduction
and 6 of them are MultiSource.

I haven't run the test-suite with other values of inlinecold-threshold.
Since we have a cold callsite threshold of 45, I picked this value.

Differential revision: https://reviews.llvm.org/D33106

llvm-svn: 302829
2017-05-11 21:36:28 +00:00
Daniel Berlin 0f2af7f93b ConstantFold: Handle gep nonnull, undef as well
llvm-svn: 302447
2017-05-08 17:37:33 +00:00
Dehao Chen a75d0da91b Update VP prof metadata during inlining.
Summary: r298270 added profile update logic for branch_weights. This patch implements profile update logic for VP prof metadata too.

Reviewers: eraman, tejohnson, davidxl

Reviewed By: eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32773

llvm-svn: 302209
2017-05-05 00:47:34 +00:00
Easwaran Raman 5e6f9bd4f8 [PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline.
Differential revision: https://reviews.llvm.org/D32768

llvm-svn: 302170
2017-05-04 16:58:45 +00:00
Jun Bum Lim 919f9e8d65 [InlineCost] Improve the cost heuristic for Switch
Summary:
The motivation example is like below which has 13 cases but only 2 distinct targets

```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)

```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add             w8, w19, w8
        cmp             w8,           // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, 
        and             w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add             w8, w19, w8
        cmp             w8,           // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, 
        cmp             w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.

This change use the general way of switch lowering for the inline heuristic. It
etimate the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
  -inline-generic-switch-cost=false

This change was originally proposed by Haicheng in D29870.

Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier

Reviewed By: hans

Subscribers: joerg, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D31085

llvm-svn: 301649
2017-04-28 16:04:03 +00:00
Matt Arsenault f10061ec70 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Teresa Johnson 428b9e0627 [ThinLTO] Correct counting of functions in inliner stats
Summary: Declarations need to be filtered out when counting functions.

Reviewers: eraman

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31336

llvm-svn: 298720
2017-03-24 17:59:06 +00:00
Dehao Chen e593049fb0 Updates branch_weights annotation for call instructions during inlining.
Summary: Inliner should update the branch_weights annotation to scale it to proper value.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D30767

llvm-svn: 298270
2017-03-20 16:40:44 +00:00
Chandler Carruth 814e0df1c5 [PM/Inliner] Fix a bug in r297374 where we would leave stale calls in
the work queue and crash when trying to visit them after deleting the
function containing those calls.

llvm-svn: 297940
2017-03-16 10:45:42 +00:00
Chandler Carruth 6ef42cc6bb [PM/Inliner] Add a test case that encapsulates the core issue addressed
in r297374.

I've extracted a small version of this from the C++ metaprogram Richard
came up with to exercise these kinds of issues and written comments to
describe both how to reproduce a fresh version of the test case and what
likely failure modes are.

The test case is still a bit brittle as it depends on the particular
inline cost modeling and SCC visitation order, but it definitely would
have caught the bug right away when developing things so it seems
a really valuable test case to have.

llvm-svn: 297935
2017-03-16 10:13:55 +00:00
Chandler Carruth 20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00