Commit Graph

2895 Commits

Author SHA1 Message Date
George Rimar 5d8aea1009 WholeProgramDevirt: Fixed compilation error under MSVS2015.
It was introduced in:

r296945
WholeProgramDevirt: Implement exporting for single-impl devirtualization.
---------------------
r296939
WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
---------------------

Microsoft Visual Studio Community 2015
Version 14.0.23107.0 D14REL
Does not compile that code without additional brackets, showing multiple error like below:

WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}'
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}'
WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']'

llvm-svn: 297451
2017-03-10 10:31:56 +00:00
Chandler Carruth 20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00
Peter Collingbourne 0152c8156b WholeProgramDevirt: Implement importing for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29854

llvm-svn: 297350
2017-03-09 01:11:15 +00:00
Peter Collingbourne 6d284fab20 WholeProgramDevirt: Implement importing for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29844

llvm-svn: 297333
2017-03-09 00:21:25 +00:00
Teresa Johnson d820447212 Perform symbol binding for .symver versioned symbols
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".

While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.

E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.

The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.

Reviewers: rafael, pcc

Reviewed By: pcc

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30485

llvm-svn: 297332
2017-03-09 00:19:49 +00:00
Evgeniy Stepanov 8537d9994d Don't merge global constants with non-dbg metadata.
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.

The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.

llvm-svn: 297327
2017-03-09 00:03:37 +00:00
Evgeniy Stepanov 7a5cfa9a11 Fix one-after-the-end type metadata handling in globalsplit.
Itanium ABI may have an address point one byte after the end of a
vtable. When such vtable global is split, the !type metadata needs to
follow the right vtable.

Differential Revision: https://reviews.llvm.org/D30716

llvm-svn: 297236
2017-03-07 22:18:48 +00:00
Hans Wennborg 254f5fa5f2 Disable gvn-hoist (PR32153)
llvm-svn: 297075
2017-03-06 21:10:40 +00:00
Dehao Chen c632a393b7 Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.

Reviewers: davidxl, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D30658

llvm-svn: 297038
2017-03-06 17:49:59 +00:00
Peter Collingbourne f0bb90b126 Fix build.
llvm-svn: 296949
2017-03-04 01:38:05 +00:00
Peter Collingbourne 77a8d563a3 WholeProgramDevirt: Implement exporting for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29846

llvm-svn: 296948
2017-03-04 01:34:53 +00:00
Peter Collingbourne 2325bb34c1 WholeProgramDevirt: Implement exporting for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29811

llvm-svn: 296945
2017-03-04 01:31:01 +00:00
Peter Collingbourne b406baaeef WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
Any unsuccessful llvm.type.checked.load devirtualizations will be translated
into uses of llvm.type.test, so we need to add the resulting llvm.type.test
intrinsics to the function summaries so that the LowerTypeTests pass will
export them.

Differential Revision: https://reviews.llvm.org/D29808

llvm-svn: 296939
2017-03-04 01:23:30 +00:00
Benjamin Kramer 9528f8c2fb Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline.""
This reverts commit r296759. Miscompiles bash.

llvm-svn: 296872
2017-03-03 14:27:53 +00:00
Peter Collingbourne 3baa72af7d ThinLTOBitcodeWriter: Do not follow operand edges of type GlobalValue when looking for virtual functions.
Such edges may otherwise result in infinite recursion if a pointer to a vtable
is reachable from the vtable itself. This can happen in practice if a TU
defines the ABI types used to implement RTTI, and is itself compiled with RTTI.

Fixes PR32121.

llvm-svn: 296839
2017-03-02 23:10:17 +00:00
Geoff Berry 484d756583 Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This re-applies r289696, which caused TSan perf regression, which has
since been addressed in separate changes (see PR for details).

See PR31382.

llvm-svn: 296759
2017-03-02 16:16:47 +00:00
Dehao Chen a60cdd3881 Add function importing info from samplepgo profile to the module summary.
Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported.

Reviewers: mehdi_amini, tejohnson

Reviewed By: tejohnson

Subscribers: davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D30053

llvm-svn: 296498
2017-02-28 18:09:44 +00:00
Adam Nemet de53bfb94d [OptDiag] Hide legacy remark ctors
These are only used when emitting remarks without ORE directly using the free
functions emitOptimizationRemark*.

llvm-svn: 296037
2017-02-23 23:11:11 +00:00
Dehao Chen cc75d2441d Add call branch annotation for ICP promoted direct call in SamplePGO mode.
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.

Reviewers: davidxl, eraman, xur

Reviewed By: davidxl, xur

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30282

llvm-svn: 296028
2017-02-23 22:15:18 +00:00
Dehao Chen 533bc6ea8e Use base discriminator in sample pgo profile matching.
Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.

Reviewers: dblaikie, davidxl

Reviewed By: dblaikie, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30218

llvm-svn: 295999
2017-02-23 18:27:45 +00:00
Dehao Chen 7d230325ef Increases full-unroll threshold.
Summary:
The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold

This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

Performance:

403	0.11%
433	0.51%
445	0.48%
447	3.50%
453	1.49%
464	0.75%

Code size:

403	0.56%
433	0.96%
445	2.16%
447	2.96%
453	0.94%
464	8.02%

The compiler time overhead is similar with code size.

Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc

Reviewed By: hfinkel, chandlerc

Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D28368

llvm-svn: 295538
2017-02-18 03:46:51 +00:00
Justin Bogner 7bc978b543 OptDiag: Allow constructing DiagnosticLocation from DISubprograms
This avoids creating a DILocation just to represent a line number,
since creating Metadata is expensive. Creating a DiagnosticLocation
directly is much cheaper.

llvm-svn: 295531
2017-02-18 02:00:27 +00:00
Peter Collingbourne 184773d81f WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset.
A future change will cause this byte offset to be inttoptr'd and then exported
via an absolute symbol. On the importing end we will expect the symbol to be
in range [0,2^32) so that it will fit into a 32-bit relocation. The problem
is that on 64-bit architectures if the offset is negative it will not be in
the correct range once we inttoptr it.

This change causes us to use a 32-bit integer so that it can be inttoptr'd
(which zero extends) into the correct range.

Differential Revision: https://reviews.llvm.org/D30016

llvm-svn: 295487
2017-02-17 19:43:45 +00:00
Peter Collingbourne 37317f1207 WholeProgramDevirt: Examine the function body when deciding whether functions are readnone.
The goal is to get an analysis result even for de-refineable functions.

Differential Revision: https://reviews.llvm.org/D29803

llvm-svn: 295472
2017-02-17 18:17:04 +00:00
Peter Collingbourne 08eb081ac3 PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline.
Differential Revision: https://reviews.llvm.org/D30008

llvm-svn: 295260
2017-02-15 23:48:38 +00:00
Peter Collingbourne 50cbd7cc90 Re-apply r295110 and r295144 with a fix for the ASan issue.
llvm-svn: 295241
2017-02-15 21:56:51 +00:00
Daniel Jasper eef9b03395 Revert r295110 and r295144.
This fails under ASAN:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio

llvm-svn: 295162
2017-02-15 09:56:08 +00:00
Peter Collingbourne e2367415b6 WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI.
The idea is that the apply* functions will also be called when importing
devirt optimizations.

Differential Revision: https://reviews.llvm.org/D29745

llvm-svn: 295144
2017-02-15 02:13:08 +00:00
Peter Collingbourne 534c0175b6 WholeProgramDevirt: Change internal vcall data structures to match summary.
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).

This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
  types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).

Differential Revision: https://reviews.llvm.org/D29744

llvm-svn: 295110
2017-02-14 22:12:23 +00:00
Taewook Oh f22fa72e4a Do not apply redundant LastCallToStaticBonus
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.

If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.

Reviewers: chandlerc, eraman

Reviewed By: eraman

Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29169

llvm-svn: 295075
2017-02-14 17:30:05 +00:00
Peter Collingbourne 002c2d5380 ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module.
Differential Revision: https://reviews.llvm.org/D29701

llvm-svn: 295021
2017-02-14 03:42:38 +00:00
Peter Collingbourne c45f7f3eb4 FunctionAttrs: Factor out a function for querying memory access of a specific copy of a function. NFC.
This will later be used by ThinLTOBitcodeWriter to add copies of readnone
functions to the regular LTO module.

Differential Revision: https://reviews.llvm.org/D29695

llvm-svn: 295008
2017-02-14 00:28:13 +00:00
Sanjay Patel 4f74216da0 [FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
...we should be able to propagate 'nonnull' info from a callsite back to its parent.

The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430),
but this won't solve that problem.

The transform is currently disabled by default while we wait for clang to work-around
potential security problems:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Differential Revision: https://reviews.llvm.org/D27855

llvm-svn: 294998
2017-02-13 23:10:51 +00:00
Peter Collingbourne 2b33f65317 IR: Type ID summary extensions for WPD; thread summary into WPD pass.
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.

Differential Revision: https://reviews.llvm.org/D29782

llvm-svn: 294981
2017-02-13 19:26:18 +00:00
Chandler Carruth addcda483e [PM] Port ArgumentPromotion to the new pass manager.
Now that the call graph supports efficient replacement of a function and
spurious reference edges, we can port ArgumentPromotion to the new pass
manager very easily.

The old PM-specific bits are sunk into callbacks that the new PM simply
doesn't use. Unlike the old PM, the new PM simply does argument
promotion and afterward does the update to LCG reflecting the promoted
function.

Differential Revision: https://reviews.llvm.org/D29580

llvm-svn: 294667
2017-02-09 23:46:27 +00:00
Peter Collingbourne 17febdbb25 WholeProgramDevirt: Check that VCP candidate functions are defined before evaluating them.
This was crashing before.

llvm-svn: 294666
2017-02-09 23:46:26 +00:00
Chandler Carruth aaad9f84be [PM/LCG] Teach the LazyCallGraph how to replace a function without
disturbing the graph or having to update edges.

This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.

LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
   need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
   from `Function*` edges to `Node*` edges.

All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!

Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.

The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.

The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.

Differential Revision: https://reviews.llvm.org/D29577

llvm-svn: 294653
2017-02-09 23:24:13 +00:00
Peter Collingbourne cea1e4e79a De-duplicate some code for creating an AARGetter suitable for the legacy PM.
I'm about to use this in a couple more places.

Differential Revision: https://reviews.llvm.org/D29793

llvm-svn: 294648
2017-02-09 23:11:52 +00:00
Peter Collingbourne 857aba4410 Rename LowerTypeTestsSummaryAction to PassSummaryAction. NFCI.
I intend to use the same type with the same semantics in the WholeProgramDevirt
pass.

Differential Revision: https://reviews.llvm.org/D29746

llvm-svn: 294629
2017-02-09 21:45:01 +00:00
Peter Collingbourne 28ffd3261f ThinLTOBitcodeWriter: Strip debug info from merged module.
This module will contain nothing but vtable definitions and (soon)
available_externally function definitions, so there is no point in keeping
debug info in the module.

Differential Revision: https://reviews.llvm.org/D28913

llvm-svn: 294511
2017-02-08 20:44:00 +00:00
Peter Collingbourne 1ea1fd893c LowerTypeTests: Simplify. NFC.
llvm-svn: 294273
2017-02-07 03:20:58 +00:00
Dehao Chen 4a9dd70213 Fix the samplepgo indirect call promotion bug: we should not promote a direct call.
Summary: Checking CS.getCalledFunction() == nullptr does not necessary indicate indirect call. We also need to check if CS.getCalledValue() is not a constant.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29570

llvm-svn: 294260
2017-02-06 23:33:15 +00:00
Dehao Chen c81483d79c Fix the bug of samplepgo indirect call promption when type casting of the return value is needed.
Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changed to return the direct call instruction instead.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29569

llvm-svn: 294205
2017-02-06 18:10:36 +00:00
Dehao Chen 448b5790f6 Refactor SampleProfile.cpp to make it cleaner. (NFC)
llvm-svn: 294118
2017-02-05 07:32:17 +00:00
Davide Italiano ec49313b11 [IPCP] Don't propagate return value for naked functions.
This is pretty much the same change made in SCCP.

llvm-svn: 294098
2017-02-04 19:44:14 +00:00
Peter Collingbourne e6fd9ff96a IRMover: Merge flags LinkModuleInlineAsm and IsPerformingImport.
Currently these flags are always the inverse of each other, so there is
no need to keep them separate.

Differential Revision: https://reviews.llvm.org/D29471

llvm-svn: 294016
2017-02-03 17:01:14 +00:00
Peter Collingbourne 6d8f817f8b FunctionImport: Use IRMover directly.
The importer was previously using ModuleLinker in a sort of "IRMover mode". Use
IRMover directly instead in order to remove a level of indirection.

I will remove all importing support from ModuleLinker in a separate
change.

Differential Revision: https://reviews.llvm.org/D29468

llvm-svn: 294014
2017-02-03 16:56:27 +00:00
Mehdi Amini 1380edf4ef Revert "[ThinLTO] Add an auto-hide feature"
This reverts commit r293970.

After more discussion, this belongs to the linker side and
there is no added value to do it at this level.

llvm-svn: 293993
2017-02-03 07:41:43 +00:00
Mehdi Amini b0a8ff71e5 [ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.

This is a recommit of r293912 after fixing build failures,
and a recommit of r293918 after fixing LLD tests.

Differential Revision: https://reviews.llvm.org/D28978

llvm-svn: 293970
2017-02-03 00:32:38 +00:00
Mehdi Amini 21c89dc920 Revert "[ThinLTO] Add an auto-hide feature"
This reverts commit r293918, one lld test does not pass.

llvm-svn: 293961
2017-02-02 23:20:36 +00:00
Peter Collingbourne 37e2459186 FunctionImport: Remove the -disable-force-link-odr flag and change importFunctions to never force link.
This removes some functionality that was only being used by tests.

Differential Revision: https://reviews.llvm.org/D29439

llvm-svn: 293919
2017-02-02 18:42:25 +00:00
Mehdi Amini 97624fb1ec [ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.

This is a recommit of r293912 after fixing build failures.

Differential Revision: https://reviews.llvm.org/D28978

llvm-svn: 293918
2017-02-02 18:31:35 +00:00
Mehdi Amini 827600deaf Revert "[ThinLTO] Add an auto-hide feature"
This reverts r293912, bots are broken.

llvm-svn: 293914
2017-02-02 18:24:37 +00:00
Mehdi Amini dc5a7444f0 [ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.

Differential Revision: https://reviews.llvm.org/D28978

llvm-svn: 293912
2017-02-02 18:13:46 +00:00
Dehao Chen 274df5ea41 Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 293657
2017-01-31 17:49:37 +00:00
Dehao Chen 6217fa44b8 Revert r292979 which causes compile time failure.
llvm-svn: 293557
2017-01-30 22:26:05 +00:00
Adam Nemet e7bdf227f6 [Inliner] Fold analysis remarks into missed remarks
This significantly reduces the noise level of these messages.

llvm-svn: 293492
2017-01-30 16:22:45 +00:00
Haicheng Wu f8dc2d8c8b [Inliner] Fix a comment to match the code. NFC.
TotalAltCost => TotalSecondaryCost

Differential Revision: https://reviews.llvm.org/D29231

llvm-svn: 293490
2017-01-30 16:15:14 +00:00
Chandler Carruth 8e9c0a8472 [ArgPromote] Move static helpers to modern LLVM naming conventions while
here. NFC.

Simple refactoring while prepping a port to the new PM.

Differential Revision: https://reviews.llvm.org/D29249

llvm-svn: 293426
2017-01-29 08:03:21 +00:00
Chandler Carruth ae9ce3d402 [ArgPromote] Run clang-format to normalize remarkably idiosyncratic
formatting that has evolved here over the past years prior to making
somewhat invasive changes to thread new PM support through the business
logic.

Differential Revision: https://reviews.llvm.org/D29248

llvm-svn: 293425
2017-01-29 08:03:19 +00:00
Chandler Carruth cd836cd4ee [ArgPromote] Re-arrange the code in a more typical, logical way.
This arranges the static helpers in an order where they are defined
prior to their use to avoid the need of forward declarations, and
collect the core pass components at the bottom below their helpers.

This also folds one trivial function into the pass itself. Factoring
this 'runImpl' was an attempt to help porting to the new pass manager,
however in my attempt to begin this port in earnest it turned out to not
be a substantial help. I think it will be easier to factor things
without it.

This is an NFC change and does a minimal amount of edits over all.
Subsequent NFC cleanups will normalize the formatting with clang-format
and improve the basic doxygen commenting.

Differential Revision: https://reviews.llvm.org/D29247

llvm-svn: 293424
2017-01-29 08:03:16 +00:00
Davide Italiano 9b8738d7c8 [PM] MLSM has been enabled for a way. Reclaim a cl::opt.
llvm-svn: 293401
2017-01-28 23:45:37 +00:00
Matthias Braun 194ded551c Use print() instead of dump() in code
llvm-svn: 293371
2017-01-28 06:53:55 +00:00
Mehdi Amini 888dee444b Global DCE performance improvement
Change the original algorithm so that it scales better when meeting
very large bitcode where every instruction does not implies a global.

The target query is "how to you get all the globals referenced by
another global"?

Before this patch, it was doing this by walking the body (or the
initializer) and collecting the references. What this patch is doing,
it precomputing the answer to this query for the whole module by
walking the use-list of every global instead.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D28549

llvm-svn: 293328
2017-01-27 19:48:57 +00:00
Peter Collingbourne 1df6e858ef LowerTypeTests: Ignore external globals with type metadata.
Thanks to Davide Italiano for finding the problem and providing a test case.

llvm-svn: 293119
2017-01-26 00:32:15 +00:00
Krzysztof Parzyszek 0fd6296b82 Add loop pass insertion point EP_LateLoopOptimizations
Differential Revision: https://reviews.llvm.org/D28694

llvm-svn: 293067
2017-01-25 16:12:25 +00:00
Dehao Chen a5eb1689dc Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 292979
2017-01-24 21:05:51 +00:00
Chandler Carruth 6acdca78a0 [PH] Replace uses of AssertingVH from members of analysis results with
a lazy-asserting PoisoningVH.

AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.

This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.

The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.

The rest is straight cleanup.

I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.

Differential Revision: https://reviews.llvm.org/D29006

llvm-svn: 292928
2017-01-24 12:55:57 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Evgeniy Stepanov 29e56bceec Revert "Refactor SampleProfile.cpp to move computation inside a branch. (NFC)"
Causes MSan failures on the buildbot.

llvm-svn: 292840
2017-01-23 22:40:08 +00:00
Dehao Chen a53219a5f2 Refactor SampleProfile.cpp to move computation inside a branch. (NFC)
llvm-svn: 292803
2017-01-23 17:09:02 +00:00
Chandler Carruth e8c66b2766 [PM] Replace the hard invalidate in JumpThreading for LVI with correct
invalidation of deleted functions in GlobalDCE.

This was always testing a bug really triggered in GlobalDCE. Right now
we have analyses with asserting value handles into IR. As long as those
remain, when *deleting* an IR unit, we cannot wait for the normal
invalidation scheme to kick in even though it was designed to work
correctly in the face of these kinds of deletions. Instead, the pass
needs to directly handle invalidating the analysis results pointing at
that IR unit.

I've tought the Inliner about this and this patch teaches GlobalDCE.
This will handle the asserting VH case in the existing test as well as
other issues of the same fundamental variety. I've moved the test into
the GlobalDCE directory and added a comment explaining what is going on.

Note that we cannot simply require LVI here because LVI is too lazy.

llvm-svn: 292773
2017-01-23 08:33:24 +00:00
Chandler Carruth 9524af4aac [PM] Clear any analyses for a dead function after inlining it and before
clearing its body. This is essential to avoid triggering asserting value
handles in analyses on the function's body.

I'm working on a test case for this behavior in LLVM, but Clang has
a great one that managed to trigger this on all of the bots already.

llvm-svn: 292770
2017-01-23 07:03:41 +00:00
Chandler Carruth b698d5964d [PM] Fix a really nasty bug introduced when adding PGO support to the
new PM's inliner.

The bug happens when we refine an SCC after having computed a proxy for
the FunctionAnalysisManager, and then proceed to compute fresh analyses
for functions in the *new* SCC using the manager provided by the old
SCC's proxy. *And* when we manage to mutate a function in this new SCC
in a way that invalidates those analyses. This can be... challenging to
reproduce.

I've managed to contrive a set of functions that trigger this and added
a test case, but it is a bit brittle. I've directly checked that the
passes run in the expected ways to help avoid the test just becoming
silently irrelevant.

This gets the new PM back to passing the LLVM test suite after the PGO
improvements landed.

llvm-svn: 292757
2017-01-22 10:34:01 +00:00
Chandler Carruth d4be9f4b8d [PM] Add some debug logging to the new PM inliner to make it easier to
trace its behavior.

llvm-svn: 292756
2017-01-22 10:33:58 +00:00
Anmol P. Paralkar 910dc8de3f MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info
Summary:
Under option -mergefunc-preserve-debug-info we:
- Do not create a new function for a thunk.
- Retain the debug info for a thunk's parameters (and associated
  instructions for the debug info) from the entry block.
  Note: -debug will display the algorithm at work.
- Create debug-info for the call (to the shared implementation) made by
  a thunk and its return value.
- Erase the rest of the function, retaining the (minimally sized) entry
  block to create a thunk.
- Preserve a thunk's call site to point to the thunk even when both occur
  within the same translation unit, to aid debugability. Note that this
  behaviour differs from the underlying -mergefunc implementation which
  modifies the thunk's call site to point to the shared implementation
  when both occur within the same translation unit.

Reviewers: echristo, eeckstein, dblaikie, aprantl, friss

Reviewed By: aprantl

Subscribers: davide, fhahn, jfb, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D28075

llvm-svn: 292702
2017-01-21 02:02:56 +00:00
Peter Collingbourne b365d921cf LowerTypeTests: Fix use-after-free. Found by asan/msan.
llvm-svn: 292700
2017-01-21 01:57:44 +00:00
Peter Collingbourne 67addbcacf LowerTypeTests: Simplify; always create SizeM1 with type IntPtrTy, move initialization out of if statement.
llvm-svn: 292674
2017-01-20 23:22:28 +00:00
Dehao Chen 77079003dd Add indirect call promotion to SamplePGO
Summary: This patch adds metadata for indirect call promotion in the sample profile loader.

Reviewers: xur, davidxl, dnovillo

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28923

llvm-svn: 292672
2017-01-20 22:56:07 +00:00
Easwaran Raman 12585b0148 Improve PGO support for the new inliner
This adds the following to the new PM based inliner in PGO mode:

* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.

* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.

* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.

Differential revision: https://reviews.llvm.org/D28331

llvm-svn: 292666
2017-01-20 22:44:04 +00:00
Peter Collingbourne e02b74e294 IPO, LTO: Plumb the summary from the LTO API into the pass manager.
Differential Revision: https://reviews.llvm.org/D28840

llvm-svn: 292661
2017-01-20 22:18:52 +00:00
Teresa Johnson 4566c6db87 [ThinLTO] Drop non-prevailing non-ODR weak to declarations
Summary:
Allow non-ODR weak/linkonce non-prevailing copies to be marked
as available_externally in the index. Add support for dropping these to
declarations in the backend.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28806

llvm-svn: 292656
2017-01-20 21:54:58 +00:00
Peter Collingbourne f04a390099 LowerTypeTests: Implement importing of type identifiers.
To import a type identifier we read the summary and create external
references to the symbols defined when exporting.

Differential Revision: https://reviews.llvm.org/D28546

llvm-svn: 292654
2017-01-20 21:49:34 +00:00
Peter Collingbourne ee1416037e LowerTypeTests: Compute SizeM1BitWidth in exportTypeId. NFCI.
This avoids needing to store it in a separate field in TypeIdLowering.

llvm-svn: 292647
2017-01-20 20:57:40 +00:00
Dehao Chen 94f369fc7f clang-format SampleProfile.cpp (NFC)
llvm-svn: 292533
2017-01-19 23:20:31 +00:00
Peter Collingbourne 22d9d3cdce LowerTypeTests: Implement exporting of type identifiers.
Type identifiers are exported by:
- Adding coarse-grained information about how to test the type
  identifier to the summary.
- Creating symbols in the object file (aliases and absolute symbols)
  containing fine-grained information about the type identifier.

Differential Revision: https://reviews.llvm.org/D28424

llvm-svn: 292462
2017-01-19 01:20:11 +00:00
Peter Collingbourne 20a00933fb ThinLTOBitcodeWriter: Clear comdats on filtered globals.
Differential Revision: https://reviews.llvm.org/D28839

llvm-svn: 292431
2017-01-18 20:03:02 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Teresa Johnson 83aaf358fd [ThinLTO] Import static functions from the same module as caller
Summary:
We can sometimes end up with multiple copies of a local function that
have the same GUID in the index. This happens when there are local
functions with the same name that are in different source files with the
same name (but in different directories), and they were compiled in
their own directory so had the same path at compile time.

In this case make sure we import the copy in the caller's module. While
it isn't a correctness problem (the renamed reference which is based on the
module IR hash will be unique since the module must have had an
externally visible function that was imported), importing the wrong copy
will result in lost performance opportunity since it won't be referenced
and inlined.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28440

llvm-svn: 291841
2017-01-12 22:04:45 +00:00
Peter Collingbourne 7636532c1b LowerTypeTests: Represent the memory region size with the constant size-1.
This means that we can use a shorter instruction sequence in the case where
the size is a power of two and on the boundary between two representations.

Differential Revision: https://reviews.llvm.org/D28421

llvm-svn: 291706
2017-01-11 21:32:10 +00:00
Peter Collingbourne 6bca5a0d82 Re-apply r291205, "LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.", with a fix for an off-by-one error.
llvm-svn: 291699
2017-01-11 20:28:46 +00:00
Ivan Krasin 42e6b4fd98 Revert rL291205 because it breaks Chrome tests under CFI.
Summary:
Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.

This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Original URL: https://reviews.llvm.org/D28341

Reviewers: pcc

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D28532

llvm-svn: 291684
2017-01-11 16:54:04 +00:00
Peter Collingbourne d79e49d807 LowerTypeTests: Thread summary and action from the API and command line into the pass.
Also move command line handling out of the pass constructor and into
a separate function.

Differential Revision: https://reviews.llvm.org/D28422

llvm-svn: 291323
2017-01-07 01:17:24 +00:00
Peter Collingbourne 81271b7bd2 LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Differential Revision: https://reviews.llvm.org/D28341

llvm-svn: 291205
2017-01-06 02:22:47 +00:00
Teresa Johnson 6c475a7595 ThinLTO: add early "dead-stripping" on the Index
Summary:
Using the linker-supplied list of "preserved" symbols, we can compute
the list of "dead" symbols, i.e. the one that are not reachable from
a "preserved" symbol transitively on the reference graph.
Right now we are using this information to mark these functions as
non-eligible for import.

The impact is two folds:
- Reduction of compile time: we don't import these functions anywhere
  or import the function these symbols are calling.
- The limited number of import/export leads to better internalization.

Patch originally by Mehdi Amini.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23488

llvm-svn: 291177
2017-01-05 21:34:18 +00:00
Teresa Johnson 519465b993 [ThinLTO] Subsume all importing checks into a single flag
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28169

llvm-svn: 291108
2017-01-05 14:32:16 +00:00
Peter Collingbourne b2ce2b6805 IR: Module summary representation for type identifiers; summary test scaffolding for lowertypetests.
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).

Differential Revision: https://reviews.llvm.org/D28041

llvm-svn: 291069
2017-01-05 03:39:00 +00:00
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Davide Italiano b672537cbf [PMBuilder] Remove RunFloat2Int cl::opt.
The pass has been on by default for a long time without problems.

llvm-svn: 290814
2017-01-02 17:49:18 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00