Commit Graph

3315 Commits

Author SHA1 Message Date
Chad Rosier 4cb2e82774 [PartialInlining] Type. NFC.
llvm-svn: 311699
2017-08-24 20:29:02 +00:00
Peter Collingbourne 001052a067 WholeProgramDevirt: Create bitcast to i8* at each virtual call site.
We can't reuse the llvm.assume instruction's bitcast because it may not
dominate every user of the vtable pointer.

Differential Revision: https://reviews.llvm.org/D36994

llvm-svn: 311491
2017-08-22 21:41:19 +00:00
Haicheng Wu 0812c5bea3 [InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes.
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.

I took over this work from Chad Rosier (mcrosier@codeaurora.org).

Differential Revision: https://reviews.llvm.org/D35850

llvm-svn: 311371
2017-08-21 20:00:09 +00:00
Sam Elliott e963c89d11 Migrate WholeProgramDevirt to new Optimization Remark API
Summary:
This is an attempt to move WholeProgramDevirt to the new remark API.

https://bugs.llvm.org/show_bug.cgi?id=33793

Reviewers: anemet

Reviewed By: anemet

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D36943

llvm-svn: 311352
2017-08-21 16:57:21 +00:00
Sam Elliott e604b563ea Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311349
2017-08-21 16:45:47 +00:00
Sam Elliott 7fe0aaa140 Revert "Emit only A Single Opt Remark When Inlining"
Reverting due to clang build failure

llvm-svn: 311274
2017-08-20 06:55:10 +00:00
Sam Elliott 785dd75369 Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311273
2017-08-20 06:43:34 +00:00
Teresa Johnson 73305f82e9 [ThinLTO] Fix ThinLTO crash
Summary:
Follow up to fix in r311023, which fixed the case where the combined
index is written to disk. The same samplePGO logic exists for the
in-memory index when computing imports, so we need to filter out
GlobalVariable summaries there too.

Reviewers: davidxl

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D36919

llvm-svn: 311254
2017-08-19 18:04:25 +00:00
Andrew Kaylor 53a5fbb45f Add strictfp attribute to prevent unwanted optimizations of libm calls
Differential Revision: https://reviews.llvm.org/D34163

llvm-svn: 310885
2017-08-14 21:15:13 +00:00
Eli Friedman 51cf2604b6 [OptDiag] Updating Remarks in SampleProfile
Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.

Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.

Patch by Tarun Rajendran.

Differential Revision: https://reviews.llvm.org/D36127

llvm-svn: 310763
2017-08-11 21:12:04 +00:00
Xinliang David Li 24524f314c Fix typo /NFC
llvm-svn: 310737
2017-08-11 17:49:20 +00:00
Davide Italiano c163fac184 [GlobalOpt] Switch an explicit loop to llvm::all_of(). NFCI.
llvm-svn: 310453
2017-08-09 09:23:29 +00:00
Reid Kleckner da748f1c3d [ArgPromotion] Preserve alignment of byval argument in new alloca
The frontend may have requested a higher alignment for any reason, and
downstream optimizations may already have taken advantage of it.  We
should keep the same alignment when moving the allocation from the
parameter area to the local variable area.

Fixes PR34038

llvm-svn: 310071
2017-08-04 17:09:11 +00:00
Victor Leschuk 56b03d0dd6 Un-revert r310014: false revert, it wasn't the cause of build break
llvm-svn: 310021
2017-08-04 04:51:15 +00:00
Victor Leschuk 21713ebfb1 Revert r310014 as it breaks build lld-x86_64-darwin13
llvm-svn: 310020
2017-08-04 04:43:54 +00:00
Adrian Prantl fd8c8e9fe6 Teach GlobalSRA to update the debug info for split-up globals.
This is similar to what we are doing in "regular" SROA and creates
DW_OP_LLVM_fragment operations to describe the resulting variables.

rdar://problem/33654891

llvm-svn: 310014
2017-08-04 01:19:54 +00:00
Chandler Carruth 95055d8f8b [PM] Fix a bug where through CGSCC iteration we can get
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.

This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.

The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).

I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.

A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.

The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.

I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.

We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.

Differential Revision: https://reviews.llvm.org/D36188

llvm-svn: 309784
2017-08-02 02:09:22 +00:00
Peter Collingbourne bcd204b478 Update phi nodes in LowerTypeTests control flow simplification
D33925 added a control flow simplification for -O2 --lto-O0 builds that
manually splits blocks and reassigns conditional branches but does not
correctly update phi nodes. If the else case being branched to had
incoming phi nodes the control-flow simplification would leave phi nodes
in that BB with an unhandled predecessor.

Patch by Vlad Tsyrklevich!

Differential Revision: https://reviews.llvm.org/D36012

llvm-svn: 309621
2017-07-31 20:43:07 +00:00
Florian Hahn 6b3216aad8 Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

llvm-svn: 309553
2017-07-31 10:07:49 +00:00
Dehao Chen 8260d66556 Increase the ImportHotMultiplier to 10.0
Summary: The original 3.0 hot mupltiplier is too small, and would prevent hot callsites from being inline. This patch increases the hot multilier to 10.0

Reviewers: davidxl, tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D35969

llvm-svn: 309344
2017-07-28 01:02:34 +00:00
whitequark 9e8197ac8f [MergeFunctions] Remove alias support.
The alias support was dead code since 2011. It was last touched
in r124182, where it was reintroduced after being removed
in r110434, and since then it was gated behind a HasGlobalAliases
flag that was permanently stuck as `false`.

It is also broken. I'm not sure if it bitrotted or was just broken
in the first place because it appears to have never been tested,
but the following IR results in a crash:

    define internal i32 @a(i32 %a, i32 %b) unnamed_addr {
      %c = add i32 %a, %b
      %d = xor i32 %a, %c
      ret i32 %c
    }

    define internal i32 @b(i32 %a, i32 %b) unnamed_addr {
      %c = add i32 %a, %b
      %d = xor i32 %a, %c
      ret i32 %c
    }

It seems safe to remove buggy untested code that no one cared about
for seven years.

Differential Revision: https://reviews.llvm.org/D34802

llvm-svn: 309313
2017-07-27 19:36:13 +00:00
Davide Italiano 82c7d3768d [FunctionImport] Prefer isa<> to dyn_cast<> as the value is not used.
This change makes GCC7 happy again.

llvm-svn: 309305
2017-07-27 18:38:09 +00:00
David Blaikie 2f0cc477ab ThinLTO: Don't import aliases of any kind (even linkonce_odr)
Summary:
Until a more advanced version of importing can be implemented for
aliases (one that imports an alias as an available_externally definition
of the aliasee), skip the narrow subset of cases that was possible but
came at a cost: aliases of linkonce_odr functions could be imported
because the linkonce_odr function could be safely duplicated from the
source module. This came/comes at the cost of not being able to 'home'
imported linkonce functions (they had to be emitted linkonce_odr in all
the destination modules (even if they weren't used by an alias) rather
than as available_externally - causing extra object size).

Tangentially, this also was the only reason ThinLTO would emit multiple
CUs in to the resulting DWARF - which happens to be a problem for
Fission (there's a fix for this in GDB but not released yet, etc).
(actually it's not the only reason - but I'm sending a patch to fix the
other reason shortly)

There's no reason to believe this particularly narrow alias importing
was especially/meaningfully important, only that it was /possible/ to
implement in this way. When a more general solution is done, it should
still satisfy the DWARF concerns above, since the import will still be
available_externally, and thus not create extra CUs.

Since now all aliases are treated the same, I removed/simplified some
test cases since they were testing corner cases where there are no
longer any corners.

Reviewers: tejohnson, mehdi_amini

Differential Revision: https://reviews.llvm.org/D35875

llvm-svn: 309278
2017-07-27 15:09:06 +00:00
Haojie Wang 1dec57d5b0 ThinLTO Minimized Bitcode File Size Reduction
Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.

Reviewers: danielcdh, tejohnson, pcc

Reviewed By: pcc

Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D35334

llvm-svn: 308760
2017-07-21 17:25:20 +00:00
Peter Collingbourne 6f6788b99c LowerTypeTests: Drop function type metadata only if we're going to replace it.
Previously we were (mis)handling jump table members with a prevailing
definition in a full LTO module and a non-prevailing definition in a
ThinLTO module by dropping type metadata on those functions entirely,
which would cause type tests involving such functions to fail.

This patch causes us to drop metadata only if we are about to replace
it with metadata from cfi.functions.

We also want to replace metadata for available_externally functions,
which can arise in the opposite scenario (prevailing ThinLTO
definition, non-prevailing full LTO definition). The simplest way
to handle that is to remove the definition; there's little value in
keeping it around at this point (i.e. after most optimization passes
have already run) and later code will try to use the function's linkage
to create an alias, which would result in invalid IR if the function
is available_externally.

Fixes PR33832.

Differential Revision: https://reviews.llvm.org/D35604

llvm-svn: 308642
2017-07-20 18:02:05 +00:00
Peter Collingbourne 93fdaca5ac ThinLTOBitcodeWriter: Do not rewrite intrinsic functions when splitting modules.
Changing the type of an intrinsic may invalidate the IR.

Differential Revision: https://reviews.llvm.org/D35593

llvm-svn: 308500
2017-07-19 17:54:29 +00:00
Chandler Carruth 06a86301a1 [PM/LCG] Follow-up fix to r308088 to handle deletion of library
functions.

In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.

While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.

We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.

This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.

llvm-svn: 308417
2017-07-19 04:12:25 +00:00
Teresa Johnson f9dc3deaa6 Revert "Restore with fix "[ThinLTO] Ensure we always select the same function copy to import""
This reverts commit r308114 (and follow on fixes to test).

There is a linking failure in a ThinLTO bot:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/

(and undefined reference). It seems like it must be a second order
effect of the heuristic change I made, and may take some time to try
to reproduce locally and track down. Therefore, reverting for now.

llvm-svn: 308206
2017-07-17 19:25:38 +00:00
Teresa Johnson a7660b0127 Restore with fix "[ThinLTO] Ensure we always select the same function copy to import"
This restores r308078/r308079 with a fix for bot non-determinisim (make
sure we run llvm-lto in single threaded mode so the debug output doesn't get
interleaved).

llvm-svn: 308114
2017-07-15 22:58:06 +00:00
Chandler Carruth d78a38ed2e Revert r308078 (and subsequent tweak in r308079) which introduces a test
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.

r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
llvm-svn: 308095
2017-07-15 13:50:26 +00:00
Teresa Johnson 82b4fb1afe [ThinLTO] Ensure we always select the same function copy to import
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.

Reviewers: mehdi_amini

Subscribers: inglorion, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D35436

llvm-svn: 308078
2017-07-15 04:53:05 +00:00
Jakub Kuderski b292c22c8d [Dominators] Make IsPostDominator a template parameter
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.

This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.

Reviewers: dberlin, sanjoy, davide, grosser

Reviewed By: dberlin

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D35315

llvm-svn: 308040
2017-07-14 18:26:09 +00:00
Davide Italiano c3dc055780 Reapply [GlobalOpt] Remove unreachable blocks before optimizing a function.
This commit reapplies r307215 now that we found out and fixed
the cause of the cfi test failure (in r307871).

llvm-svn: 307920
2017-07-13 15:40:59 +00:00
Peter Collingbourne cacac6a104 LowerTypeTests: When importing functions skip definitions where the summary contains a decl.
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.

Part of PR33752.

Differential Revision: https://reviews.llvm.org/D35281

llvm-svn: 307744
2017-07-12 00:39:12 +00:00
Davide Italiano b8ad3eebca [IPO] Temporarily rollback r307215.
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.

llvm-svn: 307729
2017-07-11 23:10:17 +00:00
Konstantin Zhuravlyov bb80d3e1d3 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723

llvm-svn: 307722
2017-07-11 22:23:00 +00:00
Chandler Carruth 01f0c8a8c4 [PM/ThinLTO] Fix PR33536, a bug where the ThinLTO bitcode writer was
querying for analysis results on a function declaration rather than
a definition.

The only reason this worked previously is by chance -- because the way
we got alias analysis results with the legacy PM, we happened to not
compute a dominator tree and so we happened to not hit an assert even
though it didn't make any real sense. Now we bail out before trying to
compute alias analysis so that we don't hit these asserts.

llvm-svn: 307625
2017-07-11 05:39:20 +00:00
Mikael Holmen e0ced14449 [ArgumentPromotion] Change use of removed argument in llvm.dbg.value to undef
Summary:
This solves PR33641.

When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.

If the calls are left untouched, they will later on cause errors:
 "function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.

Reviewers: mstorsjo, rnk, arsenm

Reviewed By: rnk

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D34874

llvm-svn: 307521
2017-07-10 06:07:24 +00:00
Chandler Carruth bd9c29039e [PM] Finish implementing and fix a chain of bugs uncovered by testing
the invalidation propagation logic from an SCC to a Function.

I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.

However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.

Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.

Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.

llvm-svn: 307487
2017-07-09 03:59:31 +00:00
Dehao Chen 64c46574b0 Increase the import-threshold for crtical functions.
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D35096

llvm-svn: 307439
2017-07-07 21:01:00 +00:00
Davide Italiano f4891d29f8 [lib/LTO] Add a comment to explain where we set the linkage in the summary.
Pointed out by Teresa!

llvm-svn: 307305
2017-07-06 20:04:20 +00:00
Davide Italiano 6a5fbe52fa [LTO] Fix the interaction between linker redefined symbols and ThinLTO
This is the same as r304719 but for ThinLTO.
The substantial difference is that in this case we don't have
whole visibility, just the summary.
In the LTO case, when we got the resolution for the input file we
could just see if the linker told us whether a symbol was linker
redefined (using --wrap or --defsym) and switch the linkage directly
for the GV.

Here, we have the summary. So, we record that the linkage changed
from <whatever it was> to $weakany to prevent IPOs across this symbol
boundaries and actually just switch the linkage at FunctionImport time.

This patch should also fixes the lld bits (as all the scaffolding for
communicating if a symbol is linker redefined should be there & should
be the same), but I'll make sure to add some tests there as well.

Fixes PR33192.

Differential Revision:  https://reviews.llvm.org/D35064

llvm-svn: 307303
2017-07-06 19:58:26 +00:00
Frederich Munch 52dfcd18d1 Avoid constructing GlobalExtensions only to find out it is empty.
Summary:
GlobalExtensions is dereferenced twice, once for iteration and then a check if it is empty.
As a ManagedStatic this dereference forces it's construction which is unnecessary.

Reviewers: efriedma, davide, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: chapuni, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D33381

llvm-svn: 307229
2017-07-06 00:09:09 +00:00
Davide Italiano 7dd0694f96 [GlobalOpt] Remove unreachable blocks before optimizing a function.
LLVM's definition of dominance allows instructions that are cyclic
in unreachable blocks, e.g.:

  %pat = select i1 %condition, @global, i16* %pat

because any instruction dominates an instruction in a block that's
not reachable from entry.
So, remove unreachable blocks from the function, because a) there's
no point in analyzing them and b) GlobalOpt should otherwise grow
some more complicated logic to break these cycles.

Differential Revision:  https://reviews.llvm.org/D35028

llvm-svn: 307215
2017-07-05 22:28:28 +00:00
Chandler Carruth 3545a9e1f9 Remove the BBVectorize pass.
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html

Time to actually let go and move forward. =]

I've updated the release notes both about the removal and the
deprecation of the corresponding C API.

llvm-svn: 306797
2017-06-30 07:09:08 +00:00
Dehao Chen 2f31d0d86e Hook the sample PGO machinery in the new PM
Summary: This patch hooks up SampleProfileLoaderPass with the new PM.

Reviewers: chandlerc, davidxl, davide, tejohnson

Reviewed By: chandlerc, tejohnson

Subscribers: tejohnson, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D34720

llvm-svn: 306763
2017-06-29 23:33:05 +00:00
Peter Collingbourne 92648c25a4 Bitcode: Write the irsymtab to disk.
Differential Revision: https://reviews.llvm.org/D33973

llvm-svn: 306487
2017-06-27 23:50:11 +00:00
Geoff Berry 2573a19fe6 [EarlyCSE][MemorySSA] Enable MemorySSA in function-simplification pass of EarlyCSE.
llvm-svn: 306477
2017-06-27 22:25:02 +00:00
Dehao Chen 66131665c4 Enable ICP for AutoFDO.
Summary: AutoFDO should have ICP enabled.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: sanjoy, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D34662

llvm-svn: 306429
2017-06-27 17:23:33 +00:00
Xinliang David Li b67530e9b9 [PGO] Implementate profile counter regiser promotion
Differential Revision: http://reviews.llvm.org/D34085

llvm-svn: 306231
2017-06-25 00:26:43 +00:00
Eric Christopher 5a7c2f1700 Remove the LoadCombine pass. It was never enabled and is unsupported.
Based on discussions with the author on mailing lists.

llvm-svn: 306067
2017-06-22 22:58:12 +00:00
Dehao Chen 50f2aa19e8 Do not inline recursive direct calls in sample loader pass.
Summary: r305009 disables recursive inlining for indirect calls in sample loader pass. The same logic applies to direct recursive calls.

Reviewers: iteratee, davidxl

Reviewed By: iteratee

Subscribers: sanjoy, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34456

llvm-svn: 305934
2017-06-21 17:57:43 +00:00
Xinliang David Li c3f8e83253 Fix function name /NFC
llvm-svn: 305564
2017-06-16 16:54:13 +00:00
Evgeniy Stepanov 4d4ee93d25 [cfi] CFI-ICall for ThinLTO.
Implement ControlFlowIntegrity for indirect function calls in ThinLTO.
Design follows the RFC in llvm-dev, see
https://groups.google.com/d/msg/llvm-dev/MgUlaphu4Qc/kywu0AqjAQAJ

llvm-svn: 305533
2017-06-16 00:18:29 +00:00
Xinliang David Li eea0ade2eb [PartialInlining] Code Refactoring
This is a NFC code refactoring and interface cleanup. This paves the
way to enable outlining-only mode for the partial inliner.

llvm-svn: 305530
2017-06-15 23:56:59 +00:00
Frederich Munch 6391c7e2a1 Revert r305313 & r305303, self-hosting build-bot isn’t liking it.
llvm-svn: 305318
2017-06-13 19:05:24 +00:00
Frederich Munch 4c73b40dca Force RegisterStandardPasses to construct std::function in the IPO library.
Summary: Fixes an issue using RegisterStandardPasses from a statically linked object before PassManagerBuilder::addGlobalExtension is called from a dynamic library.

Reviewers: efriedma, theraven

Reviewed By: efriedma

Subscribers: mehdi_amini, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D33515

llvm-svn: 305303
2017-06-13 16:48:41 +00:00
David Blaikie 6d0f39476a Inliner: Avoid calling shouldInline until it's absolutely necessary
This restores the order of evaluation (& conditionalized evaluation) of
isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline
(with the addition of a shouldInline call after
isTriviallyDeadInstruction) from before r305245.

llvm-svn: 305267
2017-06-13 02:24:09 +00:00
David Blaikie ae8c4af4ac Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner
llvm-svn: 305245
2017-06-12 23:01:17 +00:00
Geoff Berry 3cca1da20c [EarlyCSE] Add option to use MemorySSA for function simplification run of EarlyCSE (off by default).
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline.  We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.

This is turned off by default.  A follow-up change will turn it on to
allow for easier reversion in case it breaks something.

llvm-svn: 305146
2017-06-10 15:20:03 +00:00
David Blaikie cb9327b02d Inliner: Don't touch indirect calls
Other comments/implications are that this isn't intended behavior (nor
perserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.

llvm-svn: 305052
2017-06-09 03:29:20 +00:00
Evgeniy Stepanov d02dbf6b1c [CFI] Remove LinkerSubsectionsViaSymbols.
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.

It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.

This is the second attempt to land this change after fixing PR33316.

llvm-svn: 305031
2017-06-08 23:38:22 +00:00
Craig Topper 2aa4d39f5e [ExtractGV] Fix the doxygen comment on the constructor and the class to refer to global values instead of functions. While there fix an 80 column violation. NFC
llvm-svn: 305030
2017-06-08 23:38:19 +00:00
Peter Collingbourne e357fbd243 Write summaries for merged modules when splitting modules for ThinLTO.
This is to prepare to allow for dead stripping of globals in the
merged modules.

Differential Revision: https://reviews.llvm.org/D33921

llvm-svn: 305027
2017-06-08 23:01:49 +00:00
Dehao Chen e2a428bad7 Do not early-inline recursive calls in sample profile loader.
Summary: Early-inlining of recursive call makes the code size bloat exponentially. We should not disable it.

Reviewers: davidxl, dnovillo, iteratee

Reviewed By: iteratee

Subscribers: iteratee, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D34017

llvm-svn: 305009
2017-06-08 20:11:57 +00:00
Peter Collingbourne aaae7eed5c LowerTypeTests: Generate simpler IR for br(llvm.type.test, then, else).
This makes it so that the code quality for CFI checks when compiling
with -O2 and linking with --lto-O0 is similar to that of the rest of
the code.

Reduces the size of a chrome binary built with -O2/--lto-O0 by
about 750KB.

Differential Revision: https://reviews.llvm.org/D33925

llvm-svn: 304921
2017-06-07 15:49:14 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Evgeniy Stepanov 704003ea3d Revert "[CFI] Remove LinkerSubsectionsViaSymbols."
This reverts commit r304582: breaks cfi-devirt :: anon-namespace.cpp on Darwin.

llvm-svn: 304626
2017-06-03 00:46:27 +00:00
Xinliang David Li 5fdc75aea1 Fix debug build test failure
llvm-svn: 304600
2017-06-02 22:38:48 +00:00
Xinliang David Li 0b7d858fa3 [PartialInlining] Minor cost anaysis tuning
Also added a test option and 2 cost analysis related tests.

llvm-svn: 304599
2017-06-02 22:08:04 +00:00
David Blaikie 6aeacaa527 FunctionAttrs: Skip it if the effective SCC (ignoring optnone functions) is empty
Minor optimization but mostly simplifies my debugging so I'm not dealing
with empty SCCNodeSets while investigating issues in this optimization.

llvm-svn: 304597
2017-06-02 21:24:17 +00:00
Evgeniy Stepanov 63f056327d [CFI] Remove LinkerSubsectionsViaSymbols.
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.

It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.

llvm-svn: 304582
2017-06-02 18:45:14 +00:00
Evgeniy Stepanov b933ad3a77 Skip CFI for dead functions.
Differential Revision: https://reviews.llvm.org/D33805

llvm-svn: 304578
2017-06-02 18:24:23 +00:00
Davide Italiano 1dd5558e52 [PM] GVNSink is off by default, fix an obvious typo.
llvm-svn: 304497
2017-06-01 23:47:53 +00:00
Evgeniy Stepanov 56584bbf16 (NFC) Track global summary liveness in GVFlags.
Replace GVFlags::LiveRoot with GVFlags::Live and use that instead of
all the DeadSymbols sets. This is refactoring in order to make
liveness information available in the RegularLTO pipeline.

llvm-svn: 304466
2017-06-01 20:30:06 +00:00
Tim Shen 6b41141863 [ThinLTO] Migrate ThinLTOBitcodeWriter to the new PM.
Summary: Also see D33429 for other ThinLTO + New PM related changes.

Reviewers: davide, chandlerc, tejohnson

Subscribers: mehdi_amini, Prazek, cfe-commits, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33525

llvm-svn: 304378
2017-06-01 01:02:12 +00:00
Xinliang David Li 32c5e809be [PartialInlining] Reduce outlining overhead by removing unneeded live-out(s)
Differential Revision: http://reviews.llvm.org/D33694

llvm-svn: 304375
2017-06-01 00:12:41 +00:00
Vitaly Buka a637489ef1 [PartialInlining] Replace delete with unique_ptr in computeCallsiteToProfCountMap
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D33220

llvm-svn: 304064
2017-05-27 05:32:09 +00:00
Peter Collingbourne 7730b24448 PMB: Run the whole-program-devirt pass during LTO at --lto-O0.
The whole-program-devirt pass needs to run at -O0 because only it
knows about the llvm.type.checked.load intrinsic: it needs to both
lower the intrinsic itself and handle it in the summary.

Differential Revision: https://reviews.llvm.org/D33571

llvm-svn: 304019
2017-05-26 18:27:13 +00:00
James Molloy a929063233 [GVNSink] GVNSink pass
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.

This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:

[ %a1 = add i32 %b, 1  ]   [ %c1 = add i32 %d, 1  ]
[ %a2 = xor i32 %a1, 1 ]   [ %c2 = xor i32 %c1, 1 ]
                 \           /
           [ %e = phi i32 %a2, %c2 ]
           [ add i32 %e, 4         ]

GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.

What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.

This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.

Differential revision: https://reviews.llvm.org/D24805

llvm-svn: 303850
2017-05-25 12:51:11 +00:00
Xinliang David Li 126157c3b4 [PartialInlining] Add internal options to enable partial inlining in pass pipeline (off by default)
1. Legacy: -mllvm -enable-partial-inlining
2. New:  -mllvm -enable-npm-partial-inlining -fexperimental-new-pass-manager

Differential Revision: http://reviews.llvm.org/D33382

llvm-svn: 303567
2017-05-22 16:41:57 +00:00
Adrian Prantl 660437975b Revert "ThinLTO: Verify bitcode before lauching the ThinLTOCodeGenerator."
This reverts commit r303438 while deliberating buildbot breakage.

llvm-svn: 303467
2017-05-19 23:32:21 +00:00
Adrian Prantl f9ab9bfc39 ThinLTO: Verify bitcode before lauching the ThinLTOCodeGenerator.
rdar://problem/31233625

Differential Revision: https://reviews.llvm.org/D33151

llvm-svn: 303438
2017-05-19 17:55:02 +00:00
Xinliang David Li 8726d91d29 Fix memory leak
llvm-svn: 303126
2017-05-15 22:43:52 +00:00
Xinliang David Li 392e975693 Fix test failure on windows -- do not return deleted func
llvm-svn: 302999
2017-05-14 02:54:02 +00:00
Xinliang David Li 66bdfca77a [PartialInlining] Profile based cost analysis
Implemented frequency based cost/saving analysis
and related options.

The pass is now in a state ready to be turne on
in the pipeline (in follow up).

Differential Revision: http://reviews.llvm.org/D32783

llvm-svn: 302967
2017-05-12 23:41:43 +00:00
Serge Guelton f4dc59ba8e Remove spurious cast of nullptr. NFC.
Conversion rules allow automatic casting of nullptr to any pointer type.

llvm-svn: 302780
2017-05-11 08:53:00 +00:00
Teresa Johnson 94624aca2a Ensure non-null ProfileSummaryInfo passed to ModuleSummaryIndex builder
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.

Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.

llvm-svn: 302705
2017-05-10 18:52:16 +00:00
Easwaran Raman f5f9160072 [ProfileSummary] Make getProfileCount a non-static member function.
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.

Differential revision: https://reviews.llvm.org/D33012

llvm-svn: 302597
2017-05-09 23:21:10 +00:00
Peter Collingbourne c3d677f9d9 FunctionImport: Simplify function llvm::thinLTOInternalizeModule. NFCI.
llvm-svn: 302595
2017-05-09 22:43:31 +00:00
Davide Italiano aa42a10051 [PartialInlining] Capture by reference rather than by value.
llvm-svn: 302464
2017-05-08 20:44:01 +00:00
Peter Collingbourne 9667b91b13 Re-apply r302108, "IR: Use pointers instead of GUIDs to represent edges in the module summary. NFCI."
with a fix for the clang backend.

llvm-svn: 302176
2017-05-04 18:03:25 +00:00
Eric Liu f6039f255e Revert "IR: Use pointers instead of GUIDs to represent edges in the module summary. NFCI."
This reverts commit r302108. This causes crash in clang bootstrap with LTO.

Contacted the auther in the original commit.

llvm-svn: 302140
2017-05-04 11:49:39 +00:00
Martin Storsjo e81233d0ed [ArgPromotion] Fix a truncated variable
This fixes a regression since SVN rev 273808 (which was supposed to
not change functionality).

The regression caused miscompilations (noted in the wild when targeting
AArch64) on platforms with 32 bit long.

Differential Revision: https://reviews.llvm.org/D32850

llvm-svn: 302137
2017-05-04 10:54:35 +00:00
Peter Collingbourne 5f85a9deda IR: Use pointers instead of GUIDs to represent edges in the module summary. NFCI.
When profiling a no-op incremental link of Chromium I found that the functions
computeImportForFunction and computeDeadSymbols were consuming roughly 10% of
the profile. The goal of this change is to improve the performance of those
functions by changing the map lookups that they were previously doing into
pointer dereferences.

This is achieved by changing the ValueInfo data structure to be a pointer to
an element of the global value map owned by ModuleSummaryIndex, and changing
reference lists in the GlobalValueSummary to hold ValueInfos instead of GUIDs.
This means that a ValueInfo will take a client directly to the summary list
for a given GUID.

Differential Revision: https://reviews.llvm.org/D32471

llvm-svn: 302108
2017-05-04 03:36:16 +00:00
Reid Kleckner a0b45f4bfc [IR] Abstract away ArgNo+1 attribute indexing as much as possible
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
  to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
  sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
  take attribute list indices.  Most of these were only used from
  BuildLibCalls, and doesNotAlias was only used to test or set if the
  return value is malloc-like.

I'm happy to split the patch, but I think they are probably easier to
review when taken together.

This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
  0: func attrs
  1: retattrs
  2...: arg attrs

Reviewers: chandlerc, pete, javed.absar

Subscribers: david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D32811

llvm-svn: 302060
2017-05-03 18:17:31 +00:00
Xinliang David Li ab8722f80a [PartialInlining] Add more early filtering
This is a follow up to the previous
inline cost patch for quicker filtering.

llvm-svn: 301959
2017-05-02 18:43:21 +00:00
Xinliang David Li 6133846be1 [PartialInlining] Hook up inline cost analysis
Differential Revision: http://reviews.llvm.org/D32666

llvm-svn: 301894
2017-05-02 02:44:14 +00:00
Peter Collingbourne a992f53099 IPO: Add missing build dep.
llvm-svn: 301835
2017-05-01 20:57:20 +00:00
Peter Collingbourne c15d60b772 Object: Remove ModuleSummaryIndexObjectFile class.
Differential Revision: https://reviews.llvm.org/D32195

llvm-svn: 301832
2017-05-01 20:42:32 +00:00
Sanjoy Das e6bca0eecb Rename WeakVH to WeakTrackingVH; NFC
This relands r301424.

llvm-svn: 301812
2017-05-01 17:07:49 +00:00
Davide Italiano b6681e2b4e [IPO/MergeFunctions] This function is used only under DEBUG().
llvm-svn: 301672
2017-04-28 19:39:45 +00:00
Reid Kleckner 6652a52e2b Use Argument::hasAttribute and AttributeList::ReturnIndex more
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.

NFC

llvm-svn: 301666
2017-04-28 18:37:16 +00:00
Evgeniy Stepanov 964f4663c4 [asan] Fix dead stripping of globals on Linux.
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.

Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.

This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.

At this moment it does not work on Gold (as in the globals are never
stripped).

This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.

llvm-svn: 301587
2017-04-27 20:27:27 +00:00
Chandler Carruth 1353f9a48b [PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
  below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
  infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
  and instead incrementally updates it.

I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.

My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.

This pass makes two very significant assumptions that enable most of these
improvements:

1) Focus on *full* unswitching -- that is, completely removing whatever
   control flow construct is being unswitched from the loop. In the case
   of trivial unswitching, this means removing the trivial (exiting)
   edge. In non-trivial unswitching, this means removing the branch or
   switch itself. This is in opposition to *partial* unswitching where
   some part of the unswitched control flow remains in the loop. Partial
   unswitching only really applies to switches and to folded branches.
   These are very similar to full unrolling and partial unrolling. The
   full form is an effective canonicalization, the partial form needs
   a complex cost model, cannot be iterated, isn't canonicalizing, and
   should be a separate pass that runs very late (much like unrolling).

2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
   dates from a time when a great deal of LLVM's loop infrastructure was
   missing, ineffective, and/or unreliable. As a consequence, a lot of
   complexity was added which we no longer need.

With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.

Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.

One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.

Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.

Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.

Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.

I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.

Differential Revision: https://reviews.llvm.org/D32409

llvm-svn: 301576
2017-04-27 18:45:20 +00:00
Eli Friedman 10ab923b32 [GlobalOpt] Correctly update metadata when localizing a global.
Just calling dropAllReferences leaves pointers to the ConstantExpr
behind, so we would eventually crash with a null pointer dereference.

Differential Revision: https://reviews.llvm.org/D32551

llvm-svn: 301575
2017-04-27 18:39:08 +00:00
Xinliang David Li d21601a929 [PartialInlining]: Improve partial inlining to handle complex conditions
Differential Revision: http://reviews.llvm.org/D32249

llvm-svn: 301561
2017-04-27 16:34:00 +00:00
Chandler Carruth c246a4c973 Disable GVN Hoist due to still more bugs being found in it. There is
also a discussion about exactly what we should do prior to re-enabling
it.

The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.

llvm-svn: 301505
2017-04-27 00:28:03 +00:00
Sanjoy Das 2cbeb00f38 Reverts commit r301424, r301425 and r301426
Commits were:

"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"

The changes assumed pointers are 8 byte aligned on all architectures.

llvm-svn: 301429
2017-04-26 16:37:05 +00:00
Sanjoy Das 01de557738 Rename WeakVH to WeakTrackingVH; NFC
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.

Reviewers: dblaikie, davide

Reviewed By: davide

Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D32266

llvm-svn: 301424
2017-04-26 16:20:52 +00:00
Filipe Cabecinhas 92dc348773 Simplify the CFG after loop pass cleanup.
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.

This fixes PR32085

Reviewers: chandlerc, danielcdh

Subscribers: mehdi_amini, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D30468

llvm-svn: 301395
2017-04-26 12:02:41 +00:00
Davide Italiano 058abf1f61 [PM] Run IndirectCallPromotion only when PGO is enabled.
Differential Revision:  https://reviews.llvm.org/D32465

llvm-svn: 301327
2017-04-25 16:54:45 +00:00
Xinliang David Li db8d09b6c2 [PartialInine]: add triaging options
There are more bugs (runtime failures) triggered when partial
inlining is turned on. Add options to help triaging problems.

llvm-svn: 301148
2017-04-23 23:39:04 +00:00
Xinliang David Li 15744ad87b [PartialInlining] Add optimization remark support
Differential Revision: http://reviews.llvm.org/D32387

llvm-svn: 301143
2017-04-23 21:40:58 +00:00
Davide Italiano 5da7090256 [ThinLTO/Summary] Rename anonymous globals as last action ...
... in the per-TU -O0 pipeline.
The problem is that there could be passes registered using
`addExtensionsToPM()` introducing unnamed globals.
Asan is an example, but there may be others. Building cppcheck
with `-flto=thin` and `-fsanitize=address` triggers an assertion
while we're reading bitcode (in lib/LTO), as the BitcodeReader
assumes there are no unnamed globals (because the namer has run).
Unfortunately I wasn't able to find an easy way to test this.
I added a comment in the hope nobody moves this again.

llvm-svn: 301102
2017-04-23 04:49:34 +00:00
Xinliang David Li 016a82ba51 [PartialInlining] Using existing hasAddressTaken interface to legality check/NFC
llvm-svn: 301090
2017-04-22 19:24:19 +00:00
Xinliang David Li 0e9f6df169 [PartialInliner] Partial inliner needs to check use kind before transformation
Differential Revision: https://reviews.llvm.org/D32373

llvm-svn: 301042
2017-04-21 21:20:56 +00:00
Reid Kleckner f1de9e83c2 [DAE] Simplify attribute list creation, NFC
Removes a use of getSlotAttributes, which I intend to change.

llvm-svn: 300795
2017-04-19 23:45:45 +00:00
Reid Kleckner 0a5ed3d5dc [GlobalOpt] Simplify attribute code stripping nest, NFC
llvm-svn: 300787
2017-04-19 23:26:44 +00:00
Reid Kleckner 9d16fa09c6 Prefer addAttr(Attribute::AttrKind) over the AttributeList overload
This should simplify the call sites, which typically want to tweak one
attribute at a time. It should also avoid creating ephemeral
AttributeLists that live forever.

llvm-svn: 300718
2017-04-19 17:28:52 +00:00
Andrea Di Biagio 517e3fc34c [SampleProfile] Don't assert when printing the DebugLoc of a branch. NFC.
llvm-svn: 300544
2017-04-18 11:27:58 +00:00
Andrea Di Biagio e3edef0977 [SampleProfile] Skip intrinsic calls when visiting callsites in InlineHotFunctions.
Before this patch, we always called method 'findCalleeFunctionSamples()' on
intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable
candidates for obvious reasons.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D32008

llvm-svn: 300541
2017-04-18 10:08:53 +00:00
Dehao Chen 1ea8bd8109 Build SymbolMap in SampleProfileLoader to help matchin function names with suffix.
Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.

Reviewers: davidxl, dnovillo, tejohnson

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D31952

llvm-svn: 300507
2017-04-17 22:23:05 +00:00
Peter Collingbourne a0f371a106 Bitcode: Add a string table to the bitcode format.
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.

This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.

On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html

Differential Revision: https://reviews.llvm.org/D31838

llvm-svn: 300464
2017-04-17 17:51:36 +00:00
Reid Kleckner fb502d2f5e [IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.

Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.

llvm-svn: 300367
2017-04-14 20:19:02 +00:00
Davide Italiano 91239088a1 [FunctionImport] assert(false) -> llvm_unreachable(). NFCI.
llvm-svn: 300344
2017-04-14 17:22:02 +00:00
Reid Kleckner f021fab2af [IR] Make getParamAttributes take argument numbers, not ArgNo+1
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.

The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.

NFC

llvm-svn: 300272
2017-04-13 23:12:13 +00:00
Dehao Chen 2c7ca9b5df SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.

Reviewers: davidxl, dnovillo

Reviewed By: davidxl

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31950

llvm-svn: 300240
2017-04-13 19:52:10 +00:00
Reid Kleckner aea2a28098 [DAE] Simplify call site replacement code with CallSite NFC
llvm-svn: 300235
2017-04-13 18:42:03 +00:00
Reid Kleckner 3a1150352d [ArgPromotion] Don't drop !prof metadata on promoted calls
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.

Add a test case for this.

llvm-svn: 300229
2017-04-13 18:10:30 +00:00
Geoff Berry 85a530fb59 Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts commit r296872 now that PR32153 has been fixed.

llvm-svn: 300200
2017-04-13 15:36:25 +00:00
Reid Kleckner 7f72033e1c [IR] Take func, ret, and arg attrs separately in AttributeList::get
This seems like a much more natural API, based on Derek Schuff's
comments on r300015. It further hides the implementation detail of
AttributeList that function attributes come last and appear at index
~0U, which is easy for the user to screw up. git diff says it saves code
as well: 97 insertions(+), 137 deletions(-)

This also makes it easier to change the implementation, which I want to
do next.

llvm-svn: 300153
2017-04-13 00:58:09 +00:00
Bob Haarman 4075ccc717 ThinLTOBitcodeWriter: keep comdats together, rename if leader is renamed
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.

Reviewers: pcc, mehdi_amini, tejohnson

Reviewed By: pcc

Subscribers: rnk, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31963

llvm-svn: 300019
2017-04-12 01:43:07 +00:00
Reid Kleckner c2cb560045 [IR] Add AttributeSet to hide AttributeSetNode* again, NFC
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.

This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.

Reviewers: pete, chandlerc

Subscribers: llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D31940

llvm-svn: 300014
2017-04-12 00:38:00 +00:00
Serge Guelton 59a2d7b909 Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299949
2017-04-11 15:01:18 +00:00
Geoff Berry 9d597adde4 [GVNHoist] Re-enable GVNHoist by default
Turn GVNHoist back on by default now that PR32153 has been fixed.

llvm-svn: 299944
2017-04-11 14:36:30 +00:00
Keno Fischer 30779772cf [StripDeadDebug/DIFinder] Track inlined SPs
Summary:
In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not
referenced from the current module. However, in doing so I neglected to realize
that some SPs could be referenced entirely from inlined functions. It appears
I was not the only one to make this mistake, because DebugInfoFinder, doesn't
find those SPs either. Fix this in DebugInfoFinder and then use that to make
sure not to drop those CUs in strip-dead-debug-info.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31904

llvm-svn: 299936
2017-04-11 13:32:11 +00:00
Diana Picus b050c7fbe0 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299925 because it broke the buildbots. See e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008

llvm-svn: 299928
2017-04-11 10:07:12 +00:00
Serge Guelton 5fd75fb72e Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

llvm-svn: 299925
2017-04-11 08:36:52 +00:00
Reid Kleckner eb9dd5b87f Reland "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This re-lands r299875.

I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:

       // Collect any return attributes from the call.
  -    if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
  -      newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
  -                                                  oldAttrs.getRetAttributes()));
  +    newAttrs.push_back(oldAttrs.getRetAttributes());

Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
  AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)

That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.

llvm-svn: 299899
2017-04-10 23:31:05 +00:00
Matt Arsenault 3c1fc768ed Allow DataLayout to specify addrspace for allocas.
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.

The problematic assumptions include:
- That the pointer size used for the stack is the same size as
  the code size pointer, which is also the maximum sized pointer.

- That 0 is an invalid, non-dereferencable pointer value.

These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.

llvm-svn: 299888
2017-04-10 22:27:50 +00:00
Dehao Chen d4a3397861 Emit less compiler optimization remarks in samplepgo to reduce a call to findCalleeFunctionSamples which is going to be refactored.
Summary: Now the SamplePGO support is more stable, we do not need so many verbose optimization remarks emitted.

Reviewers: dnovillo, davidxl

Reviewed By: davidxl

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D31826

llvm-svn: 299883
2017-04-10 20:49:16 +00:00
Evgeniy Stepanov ba7c2e9661 Revert "[asan] Fix dead stripping of globals on Linux."
This reverts commit r299697, which caused a big increase in object file size.

llvm-svn: 299879
2017-04-10 20:36:30 +00:00
Reid Kleckner 211b1f324f Revert "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This reverts r299875. A Linux bot came back with a test failure:
http://bb.pgr.jp/builders/test-clang-i686-linux-RA/builds/741/steps/test_clang/logs/Clang%20%3A%3A%20CodeGen__2006-05-19-SingleEltReturn.c

llvm-svn: 299878
2017-04-10 20:34:19 +00:00
Reid Kleckner 324c99dee5 [IR] Make AttributeSetNode public, avoid temporary AttributeList copies
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.

I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.

My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.

I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.

Reviewers: pete, chandlerc, sanjoy, hfinkel

Reviewed By: pete

Subscribers: jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31198

llvm-svn: 299875
2017-04-10 20:18:10 +00:00
Evgeniy Stepanov 349adbacca [cfi] Take over existing __cfi_check in CrossDSOCFI.
https://reviews.llvm.org/D31796 will emit a dummy __cfi_check in the
frontend.

llvm-svn: 299805
2017-04-07 23:00:20 +00:00
Mehdi Amini db11fdfda5 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299699, the examples needs to be updated.

llvm-svn: 299702
2017-04-06 20:23:57 +00:00
Mehdi Amini 579540a8f7 Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299699
2017-04-06 20:09:31 +00:00
Evgeniy Stepanov 6c3a8cbc4d [asan] Fix dead stripping of globals on Linux.
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.

Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.

This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.

At this moment it does not work on Gold (as in the globals are never
stripped).

This is a re-land of r298158 rebased on D31358. This time,
asan.module_ctor is put in a comdat as well to avoid quadratic
behavior in Gold.

llvm-svn: 299697
2017-04-06 19:55:17 +00:00
Keno Fischer bacc64b5fa [StripDeadDebugInfo] Drop dead CUs entirely
Summary:
Prior to this while it would delete the dead DIGlobalVariables, it would
leave dead DICompileUnits and everything referenced therefrom. For a bit
bitcode file with thousands of compile units those dead nodes easily
outnumbered the real ones. Clean that up.

Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D31720

llvm-svn: 299692
2017-04-06 19:26:22 +00:00
Bob Haarman 6de8134784 ThinLTOBitcodeWriter: handle aliases first in filterModule
Summary: This change fixes a "local linkage requires default visibility" assert when attempting to build LLVM with ThinLTO on Windows.

Reviewers: pcc, tejohnson, mehdi_amini

Reviewed By: pcc

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D31632

llvm-svn: 299491
2017-04-05 00:42:07 +00:00
Rong Xu 48596b6f7a [PGO] Memory intrinsic calls optimization based on profiled size
This patch optimizes two memory intrinsic operations: memset and memcpy based
on the profiled size of the operation. The high level transformation is like:
  mem_op(..., size)
  ==>
  switch (size) {
    case s1:
       mem_op(..., s1);
       goto merge_bb;
    case s2:
       mem_op(..., s2);
       goto merge_bb;
    ...
    default:
       mem_op(..., size);
       goto merge_bb;
    }
  merge_bb:

Differential Revision: http://reviews.llvm.org/D28966

llvm-svn: 299446
2017-04-04 16:42:20 +00:00
Peter Collingbourne 6b193966ac ThinLTOBitcodeWriter: Use Module::global_values(). NFCI.
llvm-svn: 299132
2017-03-30 23:43:08 +00:00
Joerg Sonnenberger fa7367428a Split the SimplifyCFG pass into two variants.
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.

The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.

Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.

Differential Revision: https://reviews.llvm.org/D30333

llvm-svn: 298799
2017-03-26 06:44:08 +00:00
Dehao Chen 722e94061b Set the prof weight correctly for call instructions in DeadArgumentElimination.
Summary: In DeadArgumentElimination, the call instructions will be replaced. We also need to set the prof weights so that function inlining can find the correct profile.

Reviewers: eraman

Reviewed By: eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31143

llvm-svn: 298660
2017-03-23 23:26:00 +00:00
Dehao Chen 8c88671985 Disable loop unrolling and icp in SamplePGO ThinLTO compile phase
Summary:
loop unrolling and icp will make the sample profile annotation much harder in the backend. So disable these 2 optimization in the ThinLTO compile phase.
Will add a test in cfe in a separate patch.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: mehdi_amini, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D31217

llvm-svn: 298646
2017-03-23 21:20:05 +00:00
Teresa Johnson 0c6a4ff8dc [ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.

The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.

Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.

However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.

Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin linked bitcode, and typically is automatically generated from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).

Added various tests to ensure that we get identical index files out of
the thin link step.

Reviewers: mehdi_amini, pcc

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31027

llvm-svn: 298638
2017-03-23 19:47:39 +00:00
Dehao Chen 53a0c082d2 Do not set branch weight if the branch weight annotation is present.
Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: mehdi_amini, aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D31228

llvm-svn: 298602
2017-03-23 14:43:10 +00:00
Peter Collingbourne f7691d8b41 IPO: Const correctness for summaries passed into passes.
Pass const qualified summaries into importers and unqualified summaries into
exporters. This lets us const-qualify the summary argument to thinBackend.

Differential Revision: https://reviews.llvm.org/D31230

llvm-svn: 298534
2017-03-22 18:22:59 +00:00
Peter Collingbourne 9a3f97977f IR: Fix a race condition in type id clients of ModuleSummaryIndex.
Add a const version of the getTypeIdSummary accessor that avoids
mutating the TypeIdMap.

Differential Revision: https://reviews.llvm.org/D31226

llvm-svn: 298531
2017-03-22 18:04:39 +00:00
Evgeny Astigeevich 7823c66e05 r286814 resulted that CallPenalty can be subtracted twice:
- First time, during calculation of the cost in InlineCost.cpp
- Second time, during calculation of the cost in Inliner.cpp

This patches fixes this.

Differential Revision: https://reviews.llvm.org/D31137

llvm-svn: 298496
2017-03-22 12:01:57 +00:00
Dehao Chen 9907e9d860 Do not inline hot callsites for samplepgo in thinlto compile phase.
Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase.

Reviewers: tejohnson, eraman

Reviewed By: tejohnson

Subscribers: mehdi_amini, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D31201

llvm-svn: 298428
2017-03-21 19:55:36 +00:00
Reid Kleckner b518054b87 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Evgeniy Stepanov c440572715 Revert r298158.
Revert "[asan] Fix dead stripping of globals on Linux."

OOM in gold linker.

llvm-svn: 298288
2017-03-20 18:45:34 +00:00
Evgeniy Stepanov c5aa6b9411 [asan] Fix dead stripping of globals on Linux.
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.

Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.

This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.

At this moment it does not work on Gold (as in the globals are never
stripped).

Differential Revision: https://reviews.llvm.org/D30121

llvm-svn: 298158
2017-03-17 22:17:29 +00:00
Stanislav Mekhanoshin ee2dd785f6 Only unswitch loops with uniform conditions
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.

Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.

Differential Revision: https://reviews.llvm.org/D30796

llvm-svn: 298104
2017-03-17 17:13:41 +00:00
Chandler Carruth 814e0df1c5 [PM/Inliner] Fix a bug in r297374 where we would leave stale calls in
the work queue and crash when trying to visit them after deleting the
function containing those calls.

llvm-svn: 297940
2017-03-16 10:45:42 +00:00
Dehao Chen 4a435e0896 SamplePGO ThinLTO ICP fix for local functions.
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:

1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName

This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:

1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.

Reviewers: mehdi_amini, tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30754

llvm-svn: 297757
2017-03-14 17:33:01 +00:00
Peter Collingbourne 14dcf02fcb WholeProgramDevirt: Implement export/import support for VCP.
Differential Revision: https://reviews.llvm.org/D30017

llvm-svn: 297503
2017-03-10 20:13:58 +00:00
Peter Collingbourne 59675ba0f8 WholeProgramDevirt: Implement export/import support for unique ret val opt.
Differential Revision: https://reviews.llvm.org/D29917

llvm-svn: 297502
2017-03-10 20:09:11 +00:00
George Rimar 5d8aea1009 WholeProgramDevirt: Fixed compilation error under MSVS2015.
It was introduced in:

r296945
WholeProgramDevirt: Implement exporting for single-impl devirtualization.
---------------------
r296939
WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
---------------------

Microsoft Visual Studio Community 2015
Version 14.0.23107.0 D14REL
Does not compile that code without additional brackets, showing multiple error like below:

WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}'
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}'
WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']'

llvm-svn: 297451
2017-03-10 10:31:56 +00:00
Chandler Carruth 20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00
Peter Collingbourne 0152c8156b WholeProgramDevirt: Implement importing for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29854

llvm-svn: 297350
2017-03-09 01:11:15 +00:00
Peter Collingbourne 6d284fab20 WholeProgramDevirt: Implement importing for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29844

llvm-svn: 297333
2017-03-09 00:21:25 +00:00
Teresa Johnson d820447212 Perform symbol binding for .symver versioned symbols
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".

While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.

E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.

The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.

Reviewers: rafael, pcc

Reviewed By: pcc

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30485

llvm-svn: 297332
2017-03-09 00:19:49 +00:00
Evgeniy Stepanov 8537d9994d Don't merge global constants with non-dbg metadata.
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.

The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.

llvm-svn: 297327
2017-03-09 00:03:37 +00:00
Evgeniy Stepanov 7a5cfa9a11 Fix one-after-the-end type metadata handling in globalsplit.
Itanium ABI may have an address point one byte after the end of a
vtable. When such vtable global is split, the !type metadata needs to
follow the right vtable.

Differential Revision: https://reviews.llvm.org/D30716

llvm-svn: 297236
2017-03-07 22:18:48 +00:00
Hans Wennborg 254f5fa5f2 Disable gvn-hoist (PR32153)
llvm-svn: 297075
2017-03-06 21:10:40 +00:00
Dehao Chen c632a393b7 Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.

Reviewers: davidxl, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D30658

llvm-svn: 297038
2017-03-06 17:49:59 +00:00
Peter Collingbourne f0bb90b126 Fix build.
llvm-svn: 296949
2017-03-04 01:38:05 +00:00
Peter Collingbourne 77a8d563a3 WholeProgramDevirt: Implement exporting for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29846

llvm-svn: 296948
2017-03-04 01:34:53 +00:00
Peter Collingbourne 2325bb34c1 WholeProgramDevirt: Implement exporting for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29811

llvm-svn: 296945
2017-03-04 01:31:01 +00:00
Peter Collingbourne b406baaeef WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
Any unsuccessful llvm.type.checked.load devirtualizations will be translated
into uses of llvm.type.test, so we need to add the resulting llvm.type.test
intrinsics to the function summaries so that the LowerTypeTests pass will
export them.

Differential Revision: https://reviews.llvm.org/D29808

llvm-svn: 296939
2017-03-04 01:23:30 +00:00
Benjamin Kramer 9528f8c2fb Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline.""
This reverts commit r296759. Miscompiles bash.

llvm-svn: 296872
2017-03-03 14:27:53 +00:00
Peter Collingbourne 3baa72af7d ThinLTOBitcodeWriter: Do not follow operand edges of type GlobalValue when looking for virtual functions.
Such edges may otherwise result in infinite recursion if a pointer to a vtable
is reachable from the vtable itself. This can happen in practice if a TU
defines the ABI types used to implement RTTI, and is itself compiled with RTTI.

Fixes PR32121.

llvm-svn: 296839
2017-03-02 23:10:17 +00:00
Geoff Berry 484d756583 Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This re-applies r289696, which caused TSan perf regression, which has
since been addressed in separate changes (see PR for details).

See PR31382.

llvm-svn: 296759
2017-03-02 16:16:47 +00:00
Dehao Chen a60cdd3881 Add function importing info from samplepgo profile to the module summary.
Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported.

Reviewers: mehdi_amini, tejohnson

Reviewed By: tejohnson

Subscribers: davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D30053

llvm-svn: 296498
2017-02-28 18:09:44 +00:00
Adam Nemet de53bfb94d [OptDiag] Hide legacy remark ctors
These are only used when emitting remarks without ORE directly using the free
functions emitOptimizationRemark*.

llvm-svn: 296037
2017-02-23 23:11:11 +00:00
Dehao Chen cc75d2441d Add call branch annotation for ICP promoted direct call in SamplePGO mode.
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.

Reviewers: davidxl, eraman, xur

Reviewed By: davidxl, xur

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30282

llvm-svn: 296028
2017-02-23 22:15:18 +00:00
Dehao Chen 533bc6ea8e Use base discriminator in sample pgo profile matching.
Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.

Reviewers: dblaikie, davidxl

Reviewed By: dblaikie, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30218

llvm-svn: 295999
2017-02-23 18:27:45 +00:00
Dehao Chen 7d230325ef Increases full-unroll threshold.
Summary:
The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold

This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

Performance:

403	0.11%
433	0.51%
445	0.48%
447	3.50%
453	1.49%
464	0.75%

Code size:

403	0.56%
433	0.96%
445	2.16%
447	2.96%
453	0.94%
464	8.02%

The compiler time overhead is similar with code size.

Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc

Reviewed By: hfinkel, chandlerc

Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D28368

llvm-svn: 295538
2017-02-18 03:46:51 +00:00
Justin Bogner 7bc978b543 OptDiag: Allow constructing DiagnosticLocation from DISubprograms
This avoids creating a DILocation just to represent a line number,
since creating Metadata is expensive. Creating a DiagnosticLocation
directly is much cheaper.

llvm-svn: 295531
2017-02-18 02:00:27 +00:00
Peter Collingbourne 184773d81f WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset.
A future change will cause this byte offset to be inttoptr'd and then exported
via an absolute symbol. On the importing end we will expect the symbol to be
in range [0,2^32) so that it will fit into a 32-bit relocation. The problem
is that on 64-bit architectures if the offset is negative it will not be in
the correct range once we inttoptr it.

This change causes us to use a 32-bit integer so that it can be inttoptr'd
(which zero extends) into the correct range.

Differential Revision: https://reviews.llvm.org/D30016

llvm-svn: 295487
2017-02-17 19:43:45 +00:00
Peter Collingbourne 37317f1207 WholeProgramDevirt: Examine the function body when deciding whether functions are readnone.
The goal is to get an analysis result even for de-refineable functions.

Differential Revision: https://reviews.llvm.org/D29803

llvm-svn: 295472
2017-02-17 18:17:04 +00:00
Peter Collingbourne 08eb081ac3 PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline.
Differential Revision: https://reviews.llvm.org/D30008

llvm-svn: 295260
2017-02-15 23:48:38 +00:00
Peter Collingbourne 50cbd7cc90 Re-apply r295110 and r295144 with a fix for the ASan issue.
llvm-svn: 295241
2017-02-15 21:56:51 +00:00
Daniel Jasper eef9b03395 Revert r295110 and r295144.
This fails under ASAN:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio

llvm-svn: 295162
2017-02-15 09:56:08 +00:00
Peter Collingbourne e2367415b6 WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI.
The idea is that the apply* functions will also be called when importing
devirt optimizations.

Differential Revision: https://reviews.llvm.org/D29745

llvm-svn: 295144
2017-02-15 02:13:08 +00:00
Peter Collingbourne 534c0175b6 WholeProgramDevirt: Change internal vcall data structures to match summary.
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).

This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
  types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).

Differential Revision: https://reviews.llvm.org/D29744

llvm-svn: 295110
2017-02-14 22:12:23 +00:00
Taewook Oh f22fa72e4a Do not apply redundant LastCallToStaticBonus
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.

If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.

Reviewers: chandlerc, eraman

Reviewed By: eraman

Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29169

llvm-svn: 295075
2017-02-14 17:30:05 +00:00
Peter Collingbourne 002c2d5380 ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module.
Differential Revision: https://reviews.llvm.org/D29701

llvm-svn: 295021
2017-02-14 03:42:38 +00:00
Peter Collingbourne c45f7f3eb4 FunctionAttrs: Factor out a function for querying memory access of a specific copy of a function. NFC.
This will later be used by ThinLTOBitcodeWriter to add copies of readnone
functions to the regular LTO module.

Differential Revision: https://reviews.llvm.org/D29695

llvm-svn: 295008
2017-02-14 00:28:13 +00:00
Sanjay Patel 4f74216da0 [FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
...we should be able to propagate 'nonnull' info from a callsite back to its parent.

The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430),
but this won't solve that problem.

The transform is currently disabled by default while we wait for clang to work-around
potential security problems:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Differential Revision: https://reviews.llvm.org/D27855

llvm-svn: 294998
2017-02-13 23:10:51 +00:00
Peter Collingbourne 2b33f65317 IR: Type ID summary extensions for WPD; thread summary into WPD pass.
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.

Differential Revision: https://reviews.llvm.org/D29782

llvm-svn: 294981
2017-02-13 19:26:18 +00:00
Chandler Carruth addcda483e [PM] Port ArgumentPromotion to the new pass manager.
Now that the call graph supports efficient replacement of a function and
spurious reference edges, we can port ArgumentPromotion to the new pass
manager very easily.

The old PM-specific bits are sunk into callbacks that the new PM simply
doesn't use. Unlike the old PM, the new PM simply does argument
promotion and afterward does the update to LCG reflecting the promoted
function.

Differential Revision: https://reviews.llvm.org/D29580

llvm-svn: 294667
2017-02-09 23:46:27 +00:00
Peter Collingbourne 17febdbb25 WholeProgramDevirt: Check that VCP candidate functions are defined before evaluating them.
This was crashing before.

llvm-svn: 294666
2017-02-09 23:46:26 +00:00
Chandler Carruth aaad9f84be [PM/LCG] Teach the LazyCallGraph how to replace a function without
disturbing the graph or having to update edges.

This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.

LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
   need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
   from `Function*` edges to `Node*` edges.

All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!

Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.

The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.

The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.

Differential Revision: https://reviews.llvm.org/D29577

llvm-svn: 294653
2017-02-09 23:24:13 +00:00
Peter Collingbourne cea1e4e79a De-duplicate some code for creating an AARGetter suitable for the legacy PM.
I'm about to use this in a couple more places.

Differential Revision: https://reviews.llvm.org/D29793

llvm-svn: 294648
2017-02-09 23:11:52 +00:00
Peter Collingbourne 857aba4410 Rename LowerTypeTestsSummaryAction to PassSummaryAction. NFCI.
I intend to use the same type with the same semantics in the WholeProgramDevirt
pass.

Differential Revision: https://reviews.llvm.org/D29746

llvm-svn: 294629
2017-02-09 21:45:01 +00:00
Peter Collingbourne 28ffd3261f ThinLTOBitcodeWriter: Strip debug info from merged module.
This module will contain nothing but vtable definitions and (soon)
available_externally function definitions, so there is no point in keeping
debug info in the module.

Differential Revision: https://reviews.llvm.org/D28913

llvm-svn: 294511
2017-02-08 20:44:00 +00:00
Peter Collingbourne 1ea1fd893c LowerTypeTests: Simplify. NFC.
llvm-svn: 294273
2017-02-07 03:20:58 +00:00
Dehao Chen 4a9dd70213 Fix the samplepgo indirect call promotion bug: we should not promote a direct call.
Summary: Checking CS.getCalledFunction() == nullptr does not necessary indicate indirect call. We also need to check if CS.getCalledValue() is not a constant.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29570

llvm-svn: 294260
2017-02-06 23:33:15 +00:00
Dehao Chen c81483d79c Fix the bug of samplepgo indirect call promption when type casting of the return value is needed.
Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changed to return the direct call instruction instead.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29569

llvm-svn: 294205
2017-02-06 18:10:36 +00:00
Dehao Chen 448b5790f6 Refactor SampleProfile.cpp to make it cleaner. (NFC)
llvm-svn: 294118
2017-02-05 07:32:17 +00:00
Davide Italiano ec49313b11 [IPCP] Don't propagate return value for naked functions.
This is pretty much the same change made in SCCP.

llvm-svn: 294098
2017-02-04 19:44:14 +00:00
Peter Collingbourne e6fd9ff96a IRMover: Merge flags LinkModuleInlineAsm and IsPerformingImport.
Currently these flags are always the inverse of each other, so there is
no need to keep them separate.

Differential Revision: https://reviews.llvm.org/D29471

llvm-svn: 294016
2017-02-03 17:01:14 +00:00
Peter Collingbourne 6d8f817f8b FunctionImport: Use IRMover directly.
The importer was previously using ModuleLinker in a sort of "IRMover mode". Use
IRMover directly instead in order to remove a level of indirection.

I will remove all importing support from ModuleLinker in a separate
change.

Differential Revision: https://reviews.llvm.org/D29468

llvm-svn: 294014
2017-02-03 16:56:27 +00:00
Mehdi Amini 1380edf4ef Revert "[ThinLTO] Add an auto-hide feature"
This reverts commit r293970.

After more discussion, this belongs to the linker side and
there is no added value to do it at this level.

llvm-svn: 293993
2017-02-03 07:41:43 +00:00
Mehdi Amini b0a8ff71e5 [ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.

This is a recommit of r293912 after fixing build failures,
and a recommit of r293918 after fixing LLD tests.

Differential Revision: https://reviews.llvm.org/D28978

llvm-svn: 293970
2017-02-03 00:32:38 +00:00
Mehdi Amini 21c89dc920 Revert "[ThinLTO] Add an auto-hide feature"
This reverts commit r293918, one lld test does not pass.

llvm-svn: 293961
2017-02-02 23:20:36 +00:00
Peter Collingbourne 37e2459186 FunctionImport: Remove the -disable-force-link-odr flag and change importFunctions to never force link.
This removes some functionality that was only being used by tests.

Differential Revision: https://reviews.llvm.org/D29439

llvm-svn: 293919
2017-02-02 18:42:25 +00:00
Mehdi Amini 97624fb1ec [ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.

This is a recommit of r293912 after fixing build failures.

Differential Revision: https://reviews.llvm.org/D28978

llvm-svn: 293918
2017-02-02 18:31:35 +00:00
Mehdi Amini 827600deaf Revert "[ThinLTO] Add an auto-hide feature"
This reverts r293912, bots are broken.

llvm-svn: 293914
2017-02-02 18:24:37 +00:00
Mehdi Amini dc5a7444f0 [ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.

Differential Revision: https://reviews.llvm.org/D28978

llvm-svn: 293912
2017-02-02 18:13:46 +00:00
Dehao Chen 274df5ea41 Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 293657
2017-01-31 17:49:37 +00:00
Dehao Chen 6217fa44b8 Revert r292979 which causes compile time failure.
llvm-svn: 293557
2017-01-30 22:26:05 +00:00
Adam Nemet e7bdf227f6 [Inliner] Fold analysis remarks into missed remarks
This significantly reduces the noise level of these messages.

llvm-svn: 293492
2017-01-30 16:22:45 +00:00
Haicheng Wu f8dc2d8c8b [Inliner] Fix a comment to match the code. NFC.
TotalAltCost => TotalSecondaryCost

Differential Revision: https://reviews.llvm.org/D29231

llvm-svn: 293490
2017-01-30 16:15:14 +00:00
Chandler Carruth 8e9c0a8472 [ArgPromote] Move static helpers to modern LLVM naming conventions while
here. NFC.

Simple refactoring while prepping a port to the new PM.

Differential Revision: https://reviews.llvm.org/D29249

llvm-svn: 293426
2017-01-29 08:03:21 +00:00
Chandler Carruth ae9ce3d402 [ArgPromote] Run clang-format to normalize remarkably idiosyncratic
formatting that has evolved here over the past years prior to making
somewhat invasive changes to thread new PM support through the business
logic.

Differential Revision: https://reviews.llvm.org/D29248

llvm-svn: 293425
2017-01-29 08:03:19 +00:00
Chandler Carruth cd836cd4ee [ArgPromote] Re-arrange the code in a more typical, logical way.
This arranges the static helpers in an order where they are defined
prior to their use to avoid the need of forward declarations, and
collect the core pass components at the bottom below their helpers.

This also folds one trivial function into the pass itself. Factoring
this 'runImpl' was an attempt to help porting to the new pass manager,
however in my attempt to begin this port in earnest it turned out to not
be a substantial help. I think it will be easier to factor things
without it.

This is an NFC change and does a minimal amount of edits over all.
Subsequent NFC cleanups will normalize the formatting with clang-format
and improve the basic doxygen commenting.

Differential Revision: https://reviews.llvm.org/D29247

llvm-svn: 293424
2017-01-29 08:03:16 +00:00
Davide Italiano 9b8738d7c8 [PM] MLSM has been enabled for a way. Reclaim a cl::opt.
llvm-svn: 293401
2017-01-28 23:45:37 +00:00
Matthias Braun 194ded551c Use print() instead of dump() in code
llvm-svn: 293371
2017-01-28 06:53:55 +00:00
Mehdi Amini 888dee444b Global DCE performance improvement
Change the original algorithm so that it scales better when meeting
very large bitcode where every instruction does not implies a global.

The target query is "how to you get all the globals referenced by
another global"?

Before this patch, it was doing this by walking the body (or the
initializer) and collecting the references. What this patch is doing,
it precomputing the answer to this query for the whole module by
walking the use-list of every global instead.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D28549

llvm-svn: 293328
2017-01-27 19:48:57 +00:00
Peter Collingbourne 1df6e858ef LowerTypeTests: Ignore external globals with type metadata.
Thanks to Davide Italiano for finding the problem and providing a test case.

llvm-svn: 293119
2017-01-26 00:32:15 +00:00
Krzysztof Parzyszek 0fd6296b82 Add loop pass insertion point EP_LateLoopOptimizations
Differential Revision: https://reviews.llvm.org/D28694

llvm-svn: 293067
2017-01-25 16:12:25 +00:00
Dehao Chen a5eb1689dc Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 292979
2017-01-24 21:05:51 +00:00
Chandler Carruth 6acdca78a0 [PH] Replace uses of AssertingVH from members of analysis results with
a lazy-asserting PoisoningVH.

AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.

This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.

The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.

The rest is straight cleanup.

I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.

Differential Revision: https://reviews.llvm.org/D29006

llvm-svn: 292928
2017-01-24 12:55:57 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Evgeniy Stepanov 29e56bceec Revert "Refactor SampleProfile.cpp to move computation inside a branch. (NFC)"
Causes MSan failures on the buildbot.

llvm-svn: 292840
2017-01-23 22:40:08 +00:00
Dehao Chen a53219a5f2 Refactor SampleProfile.cpp to move computation inside a branch. (NFC)
llvm-svn: 292803
2017-01-23 17:09:02 +00:00
Chandler Carruth e8c66b2766 [PM] Replace the hard invalidate in JumpThreading for LVI with correct
invalidation of deleted functions in GlobalDCE.

This was always testing a bug really triggered in GlobalDCE. Right now
we have analyses with asserting value handles into IR. As long as those
remain, when *deleting* an IR unit, we cannot wait for the normal
invalidation scheme to kick in even though it was designed to work
correctly in the face of these kinds of deletions. Instead, the pass
needs to directly handle invalidating the analysis results pointing at
that IR unit.

I've tought the Inliner about this and this patch teaches GlobalDCE.
This will handle the asserting VH case in the existing test as well as
other issues of the same fundamental variety. I've moved the test into
the GlobalDCE directory and added a comment explaining what is going on.

Note that we cannot simply require LVI here because LVI is too lazy.

llvm-svn: 292773
2017-01-23 08:33:24 +00:00
Chandler Carruth 9524af4aac [PM] Clear any analyses for a dead function after inlining it and before
clearing its body. This is essential to avoid triggering asserting value
handles in analyses on the function's body.

I'm working on a test case for this behavior in LLVM, but Clang has
a great one that managed to trigger this on all of the bots already.

llvm-svn: 292770
2017-01-23 07:03:41 +00:00
Chandler Carruth b698d5964d [PM] Fix a really nasty bug introduced when adding PGO support to the
new PM's inliner.

The bug happens when we refine an SCC after having computed a proxy for
the FunctionAnalysisManager, and then proceed to compute fresh analyses
for functions in the *new* SCC using the manager provided by the old
SCC's proxy. *And* when we manage to mutate a function in this new SCC
in a way that invalidates those analyses. This can be... challenging to
reproduce.

I've managed to contrive a set of functions that trigger this and added
a test case, but it is a bit brittle. I've directly checked that the
passes run in the expected ways to help avoid the test just becoming
silently irrelevant.

This gets the new PM back to passing the LLVM test suite after the PGO
improvements landed.

llvm-svn: 292757
2017-01-22 10:34:01 +00:00
Chandler Carruth d4be9f4b8d [PM] Add some debug logging to the new PM inliner to make it easier to
trace its behavior.

llvm-svn: 292756
2017-01-22 10:33:58 +00:00
Anmol P. Paralkar 910dc8de3f MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info
Summary:
Under option -mergefunc-preserve-debug-info we:
- Do not create a new function for a thunk.
- Retain the debug info for a thunk's parameters (and associated
  instructions for the debug info) from the entry block.
  Note: -debug will display the algorithm at work.
- Create debug-info for the call (to the shared implementation) made by
  a thunk and its return value.
- Erase the rest of the function, retaining the (minimally sized) entry
  block to create a thunk.
- Preserve a thunk's call site to point to the thunk even when both occur
  within the same translation unit, to aid debugability. Note that this
  behaviour differs from the underlying -mergefunc implementation which
  modifies the thunk's call site to point to the shared implementation
  when both occur within the same translation unit.

Reviewers: echristo, eeckstein, dblaikie, aprantl, friss

Reviewed By: aprantl

Subscribers: davide, fhahn, jfb, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D28075

llvm-svn: 292702
2017-01-21 02:02:56 +00:00
Peter Collingbourne b365d921cf LowerTypeTests: Fix use-after-free. Found by asan/msan.
llvm-svn: 292700
2017-01-21 01:57:44 +00:00
Peter Collingbourne 67addbcacf LowerTypeTests: Simplify; always create SizeM1 with type IntPtrTy, move initialization out of if statement.
llvm-svn: 292674
2017-01-20 23:22:28 +00:00
Dehao Chen 77079003dd Add indirect call promotion to SamplePGO
Summary: This patch adds metadata for indirect call promotion in the sample profile loader.

Reviewers: xur, davidxl, dnovillo

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28923

llvm-svn: 292672
2017-01-20 22:56:07 +00:00
Easwaran Raman 12585b0148 Improve PGO support for the new inliner
This adds the following to the new PM based inliner in PGO mode:

* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.

* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.

* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.

Differential revision: https://reviews.llvm.org/D28331

llvm-svn: 292666
2017-01-20 22:44:04 +00:00
Peter Collingbourne e02b74e294 IPO, LTO: Plumb the summary from the LTO API into the pass manager.
Differential Revision: https://reviews.llvm.org/D28840

llvm-svn: 292661
2017-01-20 22:18:52 +00:00
Teresa Johnson 4566c6db87 [ThinLTO] Drop non-prevailing non-ODR weak to declarations
Summary:
Allow non-ODR weak/linkonce non-prevailing copies to be marked
as available_externally in the index. Add support for dropping these to
declarations in the backend.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28806

llvm-svn: 292656
2017-01-20 21:54:58 +00:00
Peter Collingbourne f04a390099 LowerTypeTests: Implement importing of type identifiers.
To import a type identifier we read the summary and create external
references to the symbols defined when exporting.

Differential Revision: https://reviews.llvm.org/D28546

llvm-svn: 292654
2017-01-20 21:49:34 +00:00
Peter Collingbourne ee1416037e LowerTypeTests: Compute SizeM1BitWidth in exportTypeId. NFCI.
This avoids needing to store it in a separate field in TypeIdLowering.

llvm-svn: 292647
2017-01-20 20:57:40 +00:00
Dehao Chen 94f369fc7f clang-format SampleProfile.cpp (NFC)
llvm-svn: 292533
2017-01-19 23:20:31 +00:00
Peter Collingbourne 22d9d3cdce LowerTypeTests: Implement exporting of type identifiers.
Type identifiers are exported by:
- Adding coarse-grained information about how to test the type
  identifier to the summary.
- Creating symbols in the object file (aliases and absolute symbols)
  containing fine-grained information about the type identifier.

Differential Revision: https://reviews.llvm.org/D28424

llvm-svn: 292462
2017-01-19 01:20:11 +00:00
Peter Collingbourne 20a00933fb ThinLTOBitcodeWriter: Clear comdats on filtered globals.
Differential Revision: https://reviews.llvm.org/D28839

llvm-svn: 292431
2017-01-18 20:03:02 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Teresa Johnson 83aaf358fd [ThinLTO] Import static functions from the same module as caller
Summary:
We can sometimes end up with multiple copies of a local function that
have the same GUID in the index. This happens when there are local
functions with the same name that are in different source files with the
same name (but in different directories), and they were compiled in
their own directory so had the same path at compile time.

In this case make sure we import the copy in the caller's module. While
it isn't a correctness problem (the renamed reference which is based on the
module IR hash will be unique since the module must have had an
externally visible function that was imported), importing the wrong copy
will result in lost performance opportunity since it won't be referenced
and inlined.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28440

llvm-svn: 291841
2017-01-12 22:04:45 +00:00
Peter Collingbourne 7636532c1b LowerTypeTests: Represent the memory region size with the constant size-1.
This means that we can use a shorter instruction sequence in the case where
the size is a power of two and on the boundary between two representations.

Differential Revision: https://reviews.llvm.org/D28421

llvm-svn: 291706
2017-01-11 21:32:10 +00:00
Peter Collingbourne 6bca5a0d82 Re-apply r291205, "LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.", with a fix for an off-by-one error.
llvm-svn: 291699
2017-01-11 20:28:46 +00:00
Ivan Krasin 42e6b4fd98 Revert rL291205 because it breaks Chrome tests under CFI.
Summary:
Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.

This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Original URL: https://reviews.llvm.org/D28341

Reviewers: pcc

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D28532

llvm-svn: 291684
2017-01-11 16:54:04 +00:00
Peter Collingbourne d79e49d807 LowerTypeTests: Thread summary and action from the API and command line into the pass.
Also move command line handling out of the pass constructor and into
a separate function.

Differential Revision: https://reviews.llvm.org/D28422

llvm-svn: 291323
2017-01-07 01:17:24 +00:00
Peter Collingbourne 81271b7bd2 LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Differential Revision: https://reviews.llvm.org/D28341

llvm-svn: 291205
2017-01-06 02:22:47 +00:00
Teresa Johnson 6c475a7595 ThinLTO: add early "dead-stripping" on the Index
Summary:
Using the linker-supplied list of "preserved" symbols, we can compute
the list of "dead" symbols, i.e. the one that are not reachable from
a "preserved" symbol transitively on the reference graph.
Right now we are using this information to mark these functions as
non-eligible for import.

The impact is two folds:
- Reduction of compile time: we don't import these functions anywhere
  or import the function these symbols are calling.
- The limited number of import/export leads to better internalization.

Patch originally by Mehdi Amini.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23488

llvm-svn: 291177
2017-01-05 21:34:18 +00:00
Teresa Johnson 519465b993 [ThinLTO] Subsume all importing checks into a single flag
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28169

llvm-svn: 291108
2017-01-05 14:32:16 +00:00
Peter Collingbourne b2ce2b6805 IR: Module summary representation for type identifiers; summary test scaffolding for lowertypetests.
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).

Differential Revision: https://reviews.llvm.org/D28041

llvm-svn: 291069
2017-01-05 03:39:00 +00:00
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Davide Italiano b672537cbf [PMBuilder] Remove RunFloat2Int cl::opt.
The pass has been on by default for a long time without problems.

llvm-svn: 290814
2017-01-02 17:49:18 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Davide Italiano fe7a3ee51e [NewGVN] Add a flag to enable the pass via `-mllvm`.
NewGVN can be tested passing `-mllvm -enable-newgvn` to clang.

Differential Revision:  https://reviews.llvm.org/D28059

llvm-svn: 290548
2016-12-26 18:26:19 +00:00
Chandler Carruth 4eaff12ba2 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Mehdi Amini 94f86ad4e0 Function-import: Disable IRVerifier on lazy-loaded modules: the ODR TypeUniquing generates invalid debug info.
llvm-svn: 290442
2016-12-23 19:19:44 +00:00
Mehdi Amini fc06b83ee7 Fix build after r290437 (missing include)
llvm-svn: 290438
2016-12-23 18:04:51 +00:00
Mehdi Amini 9a9077fdad FunctionImport: fix typo '#ifndef NDEBUG' instead of '#ifndef DEBUG'
llvm-svn: 290437
2016-12-23 17:59:24 +00:00
Mehdi Amini 96cdc49305 [ThinLTO] Verify lazy-loaded source module for function importing when assertions are enabled (NFC)
llvm-svn: 290416
2016-12-23 05:16:19 +00:00
Evgeniy Stepanov 27d4c9b71b [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Easwaran Raman 180bd9f6b3 Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038

llvm-svn: 290295
2016-12-22 01:07:01 +00:00
Adam Nemet 32e6a34c02 [LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).

As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.

clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.

This changes opt's default to off as well to match clang.  The tests are
modified to explicitly enable the transformation.

llvm-svn: 290235
2016-12-21 04:07:40 +00:00
Peter Collingbourne 598bd2a262 IPO: Remove the ModuleSummary argument to the FunctionImport pass. NFCI.
No existing client is passing a non-null value here. This will come back
in a slightly different form as part of the type identifier summary work.

Differential Revision: https://reviews.llvm.org/D28006

llvm-svn: 290222
2016-12-21 00:50:12 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Evgeniy Stepanov 95294127d0 Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts r289696, which caused TSan perf regression.

See PR31382.

llvm-svn: 290030
2016-12-17 01:53:15 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Teresa Johnson edddca22c9 [ThinLTO] Thin link efficiency: More efficient export list computation
Summary:
Instead of checking whether a global referenced by a function being
imported is defined in the same module, speculatively always add the
referenced globals to the module's export list. After all imports are
computed, for each module prune any not in its defined set from its
export list.

For a huge C++ app with aggressive importing thresholds, even with
D27687 we spent a lot of time invoking modulePath() from
exportGlobalInModule (modulePath() was still the 2nd hottest routine in
profile). The reason is that with comdat/linkonce the summary lists for
each GUID can be long. For the app in question, for example, we were
invoking exportGlobalInModule almost 2 million times, and we traversed
an average of 63 entries in the summary list each time.

This patch reduced the thin link time for the app by about 10% (on top
of D27687) when using aggressive importing thresholds, and about 3.5% on
average with default importing thresholds.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27755

llvm-svn: 289918
2016-12-16 04:11:51 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Peter Collingbourne 7a4be21d46 Add missing library dep.
llvm-svn: 289903
2016-12-16 00:43:00 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Peter Collingbourne 1398a32e28 IPO: Introduce ThinLTOBitcodeWriter pass.
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
2016-12-16 00:26:30 +00:00
Teresa Johnson 19f2aa7891 [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27687

llvm-svn: 289896
2016-12-15 23:50:06 +00:00
Teresa Johnson eb0ac24172 [ThinLTO] Revert part of r289843 that belonged to another patch.
The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.

llvm-svn: 289875
2016-12-15 21:39:42 +00:00
Teresa Johnson 0c3f57b133 [ThinLTO] Remove stale comment (NFC)
This should have been removed with r288446.

llvm-svn: 289871
2016-12-15 20:53:31 +00:00
Teresa Johnson 475b51a700 [ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC)
Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27696

llvm-svn: 289867
2016-12-15 20:48:19 +00:00
Teresa Johnson 1b859a2306 [ThinLTO] Ensure callees get hot threshold when first seen on cold path
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
2016-12-15 18:21:01 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Dehao Chen 40dd8c5109 Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Dehao Chen fb699619a0 Fix the bug in r289714 (NFC).
llvm-svn: 289724
2016-12-14 22:03:08 +00:00
Dehao Chen a99e082e15 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289714
2016-12-14 21:40:47 +00:00
Geoff Berry ca11a1e147 [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Dehao Chen 23025f8483 revert r289669 which breaks bots
llvm-svn: 289676
2016-12-14 17:23:16 +00:00
Dehao Chen cb61c94d87 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289669
2016-12-14 16:49:28 +00:00
Dehao Chen 0f35fa907d Change CoverageTracker from a global variable to member variable to avoid breaking thread-safety. (NFC)
llvm-svn: 289603
2016-12-13 22:13:18 +00:00
Teresa Johnson a29bd6ffcc [ThinLTO] Remove useless code (NFC)
Should have been removed in r288446.

llvm-svn: 289466
2016-12-12 20:34:28 +00:00
Peter Collingbourne 8786754cc3 WholeProgramDevirt: Teach the pass to handle structs of arrays.
This will become necessary in some cases once D22296 lands.

llvm-svn: 289165
2016-12-09 01:10:11 +00:00
Peter Collingbourne 7a1e5bbe4e Make WholeProgramDevirt understand ConstStruct vtables.
Based on a patch by LemonBoy!

Differential Revision: https://reviews.llvm.org/D26581

llvm-svn: 289162
2016-12-09 00:33:27 +00:00
Evgeniy Stepanov 0c8957c198 CFI-icall on Thumb
Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb.
Use b.w branch instruction.
Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set.

This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc.

Differential Revision: https://reviews.llvm.org/D27499

llvm-svn: 289008
2016-12-08 00:32:26 +00:00
Benjamin Kramer b1332d8bf6 Try unbreaking the MSVC build.
llvm-svn: 288907
2016-12-07 13:35:11 +00:00
Benjamin Kramer 926ab5b00b [LowerTypeTests] Use the TrailingObjects infrastructure for trailing objects.
Also avoid allocating ~3x as much memory as needed.

llvm-svn: 288904
2016-12-07 12:31:45 +00:00
Peter Collingbourne 7357b2ad62 LowerTypeTests: Improve performance by optimising type metadata queries.
Requesting metadata for a global is a relatively expensive operation as it
involves a map lookup, but it's one that we need to do relatively frequently in
this pass to collect the list of type metadata nodes associated with a global.
This change improves the performance of type metadata queries by prebuilding
data structures that keep the global together with its list of type metadata,
and changing the pass to use that data structure wherever we were previously
passing global references around.

This change also eliminates some O(N^2) behavior by collecting the list of
globals associated with each type identifier during the first pass over the
list of globals rather than visiting each global to compute that list every
time we add a new type identifier.

Reduces pass runtime on a module containing Chrome's vtables from over 60s
to 0.9s.

Differential Revision: https://reviews.llvm.org/D27484

llvm-svn: 288859
2016-12-06 23:02:13 +00:00
Peter Collingbourne bc0705240e IR: Move NumElements field from {Array,Vector}Type to SequentialType.
Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.

Differential Revision: https://reviews.llvm.org/D27122

llvm-svn: 288464
2016-12-02 03:20:58 +00:00
Peter Collingbourne 4568158c4d IR: Change PointerType to derive from Type rather than SequentialType.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html

This is for a couple of reasons:

- Values of type PointerType are unlike the other SequentialTypes (arrays
  and vectors) in that they do not hold values of the element type. By moving
  PointerType we can unify certain aspects of how the other SequentialTypes
  are handled.
- PointerType will have no place in the SequentialType hierarchy once
  pointee types are removed, so this is a necessary step towards removing
  pointee types.

Differential Revision: https://reviews.llvm.org/D26595

llvm-svn: 288462
2016-12-02 03:05:41 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Teresa Johnson 185b4ab6d4 [ThinLTO] Stop importing constant global vars as copies in the backend
Summary:
We were doing an optimization in the ThinLTO backends of importing
constant unnamed_addr globals unconditionally as a local copy (regardless
of whether the thin link decided to import them). This should be done in
the thin link instead, so that resulting exported references are marked
and promoted appropriately, but will need a summary enhancement to mark
these variables as constant unnamed_addr.

The function import logic during the thin link was trying to handle
this proactively, by conservatively marking all values referenced in
the initializer lists of exported global variables as also exported.
However, this only handled values referenced directly from the
initializer list of an exported global variable. If the value is itself
a constant unnamed_addr variable, we could end up exporting its
references as well. This caused multiple issues. The first is that the
transitively exported references weren't promoted. Secondly, some could
not be promoted/renamed (e.g. they had a section or other constraint).
recursively, instead of just adding the first level of initializer list
references to the ExportList directly.

Remove this optimization and the associated handling in the function
import backend. SPEC measurements indicate we weren't getting much
from it in any case.

Fixes PR31052.

Reviewers: mehdi_amini

Subscribers: krasin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26880

llvm-svn: 288446
2016-12-02 01:02:30 +00:00
Peter Collingbourne 863cbfbeba Object: Extract a ModuleSymbolTable class from IRObjectFile.
This class represents a symbol table built from in-memory IR. It provides
access to GlobalValues and should only be used if such access is required
(e.g. in the LTO implementation). We will eventually change IRObjectFile
to read from a bitcode symbol table rather than using ModuleSymbolTable,
so it would not be able to expose the module.

Differential Revision: https://reviews.llvm.org/D27073

llvm-svn: 288319
2016-12-01 06:51:47 +00:00
David Blaikie 831b652020 Use CallSite to simplify code
llvm-svn: 288192
2016-11-29 19:42:27 +00:00
David Majnemer d5648c7a7d Replace some callers of setTailCall with setTailCallKind
We were a little sloppy with adding tailcall markers.  Be more
consistent by using setTailCallKind instead of setTailCall.

llvm-svn: 287955
2016-11-25 22:35:09 +00:00
Dehao Chen 554f500ae2 Before sample pgo annotation, do not inline a function that has no debug info. (NFC)
If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119.

llvm-svn: 287710
2016-11-22 22:50:01 +00:00
Davide Italiano 2ae76dd239 [GlobalSplit] Port to the new pass manager.
llvm-svn: 287511
2016-11-21 00:28:23 +00:00
Simon Pilgrim 7d18a70dac Fix spelling mistakes in Transforms comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287488
2016-11-20 13:19:49 +00:00
Chris Bieneman 05c279fc4b [CMake] NFC. Updating CMake dependency specifications
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.

llvm-svn: 287206
2016-11-17 04:36:50 +00:00
Peter Collingbourne f72a8d4e08 Introduce GlobalSplit pass.
This pass splits globals into elements using inrange annotations on
getelementptr indices.

Differential Revision: https://reviews.llvm.org/D22295

llvm-svn: 287178
2016-11-16 23:40:26 +00:00
Teresa Johnson 4fef68cb8d [ThinLTO] Only promote exported locals as marked in index
Summary:
We have always speculatively promoted all renamable local values
(except const non-address taken variables) for both the exporting
and importing module. We would then internalize them back based on
the ThinLink results if they weren't actually exported. This is
inefficient, and results in unnecessary renames. It also meant we
had to check the non-renamability of a value in the summary, which
was already checked during function importing analysis in the ThinLink.

Made renameModuleForThinLTO (which does the promotion/renaming) instead
use the index when exporting, to avoid unnecessary renames/promotions.
For importing modules, we can simply promoted all values as any local
we import by definition is exported and needs promotion.

This required changes to the method used by the FunctionImport pass
(only invoked from 'opt' for testing) and when invoked from llvm-link,
since neither does a ThinLink. We simply conservatively mark all locals
in the index as promoted, which preserves the current aggressive
promotion behavior.

I also needed to change an llvm-lto based test where we had previously
been aggressively promoting values that weren't importable (aliasees),
but now will not promote.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26467

llvm-svn: 286871
2016-11-14 19:21:41 +00:00
Teresa Johnson d5033a4576 [ThinLTO] Make inline assembly handling more efficient in summary
Summary:
The change in r285513 to prevent exporting of locals used in
inline asm added all locals in the llvm.used set to the reference
set of functions containing inline asm. Since these locals were marked
NoRename, this automatically prevented importing of the function.

Unfortunately, this caused an explosion in the summary reference lists
in some cases. In my particular example, it happened for a large protocol
buffer generated C++ file, where many of the generated functions
contained an inline asm call. It was exacerbated when doing a ThinLTO
PGO instrumentation build, where the PGO instrumentation included
thousands of private __profd_* values that were added to llvm.used.

We really only need to include a single llvm.used local (NoRename) value
in the reference list of a function containing inline asm to block it
being imported. However, it seems cleaner to add a flag to the summary
that explicitly describes this situation, which is what this patch does.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26402

llvm-svn: 286840
2016-11-14 16:40:19 +00:00
Peter Collingbourne d9445c49ad Bitcode: Change module reader functions to return an llvm::Expected.
Differential Revision: https://reviews.llvm.org/D26562

llvm-svn: 286752
2016-11-13 07:00:17 +00:00
Evgeniy Stepanov 1fe189d795 [cfi] Fix weak functions handling.
When a function pointer is replaced with a jumptable pointer, special
case is needed to preserve the semantics of extern_weak functions.
Since a jumptable entry can not be extern_weak, we emulate that
behaviour by replacing all references to F (the extern_weak function)
with the following expression: F != nullptr ? JumpTablePtr : nullptr.

Extra special care is needed for global initializers, since most (or
probably all) backends can not lower an initializer that includes
this kind of constant expression. Initializers like that are replaced
with a global constructor (i.e. a runtime initializer).

llvm-svn: 286636
2016-11-11 21:39:26 +00:00
Erik Eckstein 4d6fb72aa9 Make the FunctionComparator of the MergeFunctions pass a stand-alone utility.
This is pure refactoring. NFC.

This change moves the FunctionComparator (together with the GlobalNumberState
utility) in to a separate file so that it can be used by other passes.
For example, the SwiftMergeFunctions pass in the Swift compiler:
https://github.com/apple/swift/blob/master/lib/LLVMPasses/LLVMMergeFunctions.cpp

Details of the change:

*) The big part is just moving code out of MergeFunctions.cpp into FunctionComparator.h/cpp
*) Make FunctionComparator member functions protected (instead of private)
   so that a derived comparator class can use them.

Following refactoring helps to share code between the base FunctionComparator
class and a derived class:

*) Add a beginCompare() function
*) Move some basic function property comparisons into a separate function compareSignature()
*) Do the GEP comparison inside cmpOperations() which now has a new
   needToCmpOperands reference parameter

https://reviews.llvm.org/D25385

llvm-svn: 286632
2016-11-11 21:15:13 +00:00
Peter Collingbourne 6de481a378 Bitcode: Change getModuleSummaryIndex() to return an llvm::Expected.
Differential Revision: https://reviews.llvm.org/D26539

llvm-svn: 286624
2016-11-11 19:50:39 +00:00
Evgeniy Stepanov f48ffab554 [cfi] Implement cfi-icall using inline assembly.
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64 which use different relocation types
for jump instructions (R_ARM_JUMP24) that is never generated for
data.

This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.

llvm-svn: 286611
2016-11-11 18:49:09 +00:00
Dehao Chen 5492f8646c Add comments about why we put LoopSink pass at the very late stage.
llvm-svn: 286480
2016-11-10 17:42:18 +00:00
Peter Collingbourne 7f00d0a125 Bitcode: Change the materializer interface to return llvm::Error.
Differential Revision: https://reviews.llvm.org/D26439

llvm-svn: 286382
2016-11-09 17:49:19 +00:00
Dehao Chen 947dbe1254 Enable Loop Sink pass for functions that has profile.
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.

Reviewers: davidxl, hfinkel

Subscribers: mehdi_amini, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26155

llvm-svn: 286325
2016-11-09 00:58:19 +00:00
Chad Rosier fbc7b7d154 Fix typo in comment. NFC.
llvm-svn: 286270
2016-11-08 19:10:25 +00:00
Chad Rosier c244349b85 Remove unused include. NFC.
llvm-svn: 286250
2016-11-08 16:51:19 +00:00
Chad Rosier 611b73b1f9 Fix 80-column violations. NFC.
llvm-svn: 286117
2016-11-07 16:28:04 +00:00
Xinliang David Li f450b88191 Fix typo
llvm-svn: 285978
2016-11-04 03:00:52 +00:00
Teresa Johnson 58fbc916a0 [ThinLTO] Rename HasSection to NoRename (NFC)
Summary:
This is in preparation for a change to utilize this flag for symbols
referenced/defined in either inline or module level assembly.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26048

llvm-svn: 285376
2016-10-28 02:24:59 +00:00
Peter Collingbourne 4f3b2df9bb GlobalDCE: Restore a statement accidentally removed in r285048.
llvm-svn: 285052
2016-10-25 02:57:27 +00:00
Peter Collingbourne 7695cb6da8 GlobalDCE: Deduplicate code. NFCI.
llvm-svn: 285048
2016-10-25 01:58:26 +00:00
Rong Xu 1c0e9b97d2 Conditionally eliminate library calls where the result value is not used
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
   sqrt(val);
 is transformed to
   if (val < 0)
     sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.

These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.

Reviewers: hfinkel, mehdi_amini, davidxl

Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz

Differential Revision: https://reviews.llvm.org/D24414

llvm-svn: 284542
2016-10-18 21:36:27 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
David Callahan c1051ab26e Modify df_iterator to support post-order actions
Summary: This makes a change to the state used to maintain visited information for depth first iterator. We know assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases, this method does nothing so this patch has no functional changes.  It will however allow a client to distinguish back from cross edges in a DFS tree.

Reviewers: nadav, mehdi_amini, dberlin

Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D25191

llvm-svn: 283391
2016-10-05 21:36:16 +00:00
Sanjoy Das 0359a193a7 [PruneEH] Be correct in the face IPO
This fixes one spot I had missed in r265762.  Credit goes to Philip
Reames for spotting this one!

llvm-svn: 283137
2016-10-03 19:35:30 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Piotr Padlewski d28694739c [thinlto] Don't decay threshold for hot callsites
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24976

llvm-svn: 282833
2016-09-30 03:01:17 +00:00
Piotr Padlewski ba72b95f7b [thinlto] Add cold-callsite import heuristic
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006

Reviewers: tejohnson, mehdi_amini, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24940

llvm-svn: 282733
2016-09-29 17:32:07 +00:00