Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41883
llvm-svn: 322771
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
While updating clang tests for having clang set dso_local I noticed
that:
- There are *a lot* of tests to update.
- Many of the updates are redundant.
They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.
llvm-svn: 322317
Summary:
LowerTypeTests moves some function definitions from individual object
files to the merged module, leaving a stub to be called in the merged
module's jump table. If an alias was pointing to such a function
definition LowerTypeTests would fail because the alias would be left
without a definition to point to.
This change 1) emits information about aliases to the ThinLTO summary,
2) replaces aliases pointing to function definitions that are moved to
the merged module with function declarations, and 3) re-emits those
aliases in the merged module pointing to the correct function
definitions.
The patch does not correctly fix all possible mis-uses of aliases in
LowerTypeTests. For example, it does not handle aliases with a different
type from the pointed to function.
The addition of alias data increases the size of Chrome build artifacts
by less than 1%.
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc
Differential Revision: https://reviews.llvm.org/D41741
llvm-svn: 322139
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.
The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)
Reviewers: davidxl, silvas
Subscribers: mgorny, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D41604
llvm-svn: 322110
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41461
llvm-svn: 321331
Summary:
In r277849, getEntryCount was changed to return None when the entry
count was 0, specifically for SamplePGO where it means no samples were
recorded. However, for instrumentation PGO a 0 entry count should be
returned directly, since it does mean that the function was completely
cold. Otherwise we end up treating these functions conservatively
in isFunctionEntryCold() and isColdBB().
Instead, for SamplePGO use -1 when there are no samples, and change
getEntryCount to return None when the value is -1.
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41307
llvm-svn: 321018
Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.
Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).
This is a performance win in codes that have many aliases, e.g. C++
applications that have many constructor and destructor aliases.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D40747
llvm-svn: 320895
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.
Differential Revision: https://reviews.llvm.org/D38566
llvm-svn: 320749
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise opearations from being added, which can be more error-prone.
Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40933
llvm-svn: 320107
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.
The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.
Differential Revision: https://reviews.llvm.org/D40658
llvm-svn: 319963
Summary:
The aim is to make ModRefInfo checks and changes more intuitive
and less error prone using inline methods that abstract the bit operations.
Ideally ModRefInfo would become an enum class, but that change will require
a wider set of changes into FunctionModRefBehavior.
Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel
Subscribers: nlopes, llvm-commits
Differential Revision: https://reviews.llvm.org/D40749
llvm-svn: 319821
If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.
This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.
Differential Revision: https://reviews.llvm.org/D40593
llvm-svn: 319494
Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning).
Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers.
Differential Revision: https://reviews.llvm.org/D38190
llvm-svn: 319398
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Summary:
This patch extends the partial inliner to support inlining parts of
vararg functions, if the vararg handling is done in the outlined part.
It adds a `ForwardVarArgsTo` argument to InlineFunction. If it is
non-null, all varargs passed to the inlined function will be added to
all calls to `ForwardVarArgsTo`.
The partial inliner takes care to only pass `ForwardVarArgsTo` if the
varargs handing is done in the outlined function. It checks that vastart
is not part of the function to be inlined.
`test/Transforms/CodeExtractor/PartialInlineNoInline.ll` (already part
of the repo) checks we do not do partial inlining if vastart is used in
a basic block that will be inlined.
Reviewers: davide, davidxl, grosser
Reviewed By: davide, davidxl, grosser
Subscribers: gyiu, grosser, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D39607
llvm-svn: 318028
Summary:
In ThinLTO compilation, we exit populateModulePassManager early and
were not adding PM extension passes meant to run at the end of the
pipeline. This includes sanitizer passes. Add these passes before
the early exit.
A test will be added to projects/compiler-rt.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D39565
llvm-svn: 317714
Blockaddresses refer to the function itself, therefore replacing them
would cause an assertion in doRAUW.
Fixes https://bugs.llvm.org/show_bug.cgi?id=35201
This was found when trying CFI on a proprietary kernel by Dmitry Mikulin.
Differential Revision: https://reviews.llvm.org/D39695
llvm-svn: 317527
Summary: When computing the SUM for indirect call promotion, if the callsite is already promoted in the profile, it will be promoted before ICP. In the current implementation, ICP only sees remaining counts in SUM. This may cause extra indirect call targets being promoted. This patch updates the SUM to include the counts already promoted earlier. This way we do not end up promoting too many indirect call targets.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38763
llvm-svn: 317502
This recommit r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide
Reviewed By: davidxl
Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D39137
llvm-svn: 317351
Summary:
InlineFunction can fail, for example when trying to inline vararg
fuctions. In those cases, we do not want to bump partial inlining
counters or set AnyInlined to true, because this could leave an unused
function hanging around.
Reviewers: davidxl, davide, gyiu
Reviewed By: davide
Subscribers: llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39581
llvm-svn: 317314
Summary: In the compile phase of SamplePGO+ThinLTO, ICP is not invoked. Instead, indirect call targets will be included as function metadata for ThinIndex to buidl the call graph. This should not only include functions defined in other modules, but also functions defined in the same module, otherwise ThinIndex may find the callee dead and eliminate it, while ICP in backend will revive the symbol, which leads to undefined symbol.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D39480
llvm-svn: 317118
This is necessary because DCE is applied to full LTO modules. Without
this change, a reference from a dead ThinLTO global to a dead full
LTO global will result in an undefined reference at link time.
This problem is only observable when --gc-sections is disabled, or
when targeting COFF, as the COFF port of lld requires all symbols to
have a definition even if all references are dead (this is consistent
with link.exe).
This change also adds an EliminateAvailableExternally pass at -O0. This
is necessary to handle the situation on Windows where a non-prevailing
copy of a linkonce_odr function has an SEH filter function; any
such filters must be DCE'd because they will contain a call to the
llvm.localrecover intrinsic, passing as an argument the address of the
function that the filter belongs to, and llvm.localrecover requires
this function to be defined locally.
Fixes PR35142.
Differential Revision: https://reviews.llvm.org/D39484
llvm-svn: 317108
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.
Differential Revision: https://reviews.llvm.org/D37355
llvm-svn: 316576
MergeFunctions uses (through FunctionComparator) a map of GlobalValues
to identifiers because it needs to compare functions and globals
do not have an inherent total order. Thus, FunctionComparator
(through GlobalNumberState) has a ValueMap<GlobalValue *>.
r315852 added a RAUW on globals that may have been previously
encountered by the FunctionComparator, which would replace
a GlobalValue * key with a ConstantExpr *, which is illegal.
This commit adjusts that code path to remove the function being
replaced from the ValueMap as well.
llvm-svn: 316145
Summary:
std::unordered_multimap happens to be very slow when the number of elements
grows large. On one of our internal applications we observed a 17x compile time
improvement from changing it to DenseMap.
Reviewers: mehdi_amini, serge-sans-paille, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38916
llvm-svn: 316045
This can result in significant code size savings in some cases,
e.g. an interrupt table all filled with the same assembly stub
in a certain Cortex-M BSP results in code blowup by a factor of 2.5.
Differential Revision: https://reviews.llvm.org/D34806
llvm-svn: 315853
This reduces code size for constructs like vtables or interrupt
tables that refer to functions in global initializers.
Differential Revision: https://reviews.llvm.org/D34805
llvm-svn: 315852
It is possible for both a base and a derived class to be satisfied
with a unique vtable. If a program contains casts of the same pointer
to both of those types, the CFI checks will be lowered to this
(with ThinLTO):
if (p != &__typeid_base_global_addr)
trap();
if (p != &__typeid_derived_global_addr)
trap();
The optimizer may then use the first condition combined
with the assumption that __typeid_base_global_addr and
__typeid_derived_global_addr may not alias to optimize away the second
comparison, resulting in an unconditional trap.
This patch fixes the bug by giving imported globals the type [0 x i8]*,
which prevents the optimizer from assuming that they do not alias.
Differential Revision: https://reviews.llvm.org/D38873
llvm-svn: 315753
parameterized emit() calls
Summary: This is not functional change to adopt new emit() API added in r313691.
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38285
llvm-svn: 315476
Summary: In the current implementation, we only have accurate profile count for standalone symbols. For inlined functions, we do not have entry count data because it's not available in LBR. In this patch, we use the first instruction's frequency to estimiate the function's entry count, especially for inlined functions. This may be inaccurate due to debug info in optimized code. However, this is a better estimate than the static 80/20 estimation we have in the current implementation.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D38478
llvm-svn: 315369
Summary: stripPointerCast is not reliably returning the value that's being type-casted. Instead it may look further at function attributes to further propagate the value. Instead of relying on stripPOintercast, the more reliable solution is to directly use the pointer to the promoted direct call.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38603
llvm-svn: 315077
This is a vestige from the GCC-3 days, which disables IPO passes
when set. I don't think anybody actually uses it as there are
several IPO passes which still run with this flag set and
nobody complained/noticed. This reduces the delta between
current and new pass manager and allows us to easily review
the difference when we decide to flip the switch (or audit
which passes should run, FWIW).
llvm-svn: 315043
Summary: In SamplePGO, when an indirect call is promoted in the profiled binary, before profile annotation, it will be promoted and inlined. For the original indirect call, the current implementation will not mark VP profile on it. This is an issue when profile becomes stale. This patch annotates VP prof on indirect calls during annotation.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38477
llvm-svn: 315016
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweight the cost, making the whole pipeline
faster.
This fixes PR34652.
Differential Revision: https://reviews.llvm.org/D38154
llvm-svn: 314997
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38094
llvm-svn: 314619
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
This patch contains fix for reverted commit
rL312318 which was causing failure due to use
of unchecked dyn_cast to CIInit.
Patch by: Nikola Prica.
llvm-svn: 313870
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.
Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D38086
llvm-svn: 313766
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator. This requires making the insertion
operator generic.
I've also converted a few cases to use the new API. It seems to work pretty
well. See the LoopUnroller for a slightly more interesting case.
llvm-svn: 313691
Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36637
llvm-svn: 313678
Summary: Fix the bug when promoted call return type mismatches with the promoted function, we should not try to inline it. Otherwise it may lead to compiler crash.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38018
llvm-svn: 313658
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: vitalybuka, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37779
llvm-svn: 313277
This reland includes a fix for the LowerTypeTests pass so that it
looks past aliases when determining which type identifiers are live.
Differential Revision: https://reviews.llvm.org/D37842
llvm-svn: 313229
This broke Chromium's CFI build; see crbug.com/765004.
> We were previously handling aliases during dead stripping by adding
> the aliased global's "original name" GUID to the worklist. This will
> lead to incorrect behaviour if the global has local linkage because
> the original name GUID will not correspond to the global's GUID in
> the summary.
>
> Because an alias is just another name for the global that it
> references, there is no need to mark the referenced global as used,
> or to follow references from any other copies of the global. So all
> we need to do is to follow references from the aliasee's summary
> instead of the alias.
>
> Differential Revision: https://reviews.llvm.org/D37789
llvm-svn: 313222
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37779
llvm-svn: 313195
We were previously handling aliases during dead stripping by adding
the aliased global's "original name" GUID to the worklist. This will
lead to incorrect behaviour if the global has local linkage because
the original name GUID will not correspond to the global's GUID in
the summary.
Because an alias is just another name for the global that it
references, there is no need to mark the referenced global as used,
or to follow references from any other copies of the global. So all
we need to do is to follow references from the aliasee's summary
instead of the alias.
Differential Revision: https://reviews.llvm.org/D37789
llvm-svn: 313157
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.
This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37783
llvm-svn: 313151
Summary: This change passes down ACT to SampleProfileLoader for the new PM. Also remove the default value for SampleProfileLoader class as it is not used.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37773
llvm-svn: 313080
Not all targets support the use of absolute symbols to export
constants. In particular, ARM has a wide variety of constant encodings
that cannot currently be relocated by linkers. So instead of exporting
the constants using symbols, export them directly in the summary.
The values of the constants are left as zeroes on targets that support
symbolic exports.
This may result in more cache misses when targeting those architectures
as a result of arbitrary changes in constant values, but this seems
somewhat unavoidable for now.
Differential Revision: https://reviews.llvm.org/D37407
llvm-svn: 312967
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null
Differential Revision: https://reviews.llvm.org/D37628
llvm-svn: 312869
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
This is required when targeting COFF, as the comdat name must match
one of the names of the symbols in the comdat.
Differential Revision: https://reviews.llvm.org/D37550
llvm-svn: 312767
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318
Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490
llvm-svn: 312758
This patch provides such debug information for integer
variables whose type is shrinked to bool by providing
dwarf expression which returns either constant initial
value or other value.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D35994
llvm-svn: 312318
Summary:
When jumptable encoding does not match target code encoding (arm vs
thumb), a veneer is inserted by the linker. We can not avoid this
in all cases, because entries within one jumptable must have the same
encoding, but we can make it less common by selecting the jumptable
encoding to match the majority of its targets.
This change only covers FullLTO, and not ThinLTO.
Reviewers: pcc
Subscribers: aemerson, mehdi_amini, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37171
llvm-svn: 312054
Summary:
Cross-DSO CFI needs all __cfi_check exports to use the same encoding
(ARM vs Thumb).
Reviewers: pcc
Subscribers: aemerson, srhines, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37243
llvm-svn: 312052
Summary:
Remove some code that was no longer needed. The first FIXME is
stale since we long ago started using the index to drive importing,
rather than doing force importing based on linkage type. And
now with r309278, we no longer import any aliases.
Reviewers: dblaikie
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D37266
llvm-svn: 312019
Prior to this change (and after r311371), we computed it
unconditionally, causin gsevere compile time regressions (in some
cases, 5 to 10x).
llvm-svn: 311804
We can't reuse the llvm.assume instruction's bitcast because it may not
dominate every user of the vtable pointer.
Differential Revision: https://reviews.llvm.org/D36994
llvm-svn: 311491
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.
I took over this work from Chad Rosier (mcrosier@codeaurora.org).
Differential Revision: https://reviews.llvm.org/D35850
llvm-svn: 311371
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311349
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311273
Summary:
Follow up to fix in r311023, which fixed the case where the combined
index is written to disk. The same samplePGO logic exists for the
in-memory index when computing imports, so we need to filter out
GlobalVariable summaries there too.
Reviewers: davidxl
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D36919
llvm-svn: 311254
Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.
Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.
Patch by Tarun Rajendran.
Differential Revision: https://reviews.llvm.org/D36127
llvm-svn: 310763
The frontend may have requested a higher alignment for any reason, and
downstream optimizations may already have taken advantage of it. We
should keep the same alignment when moving the allocation from the
parameter area to the local variable area.
Fixes PR34038
llvm-svn: 310071
This is similar to what we are doing in "regular" SROA and creates
DW_OP_LLVM_fragment operations to describe the resulting variables.
rdar://problem/33654891
llvm-svn: 310014
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.
This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.
The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).
I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.
A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.
The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.
I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.
We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.
Differential Revision: https://reviews.llvm.org/D36188
llvm-svn: 309784
D33925 added a control flow simplification for -O2 --lto-O0 builds that
manually splits blocks and reassigns conditional branches but does not
correctly update phi nodes. If the else case being branched to had
incoming phi nodes the control-flow simplification would leave phi nodes
in that BB with an unhandled predecessor.
Patch by Vlad Tsyrklevich!
Differential Revision: https://reviews.llvm.org/D36012
llvm-svn: 309621
Summary:
Since r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.
Reviewers: MatzeB
Reviewed By: MatzeB
Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D35949
llvm-svn: 309553
Summary: The original 3.0 hot mupltiplier is too small, and would prevent hot callsites from being inline. This patch increases the hot multilier to 10.0
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D35969
llvm-svn: 309344
The alias support was dead code since 2011. It was last touched
in r124182, where it was reintroduced after being removed
in r110434, and since then it was gated behind a HasGlobalAliases
flag that was permanently stuck as `false`.
It is also broken. I'm not sure if it bitrotted or was just broken
in the first place because it appears to have never been tested,
but the following IR results in a crash:
define internal i32 @a(i32 %a, i32 %b) unnamed_addr {
%c = add i32 %a, %b
%d = xor i32 %a, %c
ret i32 %c
}
define internal i32 @b(i32 %a, i32 %b) unnamed_addr {
%c = add i32 %a, %b
%d = xor i32 %a, %c
ret i32 %c
}
It seems safe to remove buggy untested code that no one cared about
for seven years.
Differential Revision: https://reviews.llvm.org/D34802
llvm-svn: 309313
Summary:
Until a more advanced version of importing can be implemented for
aliases (one that imports an alias as an available_externally definition
of the aliasee), skip the narrow subset of cases that was possible but
came at a cost: aliases of linkonce_odr functions could be imported
because the linkonce_odr function could be safely duplicated from the
source module. This came/comes at the cost of not being able to 'home'
imported linkonce functions (they had to be emitted linkonce_odr in all
the destination modules (even if they weren't used by an alias) rather
than as available_externally - causing extra object size).
Tangentially, this also was the only reason ThinLTO would emit multiple
CUs in to the resulting DWARF - which happens to be a problem for
Fission (there's a fix for this in GDB but not released yet, etc).
(actually it's not the only reason - but I'm sending a patch to fix the
other reason shortly)
There's no reason to believe this particularly narrow alias importing
was especially/meaningfully important, only that it was /possible/ to
implement in this way. When a more general solution is done, it should
still satisfy the DWARF concerns above, since the import will still be
available_externally, and thus not create extra CUs.
Since now all aliases are treated the same, I removed/simplified some
test cases since they were testing corner cases where there are no
longer any corners.
Reviewers: tejohnson, mehdi_amini
Differential Revision: https://reviews.llvm.org/D35875
llvm-svn: 309278
Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.
Reviewers: danielcdh, tejohnson, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D35334
llvm-svn: 308760
Previously we were (mis)handling jump table members with a prevailing
definition in a full LTO module and a non-prevailing definition in a
ThinLTO module by dropping type metadata on those functions entirely,
which would cause type tests involving such functions to fail.
This patch causes us to drop metadata only if we are about to replace
it with metadata from cfi.functions.
We also want to replace metadata for available_externally functions,
which can arise in the opposite scenario (prevailing ThinLTO
definition, non-prevailing full LTO definition). The simplest way
to handle that is to remove the definition; there's little value in
keeping it around at this point (i.e. after most optimization passes
have already run) and later code will try to use the function's linkage
to create an alias, which would result in invalid IR if the function
is available_externally.
Fixes PR33832.
Differential Revision: https://reviews.llvm.org/D35604
llvm-svn: 308642
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
This reverts commit r308114 (and follow on fixes to test).
There is a linking failure in a ThinLTO bot:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/
(and undefined reference). It seems like it must be a second order
effect of the heuristic change I made, and may take some time to try
to reproduce locally and track down. Therefore, reverting for now.
llvm-svn: 308206
This restores r308078/r308079 with a fix for bot non-determinisim (make
sure we run llvm-lto in single threaded mode so the debug output doesn't get
interleaved).
llvm-svn: 308114
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.
r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
llvm-svn: 308095
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.
Reviewers: mehdi_amini
Subscribers: inglorion, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35436
llvm-svn: 308078
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.
This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35315
llvm-svn: 308040
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.
Part of PR33752.
Differential Revision: https://reviews.llvm.org/D35281
llvm-svn: 307744
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.
llvm-svn: 307729
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
querying for analysis results on a function declaration rather than
a definition.
The only reason this worked previously is by chance -- because the way
we got alias analysis results with the legacy PM, we happened to not
compute a dominator tree and so we happened to not hit an assert even
though it didn't make any real sense. Now we bail out before trying to
compute alias analysis so that we don't hit these asserts.
llvm-svn: 307625
Summary:
This solves PR33641.
When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.
If the calls are left untouched, they will later on cause errors:
"function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.
Reviewers: mstorsjo, rnk, arsenm
Reviewed By: rnk
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34874
llvm-svn: 307521
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
llvm-svn: 307487
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D35096
llvm-svn: 307439
This is the same as r304719 but for ThinLTO.
The substantial difference is that in this case we don't have
whole visibility, just the summary.
In the LTO case, when we got the resolution for the input file we
could just see if the linker told us whether a symbol was linker
redefined (using --wrap or --defsym) and switch the linkage directly
for the GV.
Here, we have the summary. So, we record that the linkage changed
from <whatever it was> to $weakany to prevent IPOs across this symbol
boundaries and actually just switch the linkage at FunctionImport time.
This patch should also fixes the lld bits (as all the scaffolding for
communicating if a symbol is linker redefined should be there & should
be the same), but I'll make sure to add some tests there as well.
Fixes PR33192.
Differential Revision: https://reviews.llvm.org/D35064
llvm-svn: 307303
Summary:
GlobalExtensions is dereferenced twice, once for iteration and then a check if it is empty.
As a ManagedStatic this dereference forces it's construction which is unnecessary.
Reviewers: efriedma, davide, mehdi_amini
Reviewed By: mehdi_amini
Subscribers: chapuni, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D33381
llvm-svn: 307229
LLVM's definition of dominance allows instructions that are cyclic
in unreachable blocks, e.g.:
%pat = select i1 %condition, @global, i16* %pat
because any instruction dominates an instruction in a block that's
not reachable from entry.
So, remove unreachable blocks from the function, because a) there's
no point in analyzing them and b) GlobalOpt should otherwise grow
some more complicated logic to break these cycles.
Differential Revision: https://reviews.llvm.org/D35028
llvm-svn: 307215
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html
Time to actually let go and move forward. =]
I've updated the release notes both about the removal and the
deprecation of the corresponding C API.
llvm-svn: 306797
Summary: r305009 disables recursive inlining for indirect calls in sample loader pass. The same logic applies to direct recursive calls.
Reviewers: iteratee, davidxl
Reviewed By: iteratee
Subscribers: sanjoy, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D34456
llvm-svn: 305934
Summary: Fixes an issue using RegisterStandardPasses from a statically linked object before PassManagerBuilder::addGlobalExtension is called from a dynamic library.
Reviewers: efriedma, theraven
Reviewed By: efriedma
Subscribers: mehdi_amini, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D33515
llvm-svn: 305303
This restores the order of evaluation (& conditionalized evaluation) of
isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline
(with the addition of a shouldInline call after
isTriviallyDeadInstruction) from before r305245.
llvm-svn: 305267
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline. We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.
This is turned off by default. A follow-up change will turn it on to
allow for easier reversion in case it breaks something.
llvm-svn: 305146
Other comments/implications are that this isn't intended behavior (nor
perserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.
llvm-svn: 305052
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.
It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.
This is the second attempt to land this change after fixing PR33316.
llvm-svn: 305031
This is to prepare to allow for dead stripping of globals in the
merged modules.
Differential Revision: https://reviews.llvm.org/D33921
llvm-svn: 305027
Summary: Early-inlining of recursive call makes the code size bloat exponentially. We should not disable it.
Reviewers: davidxl, dnovillo, iteratee
Reviewed By: iteratee
Subscribers: iteratee, llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D34017
llvm-svn: 305009
This makes it so that the code quality for CFI checks when compiling
with -O2 and linking with --lto-O0 is similar to that of the rest of
the code.
Reduces the size of a chrome binary built with -O2/--lto-O0 by
about 750KB.
Differential Revision: https://reviews.llvm.org/D33925
llvm-svn: 304921
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Minor optimization but mostly simplifies my debugging so I'm not dealing
with empty SCCNodeSets while investigating issues in this optimization.
llvm-svn: 304597
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.
It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.
llvm-svn: 304582
Replace GVFlags::LiveRoot with GVFlags::Live and use that instead of
all the DeadSymbols sets. This is refactoring in order to make
liveness information available in the RegularLTO pipeline.
llvm-svn: 304466
Summary: Also see D33429 for other ThinLTO + New PM related changes.
Reviewers: davide, chandlerc, tejohnson
Subscribers: mehdi_amini, Prazek, cfe-commits, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D33525
llvm-svn: 304378
The whole-program-devirt pass needs to run at -O0 because only it
knows about the llvm.type.checked.load intrinsic: it needs to both
lower the intrinsic itself and handle it in the summary.
Differential Revision: https://reviews.llvm.org/D33571
llvm-svn: 304019
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.
This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:
[ %a1 = add i32 %b, 1 ] [ %c1 = add i32 %d, 1 ]
[ %a2 = xor i32 %a1, 1 ] [ %c2 = xor i32 %c1, 1 ]
\ /
[ %e = phi i32 %a2, %c2 ]
[ add i32 %e, 4 ]
GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.
What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.
This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.
Differential revision: https://reviews.llvm.org/D24805
llvm-svn: 303850
Implemented frequency based cost/saving analysis
and related options.
The pass is now in a state ready to be turne on
in the pipeline (in follow up).
Differential Revision: http://reviews.llvm.org/D32783
llvm-svn: 302967
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
llvm-svn: 302705
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.
Differential revision: https://reviews.llvm.org/D33012
llvm-svn: 302597
This fixes a regression since SVN rev 273808 (which was supposed to
not change functionality).
The regression caused miscompilations (noted in the wild when targeting
AArch64) on platforms with 32 bit long.
Differential Revision: https://reviews.llvm.org/D32850
llvm-svn: 302137
When profiling a no-op incremental link of Chromium I found that the functions
computeImportForFunction and computeDeadSymbols were consuming roughly 10% of
the profile. The goal of this change is to improve the performance of those
functions by changing the map lookups that they were previously doing into
pointer dereferences.
This is achieved by changing the ValueInfo data structure to be a pointer to
an element of the global value map owned by ModuleSummaryIndex, and changing
reference lists in the GlobalValueSummary to hold ValueInfos instead of GUIDs.
This means that a ValueInfo will take a client directly to the summary list
for a given GUID.
Differential Revision: https://reviews.llvm.org/D32471
llvm-svn: 302108
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.
llvm-svn: 301587
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
Just calling dropAllReferences leaves pointers to the ConstantExpr
behind, so we would eventually crash with a null pointer dereference.
Differential Revision: https://reviews.llvm.org/D32551
llvm-svn: 301575
also a discussion about exactly what we should do prior to re-enabling
it.
The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.
llvm-svn: 301505
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
llvm-svn: 301424
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.
This fixes PR32085
Reviewers: chandlerc, danielcdh
Subscribers: mehdi_amini, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D30468
llvm-svn: 301395
... in the per-TU -O0 pipeline.
The problem is that there could be passes registered using
`addExtensionsToPM()` introducing unnamed globals.
Asan is an example, but there may be others. Building cppcheck
with `-flto=thin` and `-fsanitize=address` triggers an assertion
while we're reading bitcode (in lib/LTO), as the BitcodeReader
assumes there are no unnamed globals (because the namer has run).
Unfortunately I wasn't able to find an easy way to test this.
I added a comment in the hope nobody moves this again.
llvm-svn: 301102
This should simplify the call sites, which typically want to tweak one
attribute at a time. It should also avoid creating ephemeral
AttributeLists that live forever.
llvm-svn: 300718
Before this patch, we always called method 'findCalleeFunctionSamples()' on
intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable
candidates for obvious reasons.
No functional change intended.
Differential Revision: https://reviews.llvm.org/D32008
llvm-svn: 300541
Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.
Reviewers: davidxl, dnovillo, tejohnson
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31952
llvm-svn: 300507
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.
This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.
On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html
Differential Revision: https://reviews.llvm.org/D31838
llvm-svn: 300464
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.
llvm-svn: 300367
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
llvm-svn: 300272
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.
Reviewers: davidxl, dnovillo
Reviewed By: davidxl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31950
llvm-svn: 300240
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.
Add a test case for this.
llvm-svn: 300229
This seems like a much more natural API, based on Derek Schuff's
comments on r300015. It further hides the implementation detail of
AttributeList that function attributes come last and appear at index
~0U, which is easy for the user to screw up. git diff says it saves code
as well: 97 insertions(+), 137 deletions(-)
This also makes it easier to change the implementation, which I want to
do next.
llvm-svn: 300153
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.
Reviewers: pcc, mehdi_amini, tejohnson
Reviewed By: pcc
Subscribers: rnk, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31963
llvm-svn: 300019
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.
This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.
Reviewers: pete, chandlerc
Subscribers: llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D31940
llvm-svn: 300014
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299949
Summary:
In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not
referenced from the current module. However, in doing so I neglected to realize
that some SPs could be referenced entirely from inlined functions. It appears
I was not the only one to make this mistake, because DebugInfoFinder, doesn't
find those SPs either. Fix this in DebugInfoFinder and then use that to make
sure not to drop those CUs in strip-dead-debug-info.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31904
llvm-svn: 299936
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
llvm-svn: 299925
This re-lands r299875.
I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:
// Collect any return attributes from the call.
- if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
- newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
- oldAttrs.getRetAttributes()));
+ newAttrs.push_back(oldAttrs.getRetAttributes());
Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)
That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.
llvm-svn: 299899
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.
The problematic assumptions include:
- That the pointer size used for the stack is the same size as
the code size pointer, which is also the maximum sized pointer.
- That 0 is an invalid, non-dereferencable pointer value.
These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.
llvm-svn: 299888
Summary: Now the SamplePGO support is more stable, we do not need so many verbose optimization remarks emitted.
Reviewers: dnovillo, davidxl
Reviewed By: davidxl
Subscribers: fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D31826
llvm-svn: 299883
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.
I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.
My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.
I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.
Reviewers: pete, chandlerc, sanjoy, hfinkel
Reviewed By: pete
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31198
llvm-svn: 299875
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299699
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a re-land of r298158 rebased on D31358. This time,
asan.module_ctor is put in a comdat as well to avoid quadratic
behavior in Gold.
llvm-svn: 299697
Summary:
Prior to this while it would delete the dead DIGlobalVariables, it would
leave dead DICompileUnits and everything referenced therefrom. For a bit
bitcode file with thousands of compile units those dead nodes easily
outnumbered the real ones. Clean that up.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D31720
llvm-svn: 299692