Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Made getName helper to return std::string (instead of StringRef initially) to fix
asan builtbot failures on CGSCC tests.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342664
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342597
Summary:
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342544
Summary:
Like with other similar intrinsics, presense of strip or
launder.invariant.group should not change the result of inlining cost.
This is because they are just markers and do not perform any computation.
Reviewers: amharc, rsmith, reames, kuhar
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D51814
llvm-svn: 341725
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.
An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
define void @test1() {
call void @bar(i1 true) #0
call void @bar(i1 false) #2
ret void
}
attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
..
attributes #2 = { "inline-remark"="(cost=never): recursive" }
Patch by: yrouban (Yevgeny Rouban)
Reviewers: xbolva00, tejohnson, apilipenko
Reviewed By: xbolva00, tejohnson
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50435
llvm-svn: 340834
Summary:
This fixes PR31105.
There is code trying to delete dead code that does so by e.g. checking if
the single predecessor of a block is the block itself.
That check fails on a block like this
bb:
br i1 undef, label %bb, label %bb
since that has two (identical) predecessors.
However, after the check for dead blocks there is a call to
ConstantFoldTerminator on the basic block, and that call simplifies the
block to
bb:
br label %bb
Therefore we now do the call to ConstantFoldTerminator before the check if
the block is dead, so it can realize that it really is.
The original behavior lead to the block not being removed, but it was
simplified as above, and then we did a call to
Dest->replaceAllUsesWith(&*I);
with old and new being equal, and an assertion triggered.
Reviewers: chandlerc, fhahn
Reviewed By: fhahn
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D51280
llvm-svn: 340820
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.
An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
define void @test1() {
call void @bar(i1 true) #0
call void @bar(i1 false) #2
ret void
}
attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
..
attributes #2 = { "inline-remark"="(cost=never): recursive" }
Patch by: yrouban (Yevgeny Rouban)
Reviewers: xbolva00, tejohnson, apilipenko
Reviewed By: xbolva00, tejohnson
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50435
llvm-svn: 340618
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338969
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338494
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338387
Summary:
Normally, inling does not happen if caller does not have
"null-pointer-is-valid"="true" attibute but callee has it.
However, alwaysinline may force callee to be inlined.
In this case, if the caller has the "null-pointer-is-valid"="true"
attribute, copy the attribute to caller.
Reviewers: efriedma, a.elovikov, lebedev.ri, jyknight
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50000
llvm-svn: 338292
Summary:
Check if the parent basic block and caller exists
before calling CS.getCaller when constant folding
strip.invariant.group instrinsic.
This avoids a crash when the function containing the intrinsic
is being inlined. The instruction is checked for any simplifiction
but has not yet been added to a basic block.
Reviewers: Prazek, rsmith, efriedma
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D49690
llvm-svn: 337742
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.
The iterator may be invalidated if a pass removes a function, ex.:
llvm::LegacyInlinerBase::inlineCalls
inlineCallsImpl
llvm::CallGraph::removeFunctionFromModule
llvm-svn: 337018
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.
This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.
Should fix PR38029.
llvm-svn: 336419
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.
Reviewers: wmi, dexonsmith
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D48729
llvm-svn: 335914
Summary:
Patch for capture tracking broke
bootstrap of clang with -fstict-vtable-pointers
which resulted in debbugging nightmare. It was fixed
https://reviews.llvm.org/D46900 but as it turned
out, there were other parts like inliner (computing of
noalias metadata) that I found after bootstraping with enabled
assertions.
Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar
Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47088
llvm-svn: 333070
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.
We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
Summary: Makes this consistent with the old PM.
Reviewers: eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D46526
llvm-svn: 331709
Summary: @llvm.icall.branch.funnel is musttail with variable number of
arguments. After inlining current backend can't separate call targets from call
arguments.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45116
llvm-svn: 329235
Summary:
This was motivated by absence of PrunEH functionality in new PM.
It was decided that a proper way to do PruneEH is to add NoUnwind inference
into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top.
This change generalizes attribute handling implemented for (a removal of)
Convergent attribute, by introducing a generic builder-like class
AttributeInferer
It registers all the attribute inference requests, storing per-attribute
predicates into a vector, and then goes through an SCC Node, scanning all
the instructions for not breaking attribute assumptions.
The main idea is that as soon all the instructions from all the functions
of SCC Node conform to attribute assumptions then we are free to infer
the attribute as set for all the functions of SCC Node.
It handles two distinct cases of attributes:
- those that might break due to derefinement of the function code
for these attributes we are allowed to apply inference only if all the
functions are "exact definitions". Example - NoUnwind.
- those that do not care about derefinement
for these attributes we are allowed to apply inference as soon as we see
any function definition. Example - removal of Convergent attribute.
Also in this commit:
* Converted all the FunctionAttrs tests to use FileCheck and added new-PM
invocations to them
* FunctionAttrs/convergent.ll test demonstrates a difference in behavior between
new and old PM implementations. Marked with FIXME.
* PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg
combo as intended
* some of "other" tests were updated since function-attrs now infers 'nounwind'
even for old PM pipeline
* -disable-nounwind-inference hidden option added as a possible workaround for a supposedly
rare case when nounwind being inferred by default presents a problem
Reviewers: chandlerc, jlebar
Reviewed By: jlebar
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44415
llvm-svn: 328377
This prevents functions accessing varargs from being inlined if they
have the alwaysinline attribute.
Reviewers: efriedma, rnk, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42556
llvm-svn: 323619
Summary:
After teaching InlineCost more about address spaces ()
another fault was detected in the inliner. If an argument has
the byval attribute the parameter might be copied to an alloca.
That part seems to work fine even if the argument has a different
address space than the alloca address space. However, if the
address spaces differ, then the inlined function still might
refer to the parameter using the original address space (the
inliner does not handle that situation very well).
This patch avoids the problem by simply disallowing inlining
when there are byval arguments with address space that differs
from the alloca address space.
I'm not really sure how to transform the code if we want to
get inlining for this situation. I assume that it never has
been working, and that the fixes in r321809 just exposed an
old problem.
Fault found by skatkov (Serguei Katkov). It is mentioned in
follow up comments to https://reviews.llvm.org/D40455.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: uabelho, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D41898
llvm-svn: 322181
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940
Summary:
I basically copied this patch from here:
https://reviews.llvm.org/D1251
But I skipped some of the refactoring to make the patch more clean.
The new outer3/inner3 test case in ptr-diff.ll triggers the
following assert without this patch:
lib/IR/Constants.cpp:1834: static llvm::Constant *llvm::ConstantExpr::getCompare(unsigned short, llvm::Constant *, llvm::Constant *, bool): Assertion `C1->getType() == C2->getType() && "Op types should be identical!"' failed.
The other new test cases makes sure that there is code coverage
for all modifications in InlineCost.cpp (getting different values
due to not fetching sizes for address space zero). I only guarantee
code coverage for those tests. The tests are not written in a way
that they would break if not having the corrections in
InlineCost.cpp. I found it quite hard to fine tune the tests into
getting different results based on the pointer sizes (except for
the test case where we hit an assert if not teaching InlineCost
about address spaces).
Reviewers: chandlerc, arsenm, haicheng
Reviewed By: arsenm
Subscribers: wdng, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D40455
llvm-svn: 321809
Currently, inline cost model considers a binary operator as free only if both
its operands are constants. Some simple cases are missing such as a + 0, a - a,
etc. This patch modifies visitBinaryOperator() to call SimplifyBinOp() without
going through simplifyInstruction() to get rid of the constant restriction.
Thus, visitAnd() and visitOr() are not needed.
Differential Revision: https://reviews.llvm.org/D41494
llvm-svn: 321366
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.
While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.
Differential Revision: https://reviews.llvm.org/D41522
llvm-svn: 321332
SROA analysis of InlineCost can figure out that some stores can be removed
after inlining and then the repeated loads clobbered by these stores are also
free. This patch finds these clobbered loads and adjust the inline cost
accordingly.
Differential Revision: https://reviews.llvm.org/D33946
llvm-svn: 320814
This patch fix this FIXME in visitPHI()
FIXME: We should potentially be tracking values through phi nodes,
especially when they collapse to a single value due to deleted CFG edges
during inlining.
Differential Revision: https://reviews.llvm.org/D38594
llvm-svn: 320699
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
These are blocks that haven't not been executed during training. For large
projects this could make a significant difference. For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
Re-commit after fixing the failing testcase in rL319576, rL319577 and
rL319578.
llvm-svn: 319581
These are blocks that haven't not been executed during training. For large
projects this could make a significant difference. For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
llvm-svn: 319556
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
TCK_NoTail = 3 };
TCK_NoTail is greater than TCK_Tail so taking the min does not do the
correct thing.
rdar://35639547
llvm-svn: 319075
Test needs some slight adjustment because we no longer check the existence of
BFI but rather that the actual hotness is set on the remark. If entry_count
is not set getBlockProfileCount returns None.
llvm-svn: 314874
InlineCost can understand Select IR now. This patch finds free Select IRs and
continue the propagation of SimplifiedValues, ConstantOffsetPtrs, and
SROAArgValues.
Differential Revision: https://reviews.llvm.org/D37198
llvm-svn: 314307
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.
Reviewers: majnemer
Subscribers: eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D36904
llvm-svn: 312569
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().
If someone needs to upgrade out-of-tree testcases:
perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.
llvm-svn: 312144
Summary: We need to have accurate-sample-profile in function attribute so that it works with LTO.
Reviewers: davidxl, rsmith
Reviewed By: davidxl
Subscribers: sanjoy, mehdi_amini, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37113
llvm-svn: 311706
Summary: This patch adds test to cover the logic guarded by "accurate-sample-profile" flag.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: sanjoy, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37084
llvm-svn: 311618
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.
This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.
See also PR22780, for making DIExpression not be an MDNode.
Reviewers: aprantl, dexonsmith, dblaikie
Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37075
llvm-svn: 311594
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311349
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311273
a function into itself.
We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.
This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.
The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.
Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, and to Kyle for the first attempt and initial analysis and David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]
llvm-svn: 311229
localized to the code that uses those analyses.
Technically, this can change behavior as we no longer require the
existence of the ProfileSummaryInfo analysis to use local profile
information via BFI. We didn't actually require the PSI to have an
interesting profile though, so this only really impacts the behavior in
non-default pass pipelines.
IMO, this makes it substantially less surprising how everything works --
before an analysis that wasn't actually used had to exist to trigger
*any* profile aware inlining. I think the new organization makes it more
obvious where various checks for profile signals happen.
Differential Revision: https://reviews.llvm.org/D36710
llvm-svn: 310888
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.
*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.
Reviewers: chandlerc
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36157
llvm-svn: 309886
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.
This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.
The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).
I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.
A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.
The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.
I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.
We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.
Differential Revision: https://reviews.llvm.org/D36188
llvm-svn: 309784
Summary:
Inlining threshold is increased by application of bonuses when the
callee has a single reachable basic block or is rich in vector
instructions. Similarly, inlining cost is reduced by applying a large
bonus when the last call to a static function is considered for
inlining. This patch disables the application of these bonuses when the
callsite or the callee is cold. The intention here is to prevent a large
cold callsite from being inlined to a non-cold caller that could prevent
the caller from being inlined. This is especially important when the
cold callsite is a last call to a static since the associated bonus is
very high.
Reviewers: chandlerc, davidxl
Subscribers: danielcdh, llvm-commits
Differential Revision: https://reviews.llvm.org/D35823
llvm-svn: 309441
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
Summary:
Similar to X86, it should be safe to inline callees if their
target-features are a subset of the caller. As some subtarget features
provide different instructions depending on whether they are set or
unset (e.g. ThumbMode and ModeSoftFloat), we use a whitelist of
target-features describing hardware capabilities only.
Reviewers: kristof.beyls, rengolin, t.p.northover, SjoerdMeijer, peter.smith, silviu.baranga, efriedma
Reviewed By: SjoerdMeijer, efriedma
Subscribers: dschuff, efriedma, aemerson, sdardis, javed.absar, arichardson, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34697
llvm-svn: 307889
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
llvm-svn: 307498
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
llvm-svn: 307487
This commit pretty much rolls back the logic added in r306495
as in the testcase provided we simplify an `icmp` looking through
a PHI that hasn't been mapped yet.
I think instsimplify shouldn't do threading over select/phis or
just looking through phis in general, but this is what we have
now. Also, add a test to prevent this from happening in case somebody
wants to modify this code again.
Briefly discussed with Kyle Butt (thanks Kyle!).
llvm-svn: 306938
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
Reviewers: anemet, davidxl, hfinkel
Reviewed By: anemet
Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34867
llvm-svn: 306912
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX , we also use a valid upperbound cost when overflow occurs in Cost.
Reviewers: hans, echristo, dmgreen
Reviewed By: dmgreen
Subscribers: mcrosier, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D34436
llvm-svn: 306118
Also document the attribute, since "probe-stack" already is.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D34528
llvm-svn: 306069
This attribute is used to ensure the guard page is triggered on stack
overflow. Stack frames larger than the guard page size will generate
a call to __probestack to touch each page so the guard page won't
be skipped.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D34386
llvm-svn: 305939
Other comments/implications are that this isn't intended behavior (nor
perserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.
llvm-svn: 305052
Summary:
This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiment for LLVM test suite and spec2000/2006, +17.82% performance and
8% code size reduce was observed in spec2000/vertex with O3 LTO in AArch64.
No significant code size / performance regression was found in O3/O2/Os. No
significant complain was reported from the llvm-dev thread.
Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo
Reviewed By: echristo
Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D32653
llvm-svn: 304594
The added test case is to check whether the simplified value is passed to
getGEPCost().
Differential Revision: https://reviews.llvm.org/D33779
llvm-svn: 304454
Summary:
With instrumentation profiling, when updating the VP metadata after
an inline, VP metadata on the inlined copy was inadvertantly having
all counts zeroed out. This was causing indirect calls from code inlined
during the call step to be marked as cold in the ThinLTO summaries and
not imported.
The CallerBFI needs to be passed down so that the CallSiteCount can be
computed from the profile summary info. With Sample PGO this was working
since the count is extracted from the branch weight metadata on the
call being inlined (even before we stopped looking at metadata for
non-sample PGO in r302844 this largely wasn't working for instrumentation
PGO since only promoted indirect calls would be getting inlined and have
the metadata).
Added an instrumentation PGO test and renamed the sample PGO test.
Reviewers: danielcdh, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D33389
llvm-svn: 303574
Update threshold based on callee's hotness only when BFI is not available.
Otherwise use only callsite's hotness. This makes it easier to reason about
hotness related threshold updates.
Differential revision: https://reviews.llvm.org/D33157
llvm-svn: 303210
Summary:
Don't use the metadata on call instructions for determining hotness
unless we are in sample PGO mode, where it is needed because profile
counts are not accurate. In instrumentation mode this is not necessary
and does more harm than good when calls have VP metadata that hasn't
been properly scaled after transformations or dropped after constant
prop based devirtualization (both should be fixed, but we don't need
to do this in the first place for instrumentation PGO).
This required adjusting a number of tests to distinguish between sample
and instrumentation PGO handling, and to add in profile summary metadata
so that getProfileCount can get the summary.
Reviewers: davidxl, danielcdh
Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D32877
llvm-svn: 302844
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold
thresholds of 225 and 45. Here are some stats on the text size:
Out of 904 tests that ran, 197 see a change in text size. The average
text size reduction (of all the 904 binaries) is 1.07%. Of the 197
binaries, 19 see a text size increase, as high as 18%, but most of them
are small single source benchmarks. There are 3 multisource benchmarks
with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On
the other side of the spectrum, 31 benchmarks see >10% size reduction
and 6 of them are MultiSource.
I haven't run the test-suite with other values of inlinecold-threshold.
Since we have a cold callsite threshold of 45, I picked this value.
Differential revision: https://reviews.llvm.org/D33106
llvm-svn: 302829
Summary:
The motivation example is like below which has 13 cases but only 2 distinct targets
```
lor.lhs.false2: ; preds = %if.then
switch i32 %Status, label %if.then27 [
i32 -7012, label %if.end35
i32 -10008, label %if.end35
i32 -10016, label %if.end35
i32 15000, label %if.end35
i32 14013, label %if.end35
i32 10114, label %if.end35
i32 10107, label %if.end35
i32 10105, label %if.end35
i32 10013, label %if.end35
i32 10011, label %if.end35
i32 7008, label %if.end35
i32 7007, label %if.end35
i32 5002, label %if.end35
]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9: // %lor.lhs.false2
mov w8, #10012
cmp w19, w8
b.gt .LBB853_14
// BB#10: // %lor.lhs.false2
mov w8, #5001
cmp w19, w8
b.gt .LBB853_18
// BB#11: // %lor.lhs.false2
mov w8, #-10016
cmp w19, w8
b.eq .LBB853_23
// BB#12: // %lor.lhs.false2
mov w8, #-10008
cmp w19, w8
b.eq .LBB853_23
// BB#13: // %lor.lhs.false2
mov w8, #-7012
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_14: // %lor.lhs.false2
mov w8, #14012
cmp w19, w8
b.gt .LBB853_21
// BB#15: // %lor.lhs.false2
mov w8, #-10105
add w8, w19, w8
cmp w8, #9 // =9
b.hi .LBB853_17
// BB#16: // %lor.lhs.false2
orr w9, wzr, #0x1
lsl w8, w9, w8
mov w9, #517
and w8, w8, w9
cbnz w8, .LBB853_23
.LBB853_17: // %lor.lhs.false2
mov w8, #10013
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_18: // %lor.lhs.false2
mov w8, #-7007
add w8, w19, w8
cmp w8, #2 // =2
b.lo .LBB853_23
// BB#19: // %lor.lhs.false2
mov w8, #5002
cmp w19, w8
b.eq .LBB853_23
// BB#20: // %lor.lhs.false2
mov w8, #10011
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_21: // %lor.lhs.false2
mov w8, #14013
cmp w19, w8
b.eq .LBB853_23
// BB#22: // %lor.lhs.false2
mov w8, #15000
cmp w19, w8
b.ne .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.
This change use the general way of switch lowering for the inline heuristic. It
etimate the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
-inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
Summary: Declarations need to be filtered out when counting functions.
Reviewers: eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31336
llvm-svn: 298720
Summary: Inliner should update the branch_weights annotation to scale it to proper value.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D30767
llvm-svn: 298270
in r297374.
I've extracted a small version of this from the C++ metaprogram Richard
came up with to exercise these kinds of issues and written comments to
describe both how to reproduce a fresh version of the test case and what
likely failure modes are.
The test case is still a bit brittle as it depends on the particular
inline cost modeling and SCC visitation order, but it definitely would
have caught the bug right away when developing things so it seems
a really valuable test case to have.
llvm-svn: 297935
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.
This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.
Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.
Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;
void g(const char *);
template <bool K, int N> void f(bool *B, bool *E) {
if (K)
g(str<N>);
if (B == E)
return;
if (*B)
f<true, N + 1>(B + 1, E);
else
f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```
When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.
What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.
The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.
We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.
I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.
The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
we consider for inlining.
llvm-svn: 297374
This reverts commit r296488.
As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.
llvm-svn: 297163
The LLVM backend cannot produce any debug info for an llvm::Function
without a DISubprogram attachment. When inlining a debug-info-carrying
function into a nodebug function, there is therefore no reason to keep
any debug info intrinsic calls or debug locations on the instructions.
This fixes a problem discovered in PR32042.
rdar://problem/30679307
llvm-svn: 296488