minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.
llvm-svn: 350456
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced
Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.
Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.
Differential Revision: https://reviews.llvm.org/D55785
llvm-svn: 349513
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.
This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.
The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.
Differential Revision: https://reviews.llvm.org/D55716
llvm-svn: 349509
This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.
In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.
This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.
This has no effect to any in-tree official backends, as none use an
explicit program address space in their data layouts.
Patch by Tim Neumann.
llvm-svn: 349469
ProfileSampleAccurate is used to indicate the profile has exact match to the
code to be optimized.
Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.
Differential Revision: https://reviews.llvm.org/D55660
llvm-svn: 349088
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.
Reviewers: tejohnson
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43521
llvm-svn: 349076
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.
#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)
is the same as
#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)
and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}
defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
Reviewed By: hfinkel, dmgreen
Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288
llvm-svn: 348944
It's currently not safe to outline landingpad instructions (see
llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of
previous landingpad instructions in a function alters the lowering of
subsequent landingpads by renumbering type info ID's. Outlining a
landingpad therefore breaks exception handling & unwinding.
llvm-svn: 348870
Summary:
When eliminating a dead argument or return value in a function with
local linkage, all uses, including in dbg.value intrinsics, would be
replaced with null constants. This would mean that, for example for an
integer argument, the debug info would incorrectly express that the
value is 0. Instead, replace all uses with undef to indicate that the
argument/return value is optimized out.
Also, make sure that metadata uses of return values are rewritten even
if there are no non-metadata uses of the value.
As a bit of historical curiosity, the code that emitted null constants
was introduced in the initial check-in of the pass in 2003, before
'undef' values even existed in LLVM.
This fixes PR23260.
Reviewers: dblaikie, aprantl, vsk, djtodoro
Reviewed By: aprantl
Subscribers: llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55513
llvm-svn: 348837
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.
- Do not treat noreturn calls as if they are cold unless they are actually
marked cold. This is motivated by functions like exit() and longjmp(), which
are not beneficial to outline.
- Do not treat inline asm as an outlining barrier. In practice asm("") is
frequently used to inhibit basic block merging; enabling outlining in this case
results in substantial memory savings.
- Treat invokes of cold functions as cold.
As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.
Differential Revision: https://reviews.llvm.org/D54244
llvm-svn: 348640
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is full, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.
Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.
Differential Revision: https://reviews.llvm.org/D53887
llvm-svn: 348639
Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks.
Reviewers: chandlerc, jmolloy, aprantl
Reviewed By: aprantl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55187
llvm-svn: 348381
Summary:
We can sometimes end up with multiple copies of a local variable that
have the same GUID in the index. This happens when there are local
variables with the same name that are in different source files having the
same name/path at compile time (but compiled into different bitcode objects).
In this case make sure we import the copy in the caller's module.
This enables importing both of the variables having the same GUID
(but which will have different promoted names since the module paths,
and therefore the module hashes, will be distinct).
Importing the wrong copy is particularly problematic for read only
variables, since we must import them as a local copy whenever
referenced. Otherwise we get undefs at link time.
Note that the llvm-lto.cpp and ThinLTOCodeGenerator changes are needed
for testing the distributed index case via clang, which will be sent as
a separate clang-side patch shortly. We were previously not doing the
dead code/read only computation before computing imports when testing
distributed index generation (like it was for testing importing and
other ThinLTO mechanisms alone).
Reviewers: evgeny777
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D55047
llvm-svn: 347886
InlineCost also treats them as free and the current implementation
can cause assertion failures if PHI nodes are moved outside the region
from entry BBs to the region.
It also updates the code to use the instructionsWithoutDebug iterator.
Reviewers: davidxl, davide, vsk, graham-yiu-huawei
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D54748
llvm-svn: 347683
The MergeFunctions pass was originally intended to emit aliases
instead of thunks where possible (unnamed_addr). However, for a
long time this functionality was behind a flag hardcoded to false,
bitrotted and was eventually removed in r309313.
Originally the functionality was first disabled in r108417 due to
lack of support for aliases in Mach-O. I believe that this is no
longer the case nowadays, but not really familiar with this area.
In the interest of being conservative, this patch reintroduces the
aliasing functionality behind a default disabled -mergefunc-use-aliases
flag.
Differential Revision: https://reviews.llvm.org/D53285
llvm-svn: 347407
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.
This can be more efficient than using pred_size(), which is a linear
time operation.
We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).
With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.
See llvm.org/PR39702 for a harder but more general way of achieving
similar results.
Differential Revision: https://reviews.llvm.org/D54686
llvm-svn: 347256
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI(). I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.
Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively. Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB. For example,
Function::getEntryBlock, Loop:getExitBlock, etc.
I also fixed one of the comments.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D54669
llvm-svn: 347182
An attempt to recommit r346584 after failure on OSX build bot.
Fixed cache key computation in ThinLTOCodeGenerator and added
test case
llvm-svn: 347033
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions
Differential revision: https://reviews.llvm.org/D49362
llvm-svn: 346584
After D45330, Dominators are required for IPSCCP and can be preserved.
This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis.
Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima
Reviewed By: chandlerc, kuhar, NutshellySima
Differential Revision: https://reviews.llvm.org/D47259
llvm-svn: 346486
Summary:
This fixes PR 37422
In ELF, non-weak symbols can also be non-prevailing. In this particular
PR, the __llvm_profile_* symbols are non-prevailing but weren't getting
dropped - causing multiply-defined errors with lld.
Also add a test, strong_non_prevailing.ll, to ensure that multiple
copies of a strong symbol are dropped.
To fix the test regressions exposed by this fix,
- do not mark prevailing copies for symbols with 'appending' linkage.
There's no one prevailing copy for such symbols.
- fix the prevailing version in dead-strip-fulllto.ll
- explicitly pass exported symbols to llvm-lto in fumcimport.ll and
funcimport_var.ll
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith,
dang, srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D54125
llvm-svn: 346436
Summary:
MergeFunctions currently tries to process strong functions before
weak functions, because weak functions can simply call strong
functions, while a strong/weak function cannot call a weak function
(a backing strong function is needed).
This patch additionally tries to process external functions before
local functions, because we definitely have to keep the external
function, but may be able to drop the local one (and definitely
can if it is also unnamed_addr).
Unfortunately, this exposes an existing bug in the implementation:
The FnTree and FNodesInTree structures can currently go out of
sync in the case where two weak functions are merged, because the
function in FnTree/FNodesInTree is RAUWed. This leaves it behind in
FnTree (this is intended, as it is the strong backing function which
should be used for further merges), while it is replaced in
FNodesInTree (this is not intended).
This is fixed by switching FNodesInTree from using a ValueMap to
using a DenseMap of AssertingVH.
This exposes another minor issue: Currently FNodesInTree is not
cleared after MergeFunctions finishes running. Currently, this is
potentially dangerous (e.g. if something else wants to RAUW a function
with a non-function), but at the very least it is unnecessary/inefficient.
After the change to use AssertingVH it becomes more problematic,
because there are certainly passes that remove functions.
This issue is fixed by clearing FNodesInTree at the end of the pass.
Reviewers: jfb, whitequark
Reviewed By: whitequark
Subscribers: rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D53271
llvm-svn: 346386
Summary:
For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities.
Fix this by calling removeUsers() prior to RAUW.
Reviewers: jfb, whitequark
Reviewed By: whitequark
Subscribers: rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D53262
llvm-svn: 346385
Summary:
The NotEligibleToImport flag on the GlobalValueSummary was set if it
isn't legal to import (e.g. because it references unpromotable locals)
and when it can't be inlined (in which case importing is pointless).
I split out the inlinable piece into a separate flag on the
FunctionSummary (doesn't make sense for aliases or global variables),
because in the future we may want to import for reasons other than
inlining.
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53345
llvm-svn: 346261
Using TargetTransformInfo allows the splitting pass to factor in the
code size cost of instructions as it decides whether or not outlining is
profitable.
This did not regress the overall amount of outlining seen on the handful
of internal frameworks I tested.
Thanks to Jun Bum Lim for suggesting this!
Differential Revision: https://reviews.llvm.org/D53835
llvm-svn: 346108
It can be profitable to outline single-block cold regions because they
may be large.
Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.
In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.
Differential Revision: https://reviews.llvm.org/D53824
llvm-svn: 345524
The current splitting algorithm works in three stages:
1) Identify cold blocks, then
2) Use forward/backward propagation to mark hot blocks, then
3) Grow a SESE region of blocks *outside* of the set of hot blocks and
start outlining.
While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness
propagation stage, as a block may end up in both the ColdBlocks set
and the HotBlocks set. Further inconsistencies can arise as these sets
do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.
Results:
- X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
47KB pre-patch, or a ~3x improvement. Did not see a performance impact
across two runs.
- AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
of TEXT were outlined. Ditto re: performance impact.
- Outlining results improve marginally in the internal frameworks I
tested.
Follow-ups:
- Outline more than once per function, outline large single basic
blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.
Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.
Reviewers: sebpop, hiraditya
Subscribers: llvm-commits, erik.pilkington
Differential Revision: https://reviews.llvm.org/D53534
llvm-svn: 345178
in the same round of SCC update.
In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.
We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.
What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.
Differential Revision: https://reviews.llvm.org/D52915
llvm-svn: 345103
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.
After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.
Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.
Differential Revision: https://reviews.llvm.org/D53518
llvm-svn: 345072
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.
Fixes PR37959. The test case here is the simplified .ll for:
```
static int foo;
int bar() {
foo = 5;
return foo;
}
```
Reviewers: dblaikie, gbedwell, aprantl
Reviewed By: dblaikie
Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D53531
llvm-svn: 345046
Summary:
In the new+old pass manager, hot cold splitting was schedule too early.
Thanks to Vedant for pointing this out.
Reviewers: sebpop, vsk
Reviewed By: sebpop, vsk
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D53437
llvm-svn: 344869
Summary:
Previously we could only get the number of imported functions and
variables from the backend. This adds stats to the thin link where the
importing is decided.
Reviewers: wmi
Subscribers: inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D53337
llvm-svn: 344658
r344558 added an assignment to a TerminatorInst* from
BasicBlock::getTerminatorInst(), but BasicBlock::getTerminatorInst() returns an
Instruction* rather than a TerminatorInst* since r344504 so this fails to
compile.
Changing the variable to an Instruction* should get the bots building again.
llvm-svn: 344566
Make the code of blockEndsInUnreachable to match the function
blockEndsInUnreachable in CodeGen/BranchFolding.cpp. I also have
added a note to make sure the code of this function will not be
modified unless the back-end version is also modified.
An early return before outlining has been added to avoid
outlining the full function body when the first block in the
function is marked cold.
The static analysis of cold code has been amended to avoid
marking the whole function as cold by back-propagation
because the back-propagation would mark blocks with return
statements as cold.
The patch adds debug statements to help discover these problems.
Differential Revision: https://reviews.llvm.org/D52904
llvm-svn: 344558
by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502