Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.
This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).
This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.
Depends on D71913.
Reviewers: pcc, evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
Summary:
A recent change to enable more importing of global variables with
references exposed some efficiency issues with export computation.
See D73724 for more information and detailed analysis.
The first was specific to variable importing. The code was marking every
copy of a referenced value (from possibly thousands of files in the case
of linkonce_odr) as exported, and we only need to mark the copy in the
module containing the variable def being imported as exported. The
reason is that this is tracking what values are newly exported as a
result of importing. Anything that was defined in another module and
simply used in the exporting module is already exported, and would have
been identified by the caller (e.g. the LTO API implementations).
The second issue is that the code was re-adding previously exported
values (along with all references). It is easy to identify when a
variable was already imported into the same module (via the
import list insert call return value), and we already did this for
function importing. However, what we weren't doing for either function
or variable importing was avoiding a re-insertion when it was previously
exported into a different importing module. The reason we couldn't do
this is there was no way of telling from the export list whether it was
previously inserted there because its definition was exported (in which
case we already marked all its references as exported) from when it was
inserted there because it was referenced by another exported value (in
which case we haven't yet inserted its own references).
To address this we can restructure the way the export list is
constructed. This patch only adds the actual imported definitions
(variable or function) to the export list for its module during the
import computation. After import computation is complete, where we were
already post-processing the export list we go ahead and add all
references made by those exported values to the export list.
These changes speed up the thin link not only with constant variable
importing enabled, but also without (due to the efficiency improvement
in function importing).
Some thin link user time measurements for one large application, average
of 5 runs:
With constant variable importing enabled:
- without this patch: 479.5s
- with this patch: 74.6s
Without constant variable importing enabled:
- without this patch: 80.6s
- with this patch: 70.3s
Note I have not re-enabled constant variable importing here, as I would
like to do additional compile time measurements with these fixes first.
Reviewers: evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73851
If all call sites are in `norecurse` functions we can derive `norecurse`
as the ReversePostOrderFunctionAttrsPass does. This should make
ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is
enabled.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D72017
If we know that all call sites have been processed we can derive an
early fixpoint. The use in this patch is likely not to trigger right now
but a follow up patch will make use of it.
Reviewed By: uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D72016
Summary:
In a release build this variable becomes unused and may break the build
with `-Werror,-Wunused-variable`.
Reviewers: gribozavr2, jdoerfert, sstefan1
Reviewed By: gribozavr2
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73683
A pointer is privatizeable if it can be replaced by a new, private one.
Privatizing pointer reduces the use count, interaction between unrelated
code parts. This is a first step towards replacing argument promotion.
While we can already handle recursion (unlike argument promotion!) we
are restricted to stack allocations for now because we do not analyze
the uses in the callee.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68852
The helpers AAReturnedFromReturnedValues and
AACallSiteReturnedFromReturned are useful not only to avoid code
duplication but also to avoid recomputation of results. If we have N
call sites we should not recompute the function return information N
times but once. These are mostly straightforward usages with some minor
improvements on the helpers and addition of a new one
(IRPosition::getAssociatedType) that knows about function return types.
This commit fixes PR39321.
GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction.
This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet.
Differential Revision: https://reviews.llvm.org/D71959
There was a TODO in AAValueConstantRangeArgument to reuse
AAArgumentFromCallSiteArguments. We now do this by allowing new States
to be build from the bestState.
If we invalidate an attribute we need to inform all dependent ones even
if the fixpoint state is not invalid. Before we only continued
invalidation if the fixpoint state was invalid, now we signal a change
in case the fixpoint state is valid.
The test case was already included in D71620 but the problem was hiding
because it only manifested with the old PM (for that input).
This patch modularizes the way we check for no-alias call site arguments
by putting the existing logic into helper functions. The reasoning was
not changed but special cases for readonly/readnone were added.
If `null` is not defined we cannot access it, hence the pointer is
`noalias`. While this is not helpful on it's own it simplifies later
deductions that can skip over already known `noalias` pointers in
certain situations.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
This restores 59733525d3 (D71913), along
with bot fix 19c76989bb.
The bot failure should be fixed by D73418, committed as
af954e441a.
I also added a fix for non-x86 bot failures by requiring x86 in new test
lld/test/ELF/lto/devirt_vcall_vis_public.ll.
When we use information only to short-cut deduction or improve it, we
can use OPTIONAL dependences instead of REQUIRED ones to avoid cascading
pessimistic fixpoints.
We also need to track dependences only when we use assumed information,
e.g., we act on assumed liveness information.
It can happen that we have instructions in the ToBeDeletedInsts set
which are deleted earlier already. To avoid dangling pointers we use
weak tracking handles.
When we follow uses, e.g., in AAMemoryBehavior or AANoCapture, we need
to make sure the value is a pointer before we ask for abstract
attributes only valid for pointers. This happens because we follow
pointers through calls that do not capture but may return the value.
We might accidentally ask AAValueSimplify to simplify a void value. That
can lead to very interesting, and very wrong, results. We now handle
this case gracefully.
If alignment was manifested but it is actually only as good as the
data-layout provided one we should not report it as a change.
For testing purposes we still manifest the information.
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.
Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.
Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.
Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.
I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.
Depends on D71907 and D71911.
Reviewers: pcc, evgeny777, steven_wu, espindola
Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71913
The utility method RecursivelyDeleteTriviallyDeadInstructions receives
as input a vector of Instructions, where all inputs are valid
instructions. This same vector is used as a scratch storage (per the
header comment) to recursively delete instructions. If an instruction is
added as an operand of multiple other instructions, it may be added twice,
then deleted once, then the second reference in the vector is invalid.
Switch to using a Vector<WeakTrackingVH>.
This change facilitates a clean-up in LoopStrengthReduction.
Summary:
First patch to support Safe Whole Program Devirtualization Enablement,
see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
Always emit !vcall_visibility metadata under -fwhole-program-vtables,
and not just for -fvirtual-function-elimination. The vcall visibility
metadata will (in a subsequent patch) be used to communicate to WPD
which vtables are safe to devirtualize, and we will optionally convert
the metadata to hidden visibility at link time. Subsequent follow on
patches will help enable this by adding vcall_visibility metadata to the
ThinLTO summaries, and always emit type test intrinsics under
-fwhole-program-vtables (and not just for vtables with hidden
visibility).
In order to do this safely with VFE, since for VFE all vtable loads must
be type checked loads which will no longer be the case, this patch adds
a new "Virtual Function Elim" module flag to communicate to GlobalDCE
whether to perform VFE using the vcall_visibility metadata.
One additional advantage of using the vcall_visibility metadata to drive
more WPD at LTO link time is that we can use the same mechanism to
enable more aggressive VFE at LTO link time as well. The link time
option proposed in the RFC will convert vcall_visibility metadata to
hidden (aka linkage unit visibility), which combined with
-fvirtual-function-elimination will allow it to be done more
aggressively at LTO link time under the same conditions.
Reviewers: pcc, ostannard, evgeny777, steven_wu
Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71907
Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).
The change renames the type to a more generic name, and disables
implicit constructors.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72744
Summary:
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).
The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).
Currently, AAValueConstantRange is created in `getAssumedConstant` method when `AAValueSimplify` returns `nullptr`(worst state).
Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: phosek, davezarzycki, baziotis, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71620
This ports the MergeFunctions pass to the NewPM. This was rather
straightforward, as no analyses are used.
Additionally MergeFunctions needs to be conditionally enabled in
the PassBuilder, but I left that part out of this patch.
Differential Revision: https://reviews.llvm.org/D72537
Summary:
An assert added to the index-based WPD was trying to verify that we only
have multiple vtables for a given guid when they are all non-external
linkage. This is too conservative because we may have multiple external
vtable with the same guid when they are in comdat. Remove the assert,
as we don't have comdat information in the index, the linker should
issue an error in this case.
See discussion on D71040 for more information.
Reviewers: evgeny777, aganea
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72648
This reverts commit a03d7b0f24.
As discussed in D68298, this causes a compile-time regression, in case
the DTs requested are not used elsewhere in GlobalOpt. We should only
get the DTs if they are available here, but this seems not possible with
the legacy pass manager from a module pass.
Summary:
A recent fix in D69452 fixed index based WPD in the presence of
available_externally vtables. It added a cast of the vtable def
summary to a GlobalVarSummary. However, in some cases one def may be an
alias, in which case we need to get the base object before casting,
otherwise we will crash.
Reviewers: evgeny777, steven_wu, aganea
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71040
When we replace instructions with unreachable we delete instructions. We
now avoid dangling pointers to those deleted instructions in the
`ToBeChangedToUnreachableInsts` set. Other modification collections
might need to be updated in the future as well.
Summary:
For artificial cases (huge array, few usages), Global SRA optimization creates
a lot of redundant data. It creates an instance of GlobalVariable for each array
element. For huge array, that means huge compilation time and huge memory usage.
Following example compiles for 10 minutes and requires 40GB of memory.
namespace {
char LargeBuffer[64 * 1024 * 1024];
}
int main ( void ) {
LargeBuffer[0] = 0;
printf("\n ");
return LargeBuffer[0] == 0;
}
The fix is to avoid Global SRA for large arrays.
Reviewers: craig.topper, rnk, efriedma, fhahn
Reviewed By: rnk
Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71993
If we replace a function with a new one because we rewrite the
signature, dead users may still refer to the old version. With this
patch we reuse the code that deals with dead functions, which the old
versions are, to avoid problems.