Commit Graph

781 Commits

Author SHA1 Message Date
Gulfem Savrun Yeniceri e3a6d70c68 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 78a65cd945 which
caused buildbot failures.
2021-03-23 00:43:16 +00:00
Gulfem Savrun Yeniceri 78a65cd945 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-22 22:09:02 +00:00
Sriraman Tallam 0ba1ebcbb7 Remove original implementation of UniqueInternalLinkageNames pass.
D96109 was recently submitted which contains the refactored implementation of
-funique-internal-linakge-names by adding the unique suffixes in clang rather
than as an LLVM pass. Deleting the former implementation in this change.

Differential Revision: https://reviews.llvm.org/D98234
2021-03-10 11:57:40 -08:00
Ta-Wei Tu 8a003861a3 [NPM] Add -enable-loopinterchange option to NPM
We have the `enable-loopinterchange` option in legacy pass manager but not in NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.

Reviewed By: aeubanks, fhahn

Differential Revision: https://reviews.llvm.org/D98116
2021-03-07 02:39:28 +08:00
Arthur Eubanks a9b33ffb8f [ThinLTO][NewPM] Clean up dead code under -O0
We're running into undefined references using ThinLTO with -O0 on
Windows/Chrome. This fixes that.

This matches the legacy PM.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97414
2021-02-24 17:08:57 -08:00
Vitaly Buka 8560c2d426 [ThinLTO, NewPM] Run OptimizerLastEPCallbacks from buildThinLTOPreLinkDefaultPipeline
-O1 and above do dont call real optimizer pipeline in ThinLTO PreLink.
Also clang can't add PostLink OptimizerLastEPCallbacks for in-process ThinLTO.
This results in missing sanitizer passes with ThinLTO.

Simple working solution is just call OptimizerLastEPCallbacks
at the end of buildThinLTOPreLinkDefaultPipeline.

Differential Revision: https://reviews.llvm.org/D96320
2021-02-23 22:14:41 -08:00
Wenlei He a952d7291e [SampleFDO] Skip PreLink ICP for better profile quality of MonoLTO PostLink
For ThinLTO, PreLink ICP is skipped to favor better profile annotation during LTO PostLink. This change applies the same tweak for MonoLTO. Note that PreLink ICP not only makes PostLink profile annotation harder, it is also uncoordinated with PostLink ICP so duplicated ICP could happen.

Differential Revision: https://reviews.llvm.org/D97028
2021-02-19 19:35:23 -08:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
David Green c141c6551b [NPM][LTO] Do not enable MemorySSA with LoopFullUnrollPass
As with the standard opt pipeline, we disable the MemorySSA dependency
in the LTO LPM pipeline as not all passes preserve MemorySSA.
2021-02-19 08:35:11 +00:00
David Green 908ac47ef4 [NPM][LTO] Update buildLTODefaultPipeline to be more in-line with the old pass manager
The NPM LTO pipeline has a lot of fixme's and missing passes, causing a
lot of regressions after the switch in c70737b. Notably unrolling and
vectorization were both disabled, but many other passes are missing
compared to the old pass manager. This attempt to enable the most
obvious missing passes like the unroller, vectorization and other loop
passes, fixing the existing FIXME comments.

Differential Revision: https://reviews.llvm.org/D96780
2021-02-17 16:56:28 +00:00
Sanne Wouda 93d9a4c95a Use LoopRotate PrepareForLTO stage in NPM
The PrepareForLTO stage of LoopRotate tries to avoid unrolling loops
with calls that might be inlined later.  See D94232 where this was
introduced.

We didn't catch all occurances of the LoopRotatePass in the New Pass
Manager, so the original regression in astar returned with the pass
manager switch.
2021-02-17 14:06:57 +00:00
Sameer Sahasrabuddhe 11bf7da64a [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.

The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D96615
2021-02-16 10:26:45 +05:30
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Arthur Eubanks 5d960cba34 [opt][NewPM] Add a --print-passes flag to print all available passes
It seems nicer to list passes given a flag rather than displaying all
passes in opt --help.

This is awkwardly structured because a PassBuilder is required, but
reusing the PassBuilder in runPassPipeline() doesn't work because we
read the input IR before getting to runPassPipeline(). So printing the
list of passes needs to happen before reading the input IR. If we remove
the legacy PM code in main() and move everything from NewPMDriver.cpp
into opt.cpp, we can create the PassBuilder before reading IR and check
if we should print the list of passes and exit. But until then this hack
seems fine.

Compared to the legacy PM, the new PM passes are lacking descriptions.
We'll need to figure out a way to add descriptions if we think this is
important.

Also, this only works for passes specified in PassRegistry.def. If we
want to print other custom registered passes, we'll need a different
mechanism.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D96101
2021-02-10 11:22:12 -08:00
Jamie Schmeiser 4b661b4059 Introduce -print-changed=[diff | diff-quiet] which show changes in patch-like format
Summary:
Introduce base classes that hold a textual represent of the IR
based on basic blocks and a base class for comparing this
representation.  A new change printer is introduced that uses these
classes to save and compare representations of the IR before and after
each pass.  It only reports when changes are made by a pass (similar to
-print-changed) except that the changes are shown in a patch-like format
with those lines that are removed shown in red prefixed with '-' and those
added shown in green with '+'.  This functionality was introduced in my
tutorial at the 2020 virtual developer's meeting.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D91890
2021-02-08 10:11:22 -05:00
Arthur Eubanks f020544601 [NewPM][HelloWorld] Move HelloWorld to Utils
To prevent creating a new component, which creates a new library.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D95907
2021-02-03 12:59:40 -08:00
Hongtao Yu 3d89b3cbec [CSSPGO] Introducing distribution factor for pseudo probe.
Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count.

This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes.

A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.

Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D93264
2021-02-02 11:55:01 -08:00
Arthur Eubanks 7739f9ff97 [NewPM][Unswitch] Add option to disable -O3 non-trivial unswitching
Some benchmarks regress with non-trivial unswitching, so add an option
to opt-out of performing non-trivial unswitching while investigating.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D95796
2021-02-01 11:11:59 -08:00
Bjorn Pettersson a9bd3d37bd [NewPM] Add ExtraVectorizerPasses support
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D95457
2021-01-26 22:59:10 +01:00
Florian Hahn 83daa49758
[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have be
vectorized earlier (by scalarzing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

Reviewed By: sanwou01

Differential Revision: https://reviews.llvm.org/D94232
2021-01-19 10:15:29 +00:00
Mircea Trofin e8049dc3c8 [NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.

Differential Revision: https://reviews.llvm.org/D94825
2021-01-15 17:59:38 -08:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Arthur Eubanks a03ffa9850 [NewPM] Fix placement of LoopFlatten
https://reviews.llvm.org/D90402 was inconsistent with where it put
LoopFlatten between the two pass managers. It also missed adding it to
the non-O1 function simplification pipeline.

PR48738

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D94650
2021-01-14 09:49:31 -08:00
Arthur Eubanks b196dc6607 [NFC] Remove unused entry in PassRegistry.def 2021-01-13 19:01:07 -08:00
Wei Mi 86341247c4 [NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h
to Pass.h.

In some compiler passes like SampleProfileLoaderPass, we want to know which
LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum
class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and
it also cannot represent phase in full LTO. The patch extends it to include
full LTO phases and move it from PassBuilder.h to Pass.h, then it is much
easier for PassBuilder to communiate with each pass about current LTO phase.

Differential Revision: https://reviews.llvm.org/D94613
2021-01-13 15:55:40 -08:00
Arthur Eubanks 39e6d24237 [NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions
This matches the legacy pipeline/pass.

Reviewed By: asbirlea, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D94559
2021-01-13 14:54:49 -08:00
Arthur Eubanks f748e92295 [NewPM] Run non-trivial loop unswitching under -O2/3/s/z
Fixes https://bugs.llvm.org/show_bug.cgi?id=48715.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D94448
2021-01-12 11:04:40 -08:00
Jamie Schmeiser 43a830ed94 Introduce new quiet mode and new option handling for -print-changed.
Summary:
Introduce a new mode of operation for -print-changed that only reports
after a pass changes the IR with all of the other messages suppressed (ie,
no initial IR and no messages about ignored, filtered or non-modifying
passes).

The option processing for -print-changed is changed to take an optional
string indicating options for print-changed. Initially, the only option
supported is quiet (as described above). This new quiet mode is specified
with -print-changed=quiet while -print-changed will continue to function
in the same way. It is intended that there will be more options in the
future.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D92589
2021-01-11 14:15:18 -05:00
Arthur Eubanks 756dd70766 [NewPM] Run ObjC ARC passes
Match the legacy PM in running various ObjC ARC passes.

This requires making some module passes into function passes. These were
initially ported as module passes since they add function declarations
(e.g. https://reviews.llvm.org/D86178), but that's still up for debate
and other passes do so.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D93743
2021-01-08 15:47:11 -08:00
Arthur Eubanks 69cf735062 [NewPM] Don't error when there's an unrecognized pass name
This currently blocks --print-before/after with a legacy PM pass, for
example when we use the new PM for the optimization pipeline but the
legacy PM for the codegen pipeline. Also in the future when the codegen
pipeline works with the new PM there will be multiple places to specify
passes, so even when everything is using the new PM, there will still be
multiple places that can accept different pass names.

Reviewed By: hoy, ychen

Differential Revision: https://reviews.llvm.org/D94283
2021-01-07 22:33:32 -08:00
Arthur Eubanks 28a326eba0 [NFC] Rename registerAliasAnalyses -> registerDefaultAliasAnalyses
To clarify that this only affects the "default" AA.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D93980
2021-01-05 11:07:58 -08:00
Kazu Hirata eb198f4c3c [llvm] Use llvm::any_of (NFC) 2021-01-04 11:42:47 -08:00
Florian Hahn c367258b5c
[SimplifyCFG] Enabled hoisting late in LTO pipeline.
bb7d3af113 disabled hoisting in SimplifyCFG by default, but enabled it
late in the pipeline. But it appears as if the LTO pipelines got missed.

This patch adjusts the LTO pipelines to also enable hoisting in the
later stages.

Unfortunately there's no easy way to add a test for the change I think.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93684
2021-01-04 16:26:58 +00:00
Hongtao Yu 01f0d162d6 Moving UniqueInternalLinkageNamesPass to the start of IR pipelines.
`UniqueInternalLinkageNamesPass` is useful to CSSPGO, especially when pseudo probe is used. It solves naming conflict for static functions which otherwise will share a merged profile and likely have a profile quality issue with mismatched CFG checksums. Since the pseudo probe instrumentation happens very early in the pipeline, I'm moving `UniqueInternalLinkageNamesPass` right before it. This is being done only to the new pass manager.

Reviewed By: dblaikie, aeubanks

Differential Revision: https://reviews.llvm.org/D93656
2021-01-02 14:26:21 -08:00
Arthur Eubanks c2ef06d3dd [NewPM] Port infer-address-spaces
And add it to the AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93880
2020-12-28 19:58:12 -08:00
Arthur Eubanks 6c36286a2e [NewPM] Fix CGSCCOptimizerLateEPCallbacks place in pipeline
CGSCCOptimizerLateEPCallbacks are supposed to be run before the function
simplification pipeline, like in the legacy PM and as specified in the
comments for registerCGSCCOptimizerLateEPCallback().

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D93871
2020-12-28 14:03:10 -08:00
Arthur Eubanks 0219cf7dfa [NewPM] Fix objc-arc-apelim pass typo 2020-12-22 21:40:43 -08:00
Arthur Eubanks 76f4f42eba [NewPM] Add TargetMachine method to add alias analyses
AMDGPUTargetMachine::adjustPassManager() adds some alias analyses to the
legacy PM. We need a way to do the same for the new PM in order to port
AMDGPUTargetMachine::adjustPassManager() to the new PM.

Currently the new PM adds alias analyses by creating an AAManager via
PassBuilder and overriding the AAManager a PassManager uses via
FunctionAnalysisManager::registerPass().

We will continue to respect a custom AA pipeline that specifies an exact
AA pipeline to use, but for "default" we will now add alias analyses
that backends specify. Most uses of PassManager use the "default"
AAManager created by PassBuilder::buildDefaultAAPipeline(). Backends can
override the newly added TargetMachine::registerAliasAnalyses() to add custom
alias analyses.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D93261
2020-12-21 13:46:07 -08:00
Samuel Eubanks 47dbee6790 Make NPM OptBisectInstrumentation use global singleton OptBisect
Currently there is an issue where the legacy pass manager uses a different OptBisect counter than the new pass manager.
This fix makes the npm OptBisectInstrumentation use the global OptBisect.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D92897
2020-12-20 13:47:56 -08:00
Andrew Litteken dae34463e3 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Recommit of bf899e8913 fixing memory
leaks.

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-12-17 11:27:26 -06:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Bardia Mahjour 6eff12788e [DDG] Data Dependence Graph - DOT printer - recommit
This is being recommitted to try and address the MSVC complaint.

This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D90159
2020-12-16 12:37:36 -05:00
Bardia Mahjour a29ecca781 Revert "[DDG] Data Dependence Graph - DOT printer"
This reverts commit fd4a10732c, to
investigate the failure on windows: http://lab.llvm.org:8011/#/builders/127/builds/3274
2020-12-14 16:54:20 -05:00
Bardia Mahjour fd4a10732c [DDG] Data Dependence Graph - DOT printer
This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Differential Revision: https://reviews.llvm.org/D90159
2020-12-14 16:41:14 -05:00
Zequan Wu b5216b2950 [PGO] Enable preinline and cleanup when optimize for size
Differential Revision: https://reviews.llvm.org/D91673
2020-12-10 12:29:17 -08:00
Arthur Eubanks ff7e1da68f [NPM] Support -fmerge-functions
I tried to put it in the same place in the pipeline as the legacy PM.

Fixes PR48399.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D93002
2020-12-10 11:45:08 -08:00
Anna Thomas 29356e3279 [ScalarizeMaskedMemIntrin] Add new PM support
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.

Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
2020-12-08 17:15:22 -05:00
Arthur Eubanks 0173eb0faf Use isIgnored instead of checking pass name
In preparation for https://reviews.llvm.org/D92616 which will remove
angle brackets from pass manager/adaptor names.

Reviewed By: dexonsmith, thakis

Differential Revision: https://reviews.llvm.org/D92625
2020-12-03 18:37:57 -08:00
Arthur Eubanks 2f0de58294 [NewPM] Support --print-before/after in NPM
This changes --print-before/after to be a list of strings rather than
legacy passes. (this also has the effect of not showing the entire list
of passes in --help-hidden after --print-before/after, which IMO is
great for making it less verbose).

Currently PrintIRInstrumentation passes the class name rather than pass
name to llvm::shouldPrintBeforePass(), meaning
llvm::shouldPrintBeforePass() never functions as intended in the NPM.
There is no easy way of converting class names to pass names outside of
within an instance of PassBuilder.

This adds a map of pass class names to their short names in
PassRegistry.def within PassInstrumentationCallbacks. It is populated
inside the constructor of PassBuilder, which takes a
PassInstrumentationCallbacks.

Add a pointer to PassInstrumentationCallbacks inside
PrintIRInstrumentation and use the newly created map.

This is a bit hacky, but I can't think of a better way since the short
id to class name only exists within PassRegistry.def. This also doesn't
handle passes not in PassRegistry.def but rather added via
PassBuilder::registerPipelineParsingCallback().

llvm/test/CodeGen/Generic/print-after.ll doesn't seem very useful now
with this change.

Reviewed By: ychen, jamieschmeiser

Differential Revision: https://reviews.llvm.org/D87216
2020-12-03 16:52:14 -08:00
Mircea Trofin 5fe10263ab [llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567
2020-11-30 12:03:39 -08:00
Hongtao Yu c083fededf [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.
This change introduces a new clang switch `-fpseudo-probe-for-profiling` to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86502
2020-11-30 10:16:54 -08:00
Hongtao Yu 64fa8cce22 [CSSPGO] Pseudo probe instrumentation pass
This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86499
2020-11-30 10:16:54 -08:00
Andrew Litteken a8a43b6338 Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner."
Reverting commit due to address sanitizer errors.

> Extracting the similar regions is the first step in the IROutliner.
> 
> Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
> sort them by how many instructions will be removed.  Each
> IRSimilarityCandidate is used to define an OutlinableRegion.  Each
> region is ordered by their occurrence in the Module and the regions that
> are not compatible with previously outlined regions are discarded.
> 
> Each region is then extracted with the CodeExtractor into its own
> function.
> 
> We test that correctly extract in:
> test/Transforms/IROutliner/extraction.ll
> test/Transforms/IROutliner/address-taken.ll
> test/Transforms/IROutliner/outlining-same-globals.ll
> test/Transforms/IROutliner/outlining-same-constants.ll
> test/Transforms/IROutliner/outlining-different-structure.ll
> 
> Reviewers: paquette, jroelofs, yroux
> 
> Differential Revision: https://reviews.llvm.org/D86975

This reverts commit bf899e8913.
2020-11-27 19:55:57 -06:00
Andrew Litteken bf899e8913 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-11-27 19:08:29 -06:00
Roman Lebedev a8d74517dc
[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Arthur Eubanks 2c7870dcca [NewPM] Add pipeline EP callback after initial frontend cleanup
This matches the legacy PM's EP_ModuleOptimizerEarly. Some backends use
this extension point and adding the pass somewhere else like
PipelineStartEPCallback doesn't work.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91804
2020-11-24 21:14:36 -08:00
Arthur Eubanks aff058b1a9 Reland [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.

The initial land assumed CallBase WeakTrackingVHs would always be
CallBases, but they can be RAUW'd with undef.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-11-23 21:28:59 -08:00
Arthur Eubanks 6a2799cf8e Revert "[CGSCC] Detect devirtualization in more cases"
This reverts commit 14a68b4aa9.

Causes building self hosted clang to crash when using NPM.
2020-11-23 13:21:05 -08:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Arthur Eubanks 7167e5203a Port -print-memderefs to NPM
There is lots of code duplication, but hopefully it won't matter soon.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91683
2020-11-23 11:56:22 -08:00
Arthur Eubanks 14a68b4aa9 [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-11-23 11:55:20 -08:00
Jamie Schmeiser 621efa6a5a [NFC intended] Refactor the code for printChanged for reuse and to facilitate subsequent reporters of changes to the IR in the new pass manager.
Summary:
[NFC intended] Refactor the code for printChanged for reuse and to facilitate
subsequent reporters of changes to the IR in the new pass manager.

Create abstract template base classes for common functionality and give
classes more appropriate names.  The base classes handle all of the
determination of when a function or pass is "interesting" and should be
reported or filtered out. They have pure virtual functions which are called
when a change by a pass has been recognized so the derived class need only
provide the overrides to present the information about the changing IR.
There are at least 2 more change reporters to come (which were presented
in my tutorial at the 2020 llvm developer's meeting) that derive from
these classes.

Respond to review comments:  move function out of line, remove inline keyword,
remove unneeded qualifiers, simplify comparison.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks), madhur13490 (Madhur Amilkanthwar)
Differential Revision: https://reviews.llvm.org/D87000
2020-11-20 09:43:06 -05:00
Arthur Eubanks b77436047a [PGO] Make -disable-preinline work with NPM
Fixes cspgo_profile_summary.ll under NPM.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D91826
2020-11-19 22:58:55 -08:00
Arthur Eubanks 513d165b80 Port -lower-matrix-intrinsics-minimal to NPM
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.

Use this new variant in the NPM -O0 pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91811
2020-11-19 17:42:48 -08:00
Arthur Eubanks 72badbcdcc [NPM] Move more O0 pass building into PassBuilder
This moves handling of alwaysinline, coroutines, matrix lowering, PGO,
and LTO-required passes into PassBuilder. Much of this is replicated
between Clang and opt. Other out-of-tree users also replicate some of
this, such as Rust [1] replicating the alwaysinline, LTO, and PGO
passes.

The LTO passes are also now run in
build(Thin)LTOPreLinkDefaultPipeline() since they are semantically
required for (Thin)LTO.

[1]: f5230fbf76/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L896)

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D91585
2020-11-19 11:22:23 -08:00
Arthur Eubanks 67f16e9e91 [NPM] Remove -enable-npm-optnone flag
It has been on by default for a couple months without complaint.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91743
2020-11-18 15:49:16 -08:00
Yevgeny Rouban cba3e78338 [NewPM] Disable PreservedCFGChecker and add regression unit tests
The design of the PreservedCFG Checker (landed with the commit
28012e00d8) has a fundamental flaw which makes it incorrect.
The checker is based on the PreservedAnalyses result returned
by functional passes: if CFGAnalyses is in the returned
PreservedAnalyses set, then the checker asserts that the CFG
snapshot saved before the pass is equal to the CFG snapshot
taken after the the pass. The problem is in passes that change
CFG and invalidate CFGAnalyses on their own. Such passes do not
return CFGanalyses in the returned PreservedAnalyses. So the
checker mistakenly expects CFG unchanged. As an example see the
class TestSimplifyCFGInvalidatingAnalysisPass in the new tests.

It is interesting that the bug was not found in LLVM. That is
because the CFG checker ran only if CFGAnalyses was checked
incorrectly:
  if (!PassPA.allAnalysesInSetPreserved<CFGAnalyses>())
    return;

but must be checked as follows:
  auto PAC = PA.getChecker<PreservedCFGCheckerAnalysis>();
  if (!(PAC.preserved() ||
        PAC.preservedSet<AllAnalysesOn<Function>>() ||
        PAC.preservedSet<CFGAnalyses>())
    return;

A fully redesigned checker will be sent as a separate follow-up
patch.

Reviewed By: Serguei Katkov, Jakub Kuderski

Differential Revision: https://reviews.llvm.org/D91324
2020-11-18 10:02:47 +07:00
Florian Hahn 8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated  using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Arthur Eubanks 6e04da0a5a [DCE] Port -redundant-dbg-inst-elim to NPM
This is used to test RemoveRedundantDbgInstrs(), which is used by other
passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91477
2020-11-14 16:55:20 -08:00
Florian Hahn 8bb6347939
Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Jamie Schmeiser 782d6a6963 Introduce -print-before-changed, making -print-changed also print before passes that modify IR
Summary:
Add an option -print-before-changed that modifies the print-changed
behaviour so that it prints the IR before a pass that changed it in
addition to printing the IR after the pass. Note that the option
does nothing in isolation. The filtering options work as expected.
Lit tests are included.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: aeubanks (Arthur Eubanks)

Differential Revision: https://reviews.llvm.org/D88757
2020-11-12 15:20:50 +00:00
Arthur Eubanks b6ccff3d5f [NewPM] Provide method to run all pipeline callbacks, used for -O0
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.

This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.

Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.

Tests are a continuation of those added in
https://reviews.llvm.org/D89083.

Reviewed By: asbirlea, Meinersbur

Differential Revision: https://reviews.llvm.org/D89158
2020-11-11 15:10:27 -08:00
Sjoerd Meijer 2ef47910d5 [LoopFlatten] Run it earlier, just before IndVarSimplify
This is a prep step for widening induction variables in LoopFlatten if this is
posssible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.

Differential Revision: https://reviews.llvm.org/D90402
2020-11-10 20:22:41 +00:00
Sjoerd Meijer 706ead0e87 [LoopFlatten] Make it a FunctionPass
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems of a loop pass deleting a (inner)loop.

Differential Revision: https://reviews.llvm.org/D90940
2020-11-10 20:03:31 +00:00
Arthur Eubanks 1cbf8e89b5 [NewPM] Port -separate-const-offset-from-gep
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91095
2020-11-09 17:42:36 -08:00
Josh Stone 4463b73e79 Enable opt-bisect for the new pass manager
This instruments a should-run-optional-pass callback using the existing
OptBisect class to decide if new passes should be skipped. Passes that
force isRequired never reach this at all, so they are not included in
"BISECT:" output nor its pass count.

The test case is resurrected from r267022, an early version of D19172
that had new pass manager support (later reverted and redone without).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D87951
2020-11-09 15:57:48 -08:00
Arthur Eubanks cdb51bfaa7 [NewPM] Add unique-internal-linkage-names to PassRegistry.def
Pass was already ported, just not properly hooked up.
2020-11-09 12:54:13 -08:00
Pedro Tammela 5e8ecff0d8 [Reg2Mem] add support for the new pass manager
This patch refactors the pass to accomodate the new pass manager
boilerplate.

Differential Revision: https://reviews.llvm.org/D91005
2020-11-08 11:14:05 +00:00
Arthur Eubanks 226e179f74 Revert "[NewPM] Provide method to run all pipeline callbacks, used for -O0"
This reverts commit ae38540042.
As well as some follow-up test fixes.

The original change causes new-pass-manager.ll to fail when polly is enabled.
2020-11-08 00:32:35 -08:00
Arthur Eubanks ae38540042 [NewPM] Provide method to run all pipeline callbacks, used for -O0
Some targets may add required passes via
    TargetMachine::registerPassBuilderCallbacks(). We need to run those even
    under -O0. As an example, BPFTargetMachine adds
    BPFAbstractMemberAccessPass, a required pass.

    This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
    usage of the NPM) by allowing us to share added passes like coroutines
    and sanitizers between -O0 and other optimization levels.

    Tests are a continuation of those added in
    https://reviews.llvm.org/D89083.

    In order to prevent TargetMachines from adding unnecessary optimization
    passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
    changed to take an OptimizationLevel, but that will be done separately.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89158
2020-11-04 22:27:16 -08:00
Arthur Eubanks ab0ddbc38a Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 13:11:40 -08:00
Arthur Eubanks 9173b5a99d Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback"
This reverts commit 7a83aa0520.

Causing buildbot failures.
2020-11-04 12:57:32 -08:00
Arthur Eubanks 7a83aa0520 [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 12:53:30 -08:00
Arthur Eubanks d8f531c42c [NewPM] Don't run before pass instrumentation on required passes
This allows those instrumentation to log when they decide to skip a
pass. This provides extra helpful info for optnone functions and also
will help with opt-bisect.

Have OptNoneInstrumentation print when it skips due to seeing optnone.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90545
2020-11-04 09:45:10 -08:00
Arthur Eubanks 06926e0f01 Port print-must-be-executed-contexts and print-mustexecute to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90207
2020-11-03 21:06:46 -08:00
Arthur Eubanks 2e31727a88 [NFC] Clean up PassBuilder
Make DebugLogging a member variable so that users of PassBuilder don't
need to pass it around so much.

Move call to TargetMachine::registerPassBuilderCallbacks() within
PassBuilder so users don't need to remember to call it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90437
2020-10-30 10:03:59 -07:00
Serguei Katkov b69919b537 [GVN LoadPRE] Add an option to disable splitting backedge
GVN Load PRE can split the backedge causing breaking the loop structure where the latch
contains the conditional branch with for example induction variable.

Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control an ability to split backedge.

Default value is true so technically it is NFC and current behavior is not changed.

Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazasntsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
2020-10-27 11:59:52 +07:00
Arthur Eubanks 3dd1c72458 Port -objc-arc-expand to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90182
2020-10-26 20:05:10 -07:00
Arthur Eubanks 90c0b0d3d6 Port -objc-arc-apelim to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90181
2020-10-26 20:01:46 -07:00
TaWeiTu 0efbfa38ae [NPM] Port -slsr to NPM
`-separate-const-offset-from-gep` has not yet be ported, so some tests are not updated.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90149
2020-10-27 09:21:40 +08:00
Arthur Eubanks c039e83a2c Fix typo SSC -> SCC 2020-10-24 16:26:48 -07:00
TaWeiTu 65a36bbc3d [NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89371
2020-10-24 21:51:18 +08:00
Arthur Eubanks baffd052b0 [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks 92d9a3868a Port -instnamer to NPM
Some clang tests use this.

Reviewed By: akhuang

Differential Revision: https://reviews.llvm.org/D89931
2020-10-22 12:08:36 -07:00
Arthur Eubanks cb9ca35977 [LoopRotate][NPM] Disable header duplication under -Oz
It was already disabled under -Oz in
buildFunctionSimplificationPipeline(), but not in
buildModuleOptimizationPipeline()/addPGOInstrPasses().

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89927
2020-10-22 08:39:12 -07:00
Arthur Eubanks 8d9466a385 [BlockExtract][NewPM] Port -extract-blocks to NPM
Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89015
2020-10-21 12:51:11 -07:00
Florian Hahn 88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset and the size there actually decreases by -0.1%, with
not significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly less instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Ta-Wei Tu 529ecd19df [NPM] port -unify-loop-exits to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89774
2020-10-20 10:46:57 -07:00
Ta-Wei Tu 59286b36df [NPM] Port -mergereturn to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89781
2020-10-20 10:33:58 -07:00
Amy Huang ea693a1627 [NPM] Port module-debuginfo pass to the new pass manager
Port pass to NPM and update tests in DebugInfo/Generic.

Differential Revision: https://reviews.llvm.org/D89730
2020-10-19 14:31:17 -07:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Arthur Eubanks 518ec05a10 [LoopExtract][NewPM] Port -loop-extract to NPM
-loop-extract-single is just -loop-extract on one loop.

-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89016
2020-10-13 22:55:42 -07:00
Arthur Eubanks 0689dab844 [FixIrreducible][NewPM] Port -fix-irreducible to NPM
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that tests that -lowerswitch is run automatically to legacy PM.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D89051
2020-10-09 09:22:09 -07:00
Arthur Eubanks 9c21c6c966 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Arthur Eubanks 6dcbea877b [NewPM] Use PassInstrumentation for -verify-each
This removes "VerifyEachPass" parameters from a lot of functions which is nice.

Don't verify after special passes or VerifierPass.

This introduces verification on loop and cgscc passes, verifying the corresponding function/module.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88764
2020-10-07 19:24:25 -07:00
Reid Kleckner 940d7aaea9 Port StripGCRelocates pass to NPM
Fixes one test under NPM

Differential Revision: https://reviews.llvm.org/D88766
2020-10-07 14:41:29 -07:00
Reid Kleckner da48fe1732 [NPM] Port strip nonlinetable debuginfo pass to the new pass manager
Fixes a few tests in llvm/test/Transforms/Utils.

Differential Revision: https://reviews.llvm.org/D88762
2020-10-07 14:35:36 -07:00
Arthur Eubanks d4e08c95e5 [NewPM] Set -enable-npm-optnone to true by default
This makes the NPM skip not required passes on functions marked optnone.

If this causes a pass that should be required but has not been marked
required to be skipped, add
`static bool isRequired() { return true; }`
to the pass class. AlwaysInlinerPass is an example.

clang/test/CodeGen/O0-no-skipped-passes.c is useful for checking that
no passes are skipped under -O0.

The -enable-npm-optnone option will be removed once this has been stable
for long enough without issues.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D87869
2020-10-05 18:42:32 -07:00
Arthur Eubanks 321986fe68 [MetaRenamer][NewPM] Port metarenamer to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88690
2020-10-02 15:42:25 -07:00
Jamie Schmeiser 71124a9dbd Reland No.3: Add new hidden option -print-changed which only reports changes to IR
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.

The code introduces an abstract template base class that generalizes the
comparison of IRs that takes an IR representation as template parameter.
Derived classes provide overrides that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.

The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.

Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen), MaskRay (Fangrui Song)

Differential Revision: https://reviews.llvm.org/D86360
2020-10-01 17:39:13 +00:00
Sjoerd Meijer d53b4bee0c [LoopFlatten] Add a loop-flattening pass
This is a simple pass that flattens nested loops.  The intention is to optimise
loop nests like this, which together access an array linearly:

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);

into one loop:

  for (int i = 0; i < (N*M); ++i)
    f(A[i]);

It can also flatten loops where the induction variables are not used in the
loop. This can help with codesize and runtime, especially on simple cpus
without advanced branch prediction.

This is only worth flattening if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.

This partially fixes PR40581 as this pass triggers on one of the two cases. I
will follow up on this to learn LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.

Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.

Differential Revision: https://reviews.llvm.org/D42365
2020-10-01 13:54:45 +01:00
Arthur Eubanks 460dda071e [WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.

This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88588
2020-09-30 17:27:37 -07:00
Arthur Eubanks 4fbd83c716 [ObjCARCAA][NewPM] Add already ported objc-arc-aa to PassRegistry.def
Also add missing AnalysisKey definition.
2020-09-30 08:50:44 -07:00
Chuanqi Xu b3a722e66b [Coroutines] Reuse storage for local variables with non-overlapping lifetimes
bug 45566 shows the process of building coroutine frame won't consider
that the lifetimes of different local variables are not overlapped,
which means the compiler could generates smaller frame.

This patch calculate the lifetime range of each alloca by StackLifetime
class. Then the patch build non-overlapped sets for allocas whose
lifetime ranges are not overlapped. We use the largest type in a
non-overlapped set as the field type in the frame. In insertSpills
process, if we find the type of field is not the same with the alloca,
we cast the pointer to the field type to the pointer to the alloca type.
Since the lifetime range of alloca in one non-overlapped set is not
overlapped with each other, it should be ok to reuse the storage space
in the frame.

Test plan: check-llvm, check-clang, cppcoro, folly

Reviewers: junparser, lxfind, modocache

Differential Revision: https://reviews.llvm.org/D87596
2020-09-28 15:48:00 +08:00
Fangrui Song 50bd71e1d7 [NewPM] Port ConstraintElimination to the new pass manager
If -enable-constraint-elimination is specified, add it to the -O2/-O3 pipeline.
(-O1 uses a separate function now.)

Reviewed By: fhahn, aeubanks

Differential Revision: https://reviews.llvm.org/D88365
2020-09-27 11:12:26 -07:00
Arthur Eubanks 83e3ea2cfc [LowerTypeTests][NewPM] Add constructor that uses command line flags
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.

This fixes all tests under Transforms/LowerTypeTests using NPM.

Reviewed By: ychen, pcc

Differential Revision: https://reviews.llvm.org/D87845
2020-09-25 17:39:59 -07:00
Arthur Eubanks d3f6972abb [LoopReroll][NewPM] Port -loop-reroll to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87957
2020-09-25 12:09:06 -07:00
Hans Wennborg 4f1897c6f0 Move PassBuilder::registerParseTopLevelPipelineCallback out-of-line
For some mysterious reason it doesn't build with clang-cl when compiled
as part of the includes in clang's CodeGenAction.cpp
(crbug.com/1132292).
2020-09-25 19:55:40 +02:00
Andrew Litteken f02c4c87b4 [IRSim] Adding wrapper pass for IRSimilarityIdentfier
This introduces an analysis pass that wraps IRSimilarityIdentifier,
and adds a printer pass to examine in what function similarities are
being found.

Test for what the printer pass can find are in
test/Analysis/IRSimilarityIdentifier.

Reviewed by: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D86973
2020-09-24 14:59:41 -05:00
Arthur Eubanks 29aaa18848 Revert "[NewPM] Add callbacks to PassBuilder to run before/after parsing a pass"
This reverts commit 111aa4e366.
2020-09-23 18:43:13 -07:00
Arthur Eubanks 111aa4e366 [NewPM] Add callbacks to PassBuilder to run before/after parsing a pass
This is in preparation for supporting -debugify-each, which adds a debug
info pass before and after each pass.

Switch VerifyEach to use this.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88107
2020-09-23 15:25:40 -07:00
Arthur Eubanks a031ef6f3a [GVNSink][NewPM] Add GVNSinkPass to PassRegistry.def 2020-09-22 08:24:09 -07:00
Arthur Eubanks 9db0c572c1 [Delinearization][NewPM] Port delinearization to NPM
Also make tests in Analysis/Delinearization work under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87741
2020-09-21 17:59:08 -07:00
Arthur Eubanks 024979b7b6 [ObjCARC][NewPM] Port objc-arc-contract to NPM
Similar to https://reviews.llvm.org/D86178.

This is a module pass instead of a function pass since
ARCRuntimeEntryPoints can lazily add function declarations.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D87806
2020-09-21 09:40:14 -07:00
Arthur Eubanks 5249e6f248 [LoopSimplifyCFG][NewPM] Rename simplify-cfg -> loop-simplifycfg
This matches the legacy PM name and makes all tests in
Transforms/LoopSimplifyCFG pass under NPM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87948
2020-09-21 08:27:19 -07:00
Douglas Yung b03c2b8395 Revert "Re-land: Add new hidden option -print-changed which only reports changes to IR"
The test added in this commit is failing on Windows bots:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/1269

This reverts commit f9e6d1edc0 and follow-up commit 6859d95ea2.
2020-09-17 01:32:29 -07:00
Arthur Eubanks f4ea0f9814 [NewPM] Port -print-alias-sets to NPM
Really it should be named print<alias-sets>, but for the sake of
changing fewer tests, added a TODO to rename after NPM switch and test
cleanup.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87713
2020-09-16 18:34:56 -07:00
Michael Liao 6859d95ea2 Fix build. 2020-09-16 14:52:00 -04:00
Jamie Schmeiser f9e6d1edc0 Re-land: Add new hidden option -print-changed which only reports changes to IR
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.

The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.

The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.

Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen)

Differential Revision: https://reviews.llvm.org/D86360
2020-09-16 17:25:18 +00:00
Arthur Eubanks 09c342493d [NPM] Translate alias analysis into require<> as well
'require<globals-aa>' is needed to make globals-aa work in NPM, since
globals-aa is a module analysis but function passes cannot run module
analyses on demand.
So don't skip translating alias analyses to 'require<>'.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87743
2020-09-16 08:54:09 -07:00
Arthur Eubanks ba12e77ec1 [NewPM] Port strip* passes to NPM
strip-nondebug and strip-debug-declare have no existing associated tests

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87639
2020-09-15 18:25:12 -07:00
Arthur Eubanks f7aa1563eb [LowerSwitch][NewPM] Port lowerswitch to NPM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87726
2020-09-15 18:18:31 -07:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU related metric in one of our big services by preventing hoisting cold code into a hot pre-header like the added test case demonstrates.

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
2020-09-15 17:21:58 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 .

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Arthur Eubanks 3f69b2140f [NewPM][opt] Fix -globals-aa not being recognized as alias analysis in NPM
Was missing MODULE_ALIAS_ANALYSIS, previously only FUNCTION_ALIAS_ANALYSIS was taken into account.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87664
2020-09-15 11:18:19 -07:00
Arthur Eubanks 10b12d4035 Reland [docs][NewPM] Add docs for writing NPM passes
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritinAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Relanded with missing "Support" dependency in LLVMBuild.txt.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 16:06:19 -07:00
Arthur Eubanks 39ec36415d Revert "[docs][NewPM] Add docs for writing NPM passes"
This reverts commit c2590de30d.

Breaks shared libs build
2020-09-14 15:55:17 -07:00
Arthur Eubanks c2590de30d [docs][NewPM] Add docs for writing NPM passes
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritinAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 13:26:03 -07:00
Teresa Johnson 226d80ebe2 [MemProf] Rename HeapProfiler to MemProfiler for consistency
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.

Differential Revision: https://reviews.llvm.org/D87622
2020-09-14 13:14:57 -07:00
Yevgeny Rouban 28012e00d8 [NewPM] Introduce PreserveCFG check
Check that all passes, which report they preserve CFG,
are really preserving CFG.
A new standard instrumentation is introduced. It can be
switched on/off by the flag verify-cfg-preserved, which
is on by default for debug builds.

Reviewers: kuhar, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D81558
2020-09-11 14:32:21 +07:00
Juneyoung Lee 1b9884df8d Enable InsertFreeze flag of JumpThreading when used in LTO
This patch enables inserting freeze when JumpThreading converts a select to
a conditional branch when it is run in LTO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85534
2020-09-10 19:05:49 +09:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Arthur Eubanks c9771391ce [NewPM][Lint] Port -lint to NewPM
This also changes -lint from an analysis to a pass. It's similar to
-verify, and that is a normal pass, and lives in llvm/IR.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87057
2020-09-03 13:03:44 -07:00
Jamie Schmeiser b2e65cf950 Revert "Add new hidden option -print-changed which only reports changes to IR"
This reverts commit 7bc9924cb2 due to
failure caused by missing a space between trailing >>, required by some
versions of C++:wq.
2020-09-03 18:41:20 +00:00
Jamie Schmeiser 7bc9924cb2 Add new hidden option -print-changed which only reports changes to IR
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.

The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.

The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
See https://hotcrp.llvm.org/usllvm2020/paper/29 for more information.

Reviewed By: yrouban (Yevgeny Rouban)

Differential Revision: https://reviews.llvm.org/D86360
2020-09-03 15:52:35 +00:00
Arthur Eubanks e440b4933a Revert "[NewPM][Lint] Port -lint to NewPM"
This reverts commit 883399c840.
2020-09-02 21:34:29 -07:00
Arthur Eubanks 883399c840 [NewPM][Lint] Port -lint to NewPM
This also changes -lint from an analysis to a pass. It's similar to
-verify, and that is a normal pass, and lives in llvm/IR.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87057
2020-09-02 21:13:01 -07:00
Arthur Eubanks cfde93e5d6 [ObjCARCOpt] Port objc-arc to NPM
Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.

Reviewed By: ahatanak, ychen

Differential Revision: https://reviews.llvm.org/D86178
2020-08-28 12:59:33 -07:00