llvm-project/llvm/lib/Transforms/Scalar/LoopDeletion.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

305 lines
12 KiB
C++
Raw Normal View History

//===- LoopDeletion.cpp - Dead Loop Deletion Pass ---------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements the Dead Loop Deletion Pass. This pass is responsible
// for eliminating loops with non-infinite computable trip counts that have no
// side effects or volatile instructions, and do not contribute to the
// computation of the function's return value.
//
//===----------------------------------------------------------------------===//
#include "llvm/Transforms/Scalar/LoopDeletion.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"
[LPM] Factor all of the loop analysis usage updates into a common helper routine. We were getting this wrong in small ways and generally being very inconsistent about it across loop passes. Instead, let's have a common place where we do this. One minor downside is that this will require some analyses like SCEV in more places than they are strictly needed. However, this seems benign as these analyses are complete no-ops, and without this consistency we can in many cases end up with the legacy pass manager scheduling deciding to split up a loop pass pipeline in order to run the function analysis half-way through. It is very, very annoying to fix these without just being very pedantic across the board. The only loop passes I've not updated here are ones that use AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer. They seemed less relevant. With this patch, almost all of the problems in PR24804 around loop pass pipelines are fixed. The one remaining issue is that we run simplify-cfg and instcombine in the middle of the loop pass pipeline. We've recently added some loop variants of these passes that would seem substantially cleaner to use, but this at least gets us much closer to the previous state. Notably, the seven loop pass managers is down to three. I've not updated the loop passes using LoopAccessAnalysis because that analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't clear that those transforms want to support those forms anyways. They all run late anyways, so this is harmless. Similarly, LSR is left alone because it already carefully manages its forms and doesn't need to get fused into a single loop pass manager with a bunch of other loop passes. LoopReroll didn't use loop simplified form previously, and I've updated the test case to match the trivially different output. Finally, I've also factored all the pass initialization for the passes that use this technique as well, so that should be done regularly and reliably. Thanks to James for the help reviewing and thinking about this stuff, and Ben for help thinking about it as well! Differential Revision: http://reviews.llvm.org/D17435 llvm-svn: 261316
2016-02-19 18:45:18 +08:00
#include "llvm/Transforms/Utils/LoopUtils.h"
using namespace llvm;
#define DEBUG_TYPE "loop-delete"
STATISTIC(NumDeleted, "Number of loops deleted");
enum class LoopDeletionResult {
Unmodified,
Modified,
Deleted,
};
/// Determines if a loop is dead.
///
/// This assumes that we've already checked for unique exit and exiting blocks,
/// and that the code is in LCSSA form.
static bool isLoopDead(Loop *L, ScalarEvolution &SE,
SmallVectorImpl<BasicBlock *> &ExitingBlocks,
BasicBlock *ExitBlock, bool &Changed,
BasicBlock *Preheader) {
// Make sure that all PHI entries coming from the loop are loop invariant.
// Because the code is in LCSSA form, any values used outside of the loop
// must pass through a PHI in the exit block, meaning that this check is
// sufficient to guarantee that no loop-variant values are used outside
// of the loop.
bool AllEntriesInvariant = true;
bool AllOutgoingValuesSame = true;
for (PHINode &P : ExitBlock->phis()) {
Value *incoming = P.getIncomingValueForBlock(ExitingBlocks[0]);
// Make sure all exiting blocks produce the same incoming value for the exit
// block. If there are different incoming values for different exiting
// blocks, then it is impossible to statically determine which value should
// be used.
AllOutgoingValuesSame =
all_of(makeArrayRef(ExitingBlocks).slice(1), [&](BasicBlock *BB) {
return incoming == P.getIncomingValueForBlock(BB);
});
2012-07-24 18:51:42 +08:00
if (!AllOutgoingValuesSame)
break;
if (Instruction *I = dyn_cast<Instruction>(incoming))
if (!L->makeLoopInvariant(I, Changed, Preheader->getTerminator())) {
AllEntriesInvariant = false;
break;
}
}
2012-07-24 18:51:42 +08:00
if (Changed)
SE.forgetLoopDispositions(L);
if (!AllEntriesInvariant || !AllOutgoingValuesSame)
return false;
// Make sure that no instructions in the block have potential side-effects.
// This includes instructions that could write to memory, and loads that are
// marked volatile.
for (auto &I : L->blocks())
[LoopDelete][Assume] Allow deleting loops with assumes This pervent very poor optimization caused by a signle assume like https://godbolt.org/z/EK3oMh baseline flags: -O3 patched flags: -O3 -mllvm --enable-knowledge-retention Before the patch ``` Metric: compile_time Program baseline patched diff test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 20.72 29.74 43.5% test-suite :: CTMark/Bullet/bullet.test 24.39 24.91 2.2% test-suite :: CTMark/7zip/7zip-benchmark.test 37.39 38.06 1.8% test-suite :: CTMark/kimwitu++/kc.test 11.76 11.94 1.5% test-suite :: CTMark/sqlite3/sqlite3.test 12.94 12.91 -0.3% test-suite :: CTMark/SPASS/SPASS.test 11.72 11.70 -0.2% test-suite :: CTMark/lencod/lencod.test 16.12 16.10 -0.1% test-suite :: CTMark/ClamAV/clamscan.test 13.31 13.30 -0.1% test-suite :: CTMark/mafft/pairlocalalign.test 9.12 9.12 -0.1% test-suite :: CTMark/consumer-typeset/consumer-typeset.test 9.34 9.34 -0.1% Geomean difference 4.2% Metric: compiler_Kinst_count Program baseline patched diff test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107576069.87 172886418.90 60.7% test-suite :: CTMark/Bullet/bullet.test 123291865.66 125457117.96 1.8% test-suite :: CTMark/kimwitu++/kc.test 56347884.64 57298544.14 1.7% test-suite :: CTMark/7zip/7zip-benchmark.test 180637699.58 183341656.57 1.5% test-suite :: CTMark/sqlite3/sqlite3.test 66723788.85 66664692.80 -0.1% test-suite :: CTMark/ClamAV/clamscan.test 69581500.56 69597863.92 0.0% test-suite :: CTMark/lencod/lencod.test 94236501.48 94216545.32 -0.0% test-suite :: CTMark/SPASS/SPASS.test 58516756.95 58505089.07 -0.0% test-suite :: CTMark/consumer-typeset/consumer-typeset.test 48832815.53 48841989.39 0.0% test-suite :: CTMark/mafft/pairlocalalign.test 49682720.53 49686324.34 0.0% Geomean difference 5.4% ``` After the patch ``` Metric: compile_time Program baseline patched diff test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 20.70 22.40 8.2% test-suite :: CTMark/7zip/7zip-benchmark.test 37.13 38.05 2.5% test-suite :: CTMark/Bullet/bullet.test 24.25 24.83 2.4% test-suite :: CTMark/kimwitu++/kc.test 11.69 11.94 2.2% test-suite :: CTMark/ClamAV/clamscan.test 13.19 13.36 1.3% test-suite :: CTMark/lencod/lencod.test 16.02 16.19 1.1% test-suite :: CTMark/consumer-typeset/consumer-typeset.test 9.29 9.36 0.7% test-suite :: CTMark/SPASS/SPASS.test 11.64 11.73 0.7% test-suite :: CTMark/mafft/pairlocalalign.test 9.10 9.15 0.5% test-suite :: CTMark/sqlite3/sqlite3.test 12.95 12.96 0.0% Geomean difference 1.9% Metric: compiler_Kinst_count Program baseline patched diff test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107590933.61 114044834.72 6.0% test-suite :: CTMark/kimwitu++/kc.test 56344526.77 57235806.29 1.6% test-suite :: CTMark/Bullet/bullet.test 123291285.10 125128334.97 1.5% test-suite :: CTMark/7zip/7zip-benchmark.test 180641540.10 183155706.39 1.4% test-suite :: CTMark/sqlite3/sqlite3.test 66725619.22 66668713.92 -0.1% test-suite :: CTMark/SPASS/SPASS.test 58509029.85 58478704.75 -0.1% test-suite :: CTMark/consumer-typeset/consumer-typeset.test 48843711.23 48826894.68 -0.0% test-suite :: CTMark/lencod/lencod.test 94233305.79 94207544.23 -0.0% test-suite :: CTMark/ClamAV/clamscan.test 69587887.66 69603549.90 0.0% test-suite :: CTMark/mafft/pairlocalalign.test 49686968.65 49689291.04 0.0% Geomean difference 1.0% ``` Reviewed By: jdoerfert, efriedma Differential Revision: https://reviews.llvm.org/D86816
2020-09-26 18:31:12 +08:00
if (any_of(*I, [](Instruction &I) {
return I.mayHaveSideEffects() && !I.isDroppable();
}))
return false;
return true;
}
/// This function returns true if there is no viable path from the
/// entry block to the header of \p L. Right now, it only does
/// a local search to save compile time.
static bool isLoopNeverExecuted(Loop *L) {
using namespace PatternMatch;
auto *Preheader = L->getLoopPreheader();
// TODO: We can relax this constraint, since we just need a loop
// predecessor.
assert(Preheader && "Needs preheader!");
if (Preheader == &Preheader->getParent()->getEntryBlock())
return false;
// All predecessors of the preheader should have a constant conditional
// branch, with the loop's preheader as not-taken.
for (auto *Pred: predecessors(Preheader)) {
BasicBlock *Taken, *NotTaken;
ConstantInt *Cond;
if (!match(Pred->getTerminator(),
m_Br(m_ConstantInt(Cond), Taken, NotTaken)))
return false;
if (!Cond->getZExtValue())
std::swap(Taken, NotTaken);
if (Taken == Preheader)
return false;
}
assert(!pred_empty(Preheader) &&
"Preheader should have predecessors at this point!");
// All the predecessors have the loop preheader as not-taken target.
return true;
}
/// Remove a loop if it is dead.
///
/// A loop is considered dead if it does not impact the observable behavior of
/// the program other than finite running time. This never removes a loop that
/// might be infinite (unless it is never executed), as doing so could change
/// the halting/non-halting nature of a program.
///
/// This entire process relies pretty heavily on LoopSimplify form and LCSSA in
/// order to make various safety checks work.
///
/// \returns true if any changes were made. This may mutate the loop even if it
/// is unable to delete it due to hoisting trivially loop invariant
/// instructions out of the loop.
static LoopDeletionResult deleteLoopIfDead(Loop *L, DominatorTree &DT,
ScalarEvolution &SE, LoopInfo &LI,
MemorySSA *MSSA,
OptimizationRemarkEmitter &ORE) {
assert(L->isLCSSAForm(DT) && "Expected LCSSA!");
// We can only remove the loop if there is a preheader that we can branch from
// after removing it. Also, if LoopSimplify form is not available, stay out
// of trouble.
BasicBlock *Preheader = L->getLoopPreheader();
if (!Preheader || !L->hasDedicatedExits()) {
LLVM_DEBUG(
dbgs()
<< "Deletion requires Loop with preheader and dedicated exits.\n");
return LoopDeletionResult::Unmodified;
}
// We can't remove loops that contain subloops. If the subloops were dead,
// they would already have been removed in earlier executions of this pass.
if (L->begin() != L->end()) {
LLVM_DEBUG(dbgs() << "Loop contains subloops.\n");
return LoopDeletionResult::Unmodified;
}
2012-07-24 18:51:42 +08:00
BasicBlock *ExitBlock = L->getUniqueExitBlock();
if (ExitBlock && isLoopNeverExecuted(L)) {
LLVM_DEBUG(dbgs() << "Loop is proven to never execute, delete it!");
// We need to forget the loop before setting the incoming values of the exit
// phis to undef, so we properly invalidate the SCEV expressions for those
// phis.
SE.forgetLoop(L);
// Set incoming value to undef for phi nodes in the exit block.
for (PHINode &P : ExitBlock->phis()) {
std::fill(P.incoming_values().begin(), P.incoming_values().end(),
UndefValue::get(P.getType()));
}
ORE.emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "NeverExecutes", L->getStartLoc(),
L->getHeader())
<< "Loop deleted because it never executes";
});
deleteDeadLoop(L, &DT, &SE, &LI, MSSA);
++NumDeleted;
return LoopDeletionResult::Deleted;
}
// The remaining checks below are for a loop being dead because all statements
// in the loop are invariant.
SmallVector<BasicBlock *, 4> ExitingBlocks;
L->getExitingBlocks(ExitingBlocks);
2012-07-24 18:51:42 +08:00
// We require that the loop only have a single exit block. Otherwise, we'd
// be in the situation of needing to be able to solve statically which exit
// block will be branched to, or trying to preserve the branching logic in
// a loop invariant manner.
if (!ExitBlock) {
LLVM_DEBUG(dbgs() << "Deletion requires single exit block\n");
return LoopDeletionResult::Unmodified;
}
// Finally, we have to check that the loop really is dead.
bool Changed = false;
if (!isLoopDead(L, SE, ExitingBlocks, ExitBlock, Changed, Preheader)) {
LLVM_DEBUG(dbgs() << "Loop is not invariant, cannot delete.\n");
return Changed ? LoopDeletionResult::Modified
: LoopDeletionResult::Unmodified;
}
2012-07-24 18:51:42 +08:00
// Don't remove loops for which we can't solve the trip count.
// They could be infinite, in which case we'd be changing program behavior.
const SCEV *S = SE.getConstantMaxBackedgeTakenCount(L);
if (isa<SCEVCouldNotCompute>(S)) {
LLVM_DEBUG(dbgs() << "Could not compute SCEV MaxBackedgeTakenCount.\n");
return Changed ? LoopDeletionResult::Modified
: LoopDeletionResult::Unmodified;
}
2012-07-24 18:51:42 +08:00
LLVM_DEBUG(dbgs() << "Loop is invariant, delete it!");
ORE.emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "Invariant", L->getStartLoc(),
L->getHeader())
<< "Loop deleted because it is invariant";
});
deleteDeadLoop(L, &DT, &SE, &LI, MSSA);
++NumDeleted;
return LoopDeletionResult::Deleted;
}
[PM] Rewrite the loop pass manager to use a worklist and augmented run arguments much like the CGSCC pass manager. This is a major redesign following the pattern establish for the CGSCC layer to support updates to the set of loops during the traversal of the loop nest and to support invalidation of analyses. An additional significant burden in the loop PM is that so many passes require access to a large number of function analyses. Manually ensuring these are cached, available, and preserved has been a long-standing burden in LLVM even with the help of the automatic scheduling in the old pass manager. And it made the new pass manager extremely unweildy. With this design, we can package the common analyses up while in a function pass and make them immediately available to all the loop passes. While in some cases this is unnecessary, I think the simplicity afforded is worth it. This does not (yet) address loop simplified form or LCSSA form, but those are the next things on my radar and I have a clear plan for them. While the patch is very large, most of it is either mechanically updating loop passes to the new API or the new testing for the loop PM. The code for it is reasonably compact. I have not yet updated all of the loop passes to correctly leverage the update mechanisms demonstrated in the unittests. I'll do that in follow-up patches along with improved FileCheck tests for those passes that ensure things work in more realistic scenarios. In many cases, there isn't much we can do with these until the loop simplified form and LCSSA form are in place. Differential Revision: https://reviews.llvm.org/D28292 llvm-svn: 291651
2017-01-11 14:23:21 +08:00
PreservedAnalyses LoopDeletionPass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,
LPMUpdater &Updater) {
LLVM_DEBUG(dbgs() << "Analyzing Loop for deletion: ");
LLVM_DEBUG(L.dump());
std::string LoopName = std::string(L.getName());
// For the new PM, we can't use OptimizationRemarkEmitter as an analysis
// pass. Function analyses need to be preserved across loop transformations
// but ORE cannot be preserved (see comment before the pass definition).
OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE);
if (Result == LoopDeletionResult::Unmodified)
return PreservedAnalyses::all();
if (Result == LoopDeletionResult::Deleted)
Updater.markLoopAsDeleted(L, LoopName);
auto PA = getLoopPassPreservedAnalyses();
if (AR.MSSA)
PA.preserve<MemorySSAAnalysis>();
return PA;
}
namespace {
class LoopDeletionLegacyPass : public LoopPass {
public:
static char ID; // Pass ID, replacement for typeid
LoopDeletionLegacyPass() : LoopPass(ID) {
initializeLoopDeletionLegacyPassPass(*PassRegistry::getPassRegistry());
}
// Possibly eliminate loop L if it is dead.
bool runOnLoop(Loop *L, LPPassManager &) override;
void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addPreserved<MemorySSAWrapperPass>();
getLoopAnalysisUsage(AU);
}
};
}
char LoopDeletionLegacyPass::ID = 0;
INITIALIZE_PASS_BEGIN(LoopDeletionLegacyPass, "loop-deletion",
"Delete dead loops", false, false)
INITIALIZE_PASS_DEPENDENCY(LoopPass)
INITIALIZE_PASS_END(LoopDeletionLegacyPass, "loop-deletion",
"Delete dead loops", false, false)
Pass *llvm::createLoopDeletionPass() { return new LoopDeletionLegacyPass(); }
bool LoopDeletionLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
if (skipLoop(L))
return false;
DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
ScalarEvolution &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();
LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
auto *MSSAAnalysis = getAnalysisIfAvailable<MemorySSAWrapperPass>();
MemorySSA *MSSA = nullptr;
if (MSSAAnalysis)
MSSA = &MSSAAnalysis->getMSSA();
// For the old PM, we can't use OptimizationRemarkEmitter as an analysis
// pass. Function analyses need to be preserved across loop transformations
// but ORE cannot be preserved (see comment before the pass definition).
OptimizationRemarkEmitter ORE(L->getHeader()->getParent());
LLVM_DEBUG(dbgs() << "Analyzing Loop for deletion: ");
LLVM_DEBUG(L->dump());
LoopDeletionResult Result = deleteLoopIfDead(L, DT, SE, LI, MSSA, ORE);
if (Result == LoopDeletionResult::Deleted)
LPM.markLoopAsDeleted(*L);
return Result != LoopDeletionResult::Unmodified;
}