[BOLT][non-reloc] Change function splitting in non-relocation mode

Summary:
This diff mostly applies to non-relocation mode. In this mode, we are
limited by original function boundaries, i.e. if a function becomes
larger after optimizations (e.g. because of newly introduced
branches), then we might not be able to write the optimized version
unless we split the function. At the same time, we do not benefit from
function splitting as we do in relocation mode, since we are not
moving functions/fragments and the hot code does not become more
compact.

For the reasons described above, we used to execute multiple re-write
attempts to optimize the binary and we would only split functions that
were too large to fit into their original space.

After the first attempt, we would know which functions did not fit
into their original space. Then we would re-run all our passes,
feeding back the function information and forcefully splitting
such functions. Some functions still wouldn't fit even after the
splitting (mostly because of the branch relaxation for conditional tail
calls that does not happen in non-relocation mode). Yet we had already
emitted debug info as if they were successfully overwritten. That's why
we had one more stage to write the functions again, marking
failed-to-emit functions non-simple. Sadly, there was a bug in the way
the 2nd and 3rd attempts interacted: we were not splitting the functions
correctly and, as a result, we were emitting less optimized code.

One of the reasons we had the multi-pass rewrite scheme in place was
that we did not have the ability to precisely estimate the code size
before the actual code emission. Recently, BinaryContext obtained such
functionality, and now we can use it instead of relying on the
multi-pass rewrite. This eliminates the redundant work of re-running
the same function passes multiple times.

Because function splitting runs before a number of optimization passes
that run on post-CFG state (those rely on the splitting pass), we
cannot estimate the non-split code size with 100% accuracy. However,
it is good enough for over 99% of the cases to extract most of the
performance gains for the binary.
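
A minimal sketch of the fallback for the remaining cases, modeled on the
CheckLargeFunctions pass in this diff: once the code is final, compare
the estimated size of the main fragment with the original space and mark
the function non-simple if it does not fit. `FunctionInfo` and
`markOversizedNonSimple` are hypothetical names used for illustration
only; the real pass obtains the size from
BinaryContext::calculateEmittedSize().

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a function record; BOLT's BinaryFunction is
// far richer than this.
struct FunctionInfo {
  std::uint64_t MaxSize;          // bytes available at the original location
  std::uint64_t EstimatedHotSize; // estimated size of the main fragment
  bool IsSimple;                  // whether the optimized body will be emitted
};

// Mark every function whose main fragment would not fit as non-simple and
// return how many functions were marked.
inline std::size_t
markOversizedNonSimple(std::vector<FunctionInfo> &Functions) {
  std::size_t Marked = 0;
  for (FunctionInfo &F : Functions) {
    if (F.IsSimple && F.EstimatedHotSize > F.MaxSize) {
      F.IsSimple = false;
      ++Marked;
    }
  }
  return Marked;
}
```

A function whose estimate equals its original size still fits, so the
comparison is strict.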

As a result of eliminating the multi-pass rewrite, the processing time
in non-relocation mode with `-split-functions=2` is greatly reduced.
With debug info update, it is less than half of what it used to be.

New semantics for `-split-functions=<n>`:

  -split-functions - split functions into hot and cold regions
    =0 -   do not split any function
    =1 -   in non-relocation mode only split functions too large to fit
           into original code space
    =2 -   same as 1 (backwards compatibility)
    =3 -   split all functions
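
The option values above can be read as the following decision, sketched
here for non-relocation mode only. `SplitMode` and `shouldSplitNonReloc`
are hypothetical names and not BOLT's declarations; the estimated size
is assumed to come from a size estimate made before splitting.

```cpp
#include <cstdint>

// Hypothetical mirror of the -split-functions=<n> values.
enum class SplitMode : std::uint8_t {
  None = 0,        // =0
  Large = 1,       // =1
  LargeCompat = 2, // =2, same as 1
  All = 3          // =3
};

// Decide whether a function is a splitting candidate in non-relocation mode.
// EstimatedHotSize is the estimated size of the unsplit body in bytes;
// MaxSize is the space available at the function's original location.
constexpr bool shouldSplitNonReloc(SplitMode Mode,
                                   std::uint64_t EstimatedHotSize,
                                   std::uint64_t MaxSize) {
  return Mode == SplitMode::All ||
         ((Mode == SplitMode::Large || Mode == SplitMode::LargeCompat) &&
          EstimatedHotSize > MaxSize);
}

static_assert(!shouldSplitNonReloc(SplitMode::Large, 96, 128),
              "a function that fits is left alone");
static_assert(shouldSplitNonReloc(SplitMode::LargeCompat, 160, 128),
              "a function that is too large is split");
```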

(cherry picked from FBD17362607)
Maksim Panchenko 2019-09-11 15:42:22 -07:00
parent 615a318b60
commit e9c6c73bb8
11 changed files with 409 additions and 318 deletions

@@ -1734,8 +1734,9 @@ BinaryContext::createInjectedBinaryFunction(const std::string &Name,
 }

 std::pair<size_t, size_t>
-BinaryContext::calculateEmittedSize(BinaryFunction &BF) {
+BinaryContext::calculateEmittedSize(BinaryFunction &BF, bool FixBranches) {
   // Adjust branch instruction to match the current layout.
+  if (FixBranches)
     BF.fixBranches();

   // Create local MC context to isolate the effect of ephemeral code emission.

@@ -942,11 +942,14 @@ public:
   std::vector<BinaryFunction *> getSortedFunctions();

   /// Do the best effort to calculate the size of the function by emitting
-  /// its code, and relaxing branch instructions.
+  /// its code, and relaxing branch instructions. By default, branch
+  /// instructions are updated to match the layout. Pass \p FixBranches set to
+  /// false if the branches are known to be up to date with the code layout.
   ///
   /// Return the pair where the first size is for the main part, and the second
   /// size is for the cold one.
-  std::pair<size_t, size_t> calculateEmittedSize(BinaryFunction &BF);
+  std::pair<size_t, size_t>
+  calculateEmittedSize(BinaryFunction &BF, bool FixBranches = true);

   /// Calculate the size of the instruction \p Inst optionally using a
   /// user-supplied emitter for lock-free multi-thread work. MCCodeEmitter is

@@ -134,15 +134,6 @@ public:
     PF_MEMEVENT = 4, /// Profile has mem events.
   };

-  /// Settings for splitting function bodies into hot/cold partitions.
-  enum SplittingType : char {
-    ST_NONE = 0, /// Do not split functions
-    ST_EH, /// Split blocks comprising landing pads
-    ST_LARGE, /// Split functions that exceed maximum size in addition
-              /// to landing pads.
-    ST_ALL, /// Split all functions
-  };
-
   static constexpr uint64_t COUNT_NO_PROFILE =
     BinaryBasicBlock::COUNT_NO_PROFILE;
@@ -252,9 +243,6 @@ private:
   /// destination.
   bool HasFixedIndirectBranch{false};

-  /// Is the function known to exceed its input size?
-  bool IsLarge{false};
-
   /// True if the function is a fragment of another function. This means that
   /// this function could only be entered via its parent or one of its sibling
   /// fragments. It could be entered at any basic block. It can also return
@@ -1263,14 +1251,10 @@ public:
     return HasUnknownControlFlow;
   }

-  /// Return true if the function should be split for the output.
-  bool shouldSplit() const {
-    return IsLarge && !getBinaryContext().HasRelocations;
-  }
-
   /// Return true if the function body is non-contiguous.
   bool isSplit() const {
-    return layout_size() &&
+    return isSimple() &&
+           layout_size() &&
            layout_front()->isCold() != layout_back()->isCold();
   }
@@ -1654,11 +1638,6 @@ public:
     return *this;
   }

-  BinaryFunction &setLarge(bool Large) {
-    IsLarge = Large;
-    return *this;
-  }
-
   BinaryFunction &setUsesGnuArgsSize(bool Uses = true) {
     UsesGnuArgsSize = Uses;
     return *this;

@@ -22,6 +22,7 @@
 #include "Passes/RegReAssign.h"
 #include "Passes/ReorderFunctions.h"
 #include "Passes/ReorderData.h"
+#include "Passes/SplitFunctions.h"
 #include "Passes/StokeInfo.h"
 #include "Passes/RetpolineInsertion.h"
 #include "Passes/ValidateInternalCalls.h"
@@ -193,6 +194,13 @@ PrintSimplifyROLoads("print-simplify-rodata-loads",
   cl::Hidden,
   cl::cat(BoltOptCategory));

+static cl::opt<bool>
+PrintSplit("print-split",
+  cl::desc("print functions after code splitting"),
+  cl::ZeroOrMore,
+  cl::Hidden,
+  cl::cat(BoltOptCategory));
+
 static cl::opt<bool>
 PrintUCE("print-uce",
   cl::desc("print functions after unreachable code elimination"),
@@ -428,6 +436,8 @@ void BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
       llvm::make_unique<EliminateUnreachableBlocks>(PrintUCE),
       opts::EliminateUnreachable);

+  Manager.registerPass(llvm::make_unique<SplitFunctions>(PrintSplit));
+
   // This pass syncs local branches with CFG. If any of the following
   // passes breaks the sync - they either need to re-run the pass or
   // fix branches consistency internally.
@@ -497,6 +507,12 @@ void BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
   Manager.registerPass(
       llvm::make_unique<InstructionLowering>(PrintAfterLowering));

+  // In non-relocation mode, mark functions that do not fit into their original
+  // space as non-simple if we have to (e.g. for correct debug info update).
+  // NOTE: this pass depends on finalized code.
+  if (!BC.HasRelocations)
+    Manager.registerPass(llvm::make_unique<CheckLargeFunctions>(NeverPrint));
+
   Manager.registerPass(llvm::make_unique<LowerAnnotations>(NeverPrint));

   Manager.runPasses();

@@ -9,6 +9,7 @@
 //
 //===----------------------------------------------------------------------===//

+#include "BinaryFunction.h"
 #include "BinaryPasses.h"
 #include "ParallelUtilities.h"
 #include "Passes/ReorderAlgorithm.h"
@@ -20,6 +21,7 @@
 #define DEBUG_TYPE "bolt-opts"

 using namespace llvm;
+using namespace bolt;

 namespace {
@@ -56,9 +58,8 @@ extern cl::OptionCategory BoltOptCategory;
 extern cl::opt<bolt::MacroFusionType> AlignMacroOpFusion;
 extern cl::opt<unsigned> Verbosity;
-extern cl::opt<bool> SplitEH;
 extern cl::opt<bool> EnableBAT;
-extern cl::opt<bolt::BinaryFunction::SplittingType> SplitFunctions;
+extern cl::opt<bool> UpdateDebugSections;
 extern bool shouldProcess(const bolt::BinaryFunction &Function);
 extern bool isHotTextMover(const bolt::BinaryFunction &Function);
@@ -67,12 +68,6 @@ enum DynoStatsSortOrder : char {
   Descending
 };

-static cl::opt<bool>
-AggressiveSplitting("split-all-cold",
-  cl::desc("outline as many cold basic blocks as possible"),
-  cl::ZeroOrMore,
-  cl::cat(BoltOptCategory));
-
 static cl::opt<DynoStatsSortOrder>
 DynoStatsSortOrderOpt("print-sorted-by-order",
   cl::desc("use ascending or descending order when printing functions "
@@ -223,27 +218,6 @@ SctcMode("sctc-mode",
   cl::ZeroOrMore,
   cl::cat(BoltOptCategory));

-static cl::opt<unsigned>
-SplitAlignThreshold("split-align-threshold",
-  cl::desc("when deciding to split a function, apply this alignment "
-           "while doing the size comparison (see -split-threshold). "
-           "Default value: 2."),
-  cl::init(2),
-  cl::ZeroOrMore,
-  cl::Hidden,
-  cl::cat(BoltOptCategory));
-
-static cl::opt<unsigned>
-SplitThreshold("split-threshold",
-  cl::desc("split function only if its main size is reduced by more than "
-           "given amount of bytes. Default value: 0, i.e. split iff the "
-           "size is reduced. Note that on some architectures the size can "
-           "increase after splitting."),
-  cl::init(0),
-  cl::ZeroOrMore,
-  cl::Hidden,
-  cl::cat(BoltOptCategory));
-
 static cl::opt<unsigned>
 TSPThreshold("tsp-threshold",
   cl::desc("maximum number of hot basic blocks in a function for which to use "
@@ -335,16 +309,10 @@ void ReorderBasicBlocks::runOnFunctions(BinaryContext &BC) {
   if (opts::ReorderBlocks == ReorderBasicBlocks::LT_NONE)
     return;

-  IsAArch64 = BC.isAArch64();
-
   std::atomic<uint64_t> ModifiedFuncCount{0};
   ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
-    const bool ShouldSplit =
-        (opts::SplitFunctions == BinaryFunction::ST_ALL) ||
-        (opts::SplitFunctions == BinaryFunction::ST_EH && BF.hasEHRanges()) ||
-        BF.shouldSplit();
-    modifyFunctionLayout(BF, opts::ReorderBlocks, opts::MinBranchClusters,
-                         ShouldSplit);
+    modifyFunctionLayout(BF, opts::ReorderBlocks, opts::MinBranchClusters);
     if (BF.hasLayoutChanged()) {
       ++ModifiedFuncCount;
     }
@@ -400,7 +368,7 @@ void ReorderBasicBlocks::runOnFunctions(BinaryContext &BC) {
 }

 void ReorderBasicBlocks::modifyFunctionLayout(BinaryFunction &BF,
-    LayoutType Type, bool MinBranchClusters, bool Split) const {
+    LayoutType Type, bool MinBranchClusters) const {
   if (BF.size() == 0 || Type == LT_NONE)
     return;
@@ -455,125 +423,6 @@ void ReorderBasicBlocks::modifyFunctionLayout(BinaryFunction &BF,
   Algo->reorderBasicBlocks(BF, NewLayout);

   BF.updateBasicBlockLayout(NewLayout);
-
-  if (Split)
-    splitFunction(BF);
-}
-
-void ReorderBasicBlocks::splitFunction(BinaryFunction &BF) const {
-  if (!BF.size())
-    return;
-
-  bool AllCold = true;
-  for (auto *BB : BF.layout()) {
-    auto ExecCount = BB->getExecutionCount();
-    if (ExecCount == BinaryBasicBlock::COUNT_NO_PROFILE)
-      return;
-    if (ExecCount != 0)
-      AllCold = false;
-  }
-
-  if (AllCold)
-    return;
-
-  auto PreSplitLayout = BF.getLayout();
-
-  auto &BC = BF.getBinaryContext();
-  size_t OriginalHotSize;
-  size_t HotSize;
-  size_t ColdSize;
-  if (BC.isX86())
-    std::tie(OriginalHotSize, ColdSize) = BC.calculateEmittedSize(BF);
-  DEBUG(dbgs() << "Estimated size for function " << BF << " pre-split is <0x"
-               << Twine::utohexstr(OriginalHotSize) << ", 0x"
-               << Twine::utohexstr(ColdSize) << ">\n");
-
-  // Never outline the first basic block.
-  BF.layout_front()->setCanOutline(false);
-  for (auto *BB : BF.layout()) {
-    if (!BB->canOutline())
-      continue;
-    if (BB->getExecutionCount() != 0) {
-      BB->setCanOutline(false);
-      continue;
-    }
-    // Do not split extra entry points in aarch64. They can be referred by
-    // using ADRs and when this happens, these blocks cannot be placed far
-    // away due to the limited range in ADR instruction.
-    if (IsAArch64 && BB->isEntryPoint()) {
-      BB->setCanOutline(false);
-      continue;
-    }
-    if (BF.hasEHRanges() && !opts::SplitEH) {
-      // We cannot move landing pads (or rather entry points for landing
-      // pads).
-      if (BB->isLandingPad()) {
-        BB->setCanOutline(false);
-        continue;
-      }
-      // We cannot move a block that can throw since exception-handling
-      // runtime cannot deal with split functions. However, if we can guarantee
-      // that the block never throws, it is safe to move the block to
-      // decrease the size of the function.
-      for (auto &Instr : *BB) {
-        if (BF.getBinaryContext().MIB->isInvoke(Instr)) {
-          BB->setCanOutline(false);
-          break;
-        }
-      }
-    }
-  }
-
-  if (opts::AggressiveSplitting) {
-    // All blocks with 0 count that we can move go to the end of the function.
-    // Even if they were natural to cluster formation and were seen in-between
-    // hot basic blocks.
-    std::stable_sort(BF.layout_begin(), BF.layout_end(),
-        [&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
-          return A->canOutline() < B->canOutline();
-        });
-  } else if (BF.hasEHRanges() && !opts::SplitEH) {
-    // Typically functions with exception handling have landing pads at the end.
-    // We cannot move beginning of landing pads, but we can move 0-count blocks
-    // comprising landing pads to the end and thus facilitate splitting.
-    auto FirstLP = BF.layout_begin();
-    while ((*FirstLP)->isLandingPad())
-      ++FirstLP;
-
-    std::stable_sort(FirstLP, BF.layout_end(),
-        [&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
-          return A->canOutline() < B->canOutline();
-        });
-  }
-
-  // Separate hot from cold starting from the bottom.
-  for (auto I = BF.layout_rbegin(), E = BF.layout_rend();
-       I != E; ++I) {
-    BinaryBasicBlock *BB = *I;
-    if (!BB->canOutline())
-      break;
-    BB->setIsCold(true);
-  }
-
-  // Check the new size to see if it's worth splitting the function.
-  if (BC.isX86() && BF.isSplit()) {
-    std::tie(HotSize, ColdSize) = BC.calculateEmittedSize(BF);
-    DEBUG(dbgs() << "Estimated size for function " << BF << " post-split is <0x"
-                 << Twine::utohexstr(HotSize) << ", 0x"
-                 << Twine::utohexstr(ColdSize) << ">\n");
-    if (alignTo(OriginalHotSize, opts::SplitAlignThreshold) <=
-        alignTo(HotSize, opts::SplitAlignThreshold) + opts::SplitThreshold) {
-      DEBUG(dbgs() << "Reversing splitting of function " << BF << ":\n 0x"
-                   << Twine::utohexstr(HotSize) << ", 0x"
-                   << Twine::utohexstr(ColdSize) << " -> 0x"
-                   << Twine::utohexstr(OriginalHotSize) << '\n');
-      BF.updateBasicBlockLayout(PreSplitLayout);
-      for (auto &BB : BF) {
-        BB.setIsCold(false);
-      }
-    }
-  }
 }

 void FixupBranches::runOnFunctions(BinaryContext &BC) {
@@ -614,6 +463,37 @@ void FinalizeFunctions::runOnFunctions(BinaryContext &BC) {
       SkipPredicate, "FinalizeFunctions");
 }

+void CheckLargeFunctions::runOnFunctions(BinaryContext &BC) {
+  if (BC.HasRelocations)
+    return;
+
+  if (!opts::UpdateDebugSections)
+    return;
+
+  // If the function wouldn't fit, mark it as non-simple. Otherwise, we may emit
+  // incorrect debug info.
+  ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
+    uint64_t HotSize, ColdSize;
+    std::tie(HotSize, ColdSize) =
+        BC.calculateEmittedSize(BF, /*FixBranches=*/false);
+    if (HotSize > BF.getMaxSize())
+      BF.setSimple(false);
+  };
+
+  ParallelUtilities::PredicateTy SkipFunc = [&](const BinaryFunction &BF) {
+    return !shouldOptimize(BF);
+  };
+
+  ParallelUtilities::runOnEachFunction(
+      BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
+      SkipFunc, "CheckLargeFunctions");
+}
+
+bool CheckLargeFunctions::shouldOptimize(const BinaryFunction &BF) const {
+  // Unlike other passes, allow functions in non-CFG state.
+  return BF.isSimple() && opts::shouldProcess(BF) && BF.getSize();
+}
+
 void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
   std::vector<std::pair<MCInst *, uint64_t>> PreservedSDTAnnotations;
   std::vector<std::pair<MCInst *, uint32_t>> PreservedOffsetAnnotations;

@@ -143,15 +143,7 @@ public:
 private:
   void modifyFunctionLayout(BinaryFunction &Function,
                             LayoutType Type,
-                            bool MinBranchClusters,
-                            bool Split) const;
-
-  /// Split function in two: a part with warm or hot BBs and a part with never
-  /// executed BBs. The cold part is moved to a new BinaryFunction.
-  void splitFunction(BinaryFunction &Function) const;
-
-  bool IsAArch64{false};
+                            bool MinBranchClusters) const;

 public:
   explicit ReorderBasicBlocks(const cl::opt<bool> &PrintPass)
     : BinaryFunctionPass(PrintPass) { }
@@ -188,6 +180,22 @@ class FinalizeFunctions : public BinaryFunctionPass {
   void runOnFunctions(BinaryContext &BC) override;
 };

+/// Perform any necessary adjustments for functions that do not fit into their
+/// original space in non-relocation mode.
+class CheckLargeFunctions : public BinaryFunctionPass {
+public:
+  explicit CheckLargeFunctions(const cl::opt<bool> &PrintPass)
+    : BinaryFunctionPass(PrintPass) { }
+
+  const char *getName() const override {
+    return "check-large-functions";
+  }
+
+  void runOnFunctions(BinaryContext &BC) override;
+
+  bool shouldOptimize(const BinaryFunction &BF) const override;
+};
+
 /// Convert and remove all BOLT-related annotations before LLVM code emission.
 class LowerAnnotations : public BinaryFunctionPass {
 public:

@@ -28,6 +28,7 @@ add_llvm_library(LLVMBOLTPasses
   ReorderFunctions.cpp
   ReorderData.cpp
   ShrinkWrapping.cpp
+  SplitFunctions.cpp
   StackAllocationAnalysis.cpp
   StackAvailableExpressions.cpp
   StackPointerTracking.cpp

@ -0,0 +1,230 @@
//===--- SplitFunctions.cpp - pass for splitting function code ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
#include "BinaryFunction.h"
#include "ParallelUtilities.h"
#include "SplitFunctions.h"
#include "llvm/Support/Options.h"
#include <numeric>
#include <vector>
#define DEBUG_TYPE "bolt-opts"
using namespace llvm;
using namespace bolt;
namespace opts {
extern cl::OptionCategory BoltOptCategory;
extern cl::opt<bool> SplitEH;
static cl::opt<bool>
AggressiveSplitting("split-all-cold",
cl::desc("outline as many cold basic blocks as possible"),
cl::ZeroOrMore,
cl::cat(BoltOptCategory));
static cl::opt<unsigned>
SplitAlignThreshold("split-align-threshold",
cl::desc("when deciding to split a function, apply this alignment "
"while doing the size comparison (see -split-threshold). "
"Default value: 2."),
cl::init(2),
cl::ZeroOrMore,
cl::Hidden,
cl::cat(BoltOptCategory));
static cl::opt<SplitFunctions::SplittingType>
SplitFunctions("split-functions",
cl::desc("split functions into hot and cold regions"),
cl::init(SplitFunctions::ST_NONE),
cl::values(clEnumValN(SplitFunctions::ST_NONE, "0",
"do not split any function"),
clEnumValN(SplitFunctions::ST_LARGE, "1",
"in non-relocation mode only split functions too large "
"to fit into original code space"),
clEnumValN(SplitFunctions::ST_LARGE, "2",
"same as 1 (backwards compatibility)"),
clEnumValN(SplitFunctions::ST_ALL, "3",
"split all functions")),
cl::ZeroOrMore,
cl::cat(BoltOptCategory));
static cl::opt<unsigned>
SplitThreshold("split-threshold",
cl::desc("split function only if its main size is reduced by more than "
"given amount of bytes. Default value: 0, i.e. split iff the "
"size is reduced. Note that on some architectures the size can "
"increase after splitting."),
cl::init(0),
cl::ZeroOrMore,
cl::Hidden,
cl::cat(BoltOptCategory));
void syncOptions(BinaryContext &BC) {
if (!BC.HasRelocations && opts::SplitFunctions == SplitFunctions::ST_LARGE)
opts::SplitFunctions = SplitFunctions::ST_ALL;
}
} // namespace opts
namespace llvm {
namespace bolt {
void SplitFunctions::runOnFunctions(BinaryContext &BC) {
opts::syncOptions(BC);
if (opts::SplitFunctions == SplitFunctions::ST_NONE)
return;
ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
splitFunction(BF);
};
ParallelUtilities::PredicateTy SkipFunc = [&](const BinaryFunction &BF) {
return !shouldOptimize(BF);
};
ParallelUtilities::runOnEachFunction(
BC, ParallelUtilities::SchedulingPolicy::SP_BB_LINEAR, WorkFun, SkipFunc,
"SplitFunctions");
}
void SplitFunctions::splitFunction(BinaryFunction &BF) const {
if (!BF.size())
return;
if (!BF.hasValidProfile())
return;
bool AllCold = true;
for (auto *BB : BF.layout()) {
auto ExecCount = BB->getExecutionCount();
if (ExecCount == BinaryBasicBlock::COUNT_NO_PROFILE)
return;
if (ExecCount != 0)
AllCold = false;
}
if (AllCold)
return;
auto PreSplitLayout = BF.getLayout();
auto &BC = BF.getBinaryContext();
size_t OriginalHotSize;
size_t HotSize;
size_t ColdSize;
if (BC.isX86()) {
std::tie(OriginalHotSize, ColdSize) = BC.calculateEmittedSize(BF);
DEBUG(dbgs() << "Estimated size for function " << BF << " pre-split is <0x"
<< Twine::utohexstr(OriginalHotSize) << ", 0x"
<< Twine::utohexstr(ColdSize) << ">\n");
}
if (opts::SplitFunctions == SplitFunctions::ST_LARGE && !BC.HasRelocations) {
// Split only if the function wouldn't fit.
if (OriginalHotSize <= BF.getMaxSize())
return;
}
// Never outline the first basic block.
BF.layout_front()->setCanOutline(false);
for (auto *BB : BF.layout()) {
if (!BB->canOutline())
continue;
if (BB->getExecutionCount() != 0) {
BB->setCanOutline(false);
continue;
}
// Do not split extra entry points in aarch64. They can be referred by
// using ADRs and when this happens, these blocks cannot be placed far
// away due to the limited range in ADR instruction.
if (BC.isAArch64() && BB->isEntryPoint()) {
BB->setCanOutline(false);
continue;
}
if (BF.hasEHRanges() && !opts::SplitEH) {
// We cannot move landing pads (or rather entry points for landing
// pads).
if (BB->isLandingPad()) {
BB->setCanOutline(false);
continue;
}
// We cannot move a block that can throw since exception-handling
// runtime cannot deal with split functions. However, if we can guarantee
// that the block never throws, it is safe to move the block to
// decrease the size of the function.
for (auto &Instr : *BB) {
if (BF.getBinaryContext().MIB->isInvoke(Instr)) {
BB->setCanOutline(false);
break;
}
}
}
}
if (opts::AggressiveSplitting) {
// All blocks with 0 count that we can move go to the end of the function.
// Even if they were natural to cluster formation and were seen in-between
// hot basic blocks.
std::stable_sort(BF.layout_begin(), BF.layout_end(),
[&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
return A->canOutline() < B->canOutline();
});
} else if (BF.hasEHRanges() && !opts::SplitEH) {
// Typically functions with exception handling have landing pads at the end.
// We cannot move beginning of landing pads, but we can move 0-count blocks
// comprising landing pads to the end and thus facilitate splitting.
auto FirstLP = BF.layout_begin();
while ((*FirstLP)->isLandingPad())
++FirstLP;
std::stable_sort(FirstLP, BF.layout_end(),
[&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
return A->canOutline() < B->canOutline();
});
}
// Separate hot from cold starting from the bottom.
for (auto I = BF.layout_rbegin(), E = BF.layout_rend();
I != E; ++I) {
BinaryBasicBlock *BB = *I;
if (!BB->canOutline())
break;
BB->setIsCold(true);
}
// Check the new size to see if it's worth splitting the function.
if (BC.isX86() && BF.isSplit()) {
std::tie(HotSize, ColdSize) = BC.calculateEmittedSize(BF);
DEBUG(dbgs() << "Estimated size for function " << BF << " post-split is <0x"
<< Twine::utohexstr(HotSize) << ", 0x"
<< Twine::utohexstr(ColdSize) << ">\n");
if (alignTo(OriginalHotSize, opts::SplitAlignThreshold) <=
alignTo(HotSize, opts::SplitAlignThreshold) + opts::SplitThreshold) {
DEBUG(dbgs() << "Reversing splitting of function " << BF << ":\n 0x"
<< Twine::utohexstr(HotSize) << ", 0x"
<< Twine::utohexstr(ColdSize) << " -> 0x"
<< Twine::utohexstr(OriginalHotSize) << '\n');
BF.updateBasicBlockLayout(PreSplitLayout);
for (auto &BB : BF) {
BB.setIsCold(false);
}
}
}
}
} // namespace bolt
} // namespace llvm

@ -0,0 +1,52 @@
//===--- SplitFunctions.h - pass for splitting function code --------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_TOOLS_LLVM_BOLT_PASSES_SPLIT_FUNCTIONS_H
#define LLVM_TOOLS_LLVM_BOLT_PASSES_SPLIT_FUNCTIONS_H
#include "BinaryContext.h"
#include "BinaryFunction.h"
#include "Passes/BinaryPasses.h"
#include "llvm/Support/CommandLine.h"
namespace llvm {
namespace bolt {
/// Split function code in multiple parts.
class SplitFunctions : public BinaryFunctionPass {
public:
/// Settings for splitting function bodies into hot/cold partitions.
enum SplittingType : char {
ST_NONE = 0, /// Do not split functions.
ST_LARGE, /// In non-relocation mode, only split functions that
/// are too large to fit into the original space.
ST_ALL, /// Split all functions.
};
private:
/// Split function body into fragments.
void splitFunction(BinaryFunction &Function) const;
public:
explicit SplitFunctions(const cl::opt<bool> &PrintPass)
: BinaryFunctionPass(PrintPass) { }
const char *getName() const override {
return "split-functions";
}
void runOnFunctions(BinaryContext &BC) override;
};
} // namespace bolt
} // namespace llvm
#endif

@@ -174,15 +174,6 @@ DumpEHFrame("dump-eh-frame",
   cl::Hidden,
   cl::cat(BoltCategory));

-static cl::opt<bool>
-FixDebugInfoLargeFunctions("fix-debuginfo-large-functions",
-  cl::init(true),
-  cl::desc("do another pass if we encounter large functions, to correct their "
-           "debug info."),
-  cl::ZeroOrMore,
-  cl::ReallyHidden,
-  cl::cat(BoltCategory));
-
 static cl::list<std::string>
 FunctionNames("funcs",
   cl::CommaSeparated,
@@ -344,21 +335,6 @@ SkipFunctionNamesFile("skip-funcs-file",
   cl::Hidden,
   cl::cat(BoltCategory));

-cl::opt<BinaryFunction::SplittingType>
-SplitFunctions("split-functions",
-  cl::desc("split functions into hot and cold regions"),
-  cl::init(BinaryFunction::ST_NONE),
-  cl::values(clEnumValN(BinaryFunction::ST_NONE, "0",
-                        "do not split any function"),
-             clEnumValN(BinaryFunction::ST_EH, "1",
-                        "split all landing pads"),
-             clEnumValN(BinaryFunction::ST_LARGE, "2",
-                        "also split if function too large to fit"),
-             clEnumValN(BinaryFunction::ST_ALL, "3",
-                        "split all functions")),
-  cl::ZeroOrMore,
-  cl::cat(BoltOptCategory));
-
 cl::opt<bool>
 SplitEH("split-eh",
   cl::desc("split C++ exception handling code"),
@@ -783,26 +759,6 @@ RewriteInstance::RewriteInstance(ELFObjectFileBase *File, DataReader &DR,

 RewriteInstance::~RewriteInstance() {}

-void RewriteInstance::reset() {
-  FileSymRefs.clear();
-  auto &DR = BC->DR;
-  DR.reset();
-  BC = createBinaryContext(
-      InputFile, DR,
-      DWARFContext::create(*InputFile, nullptr,
-                           DWARFContext::defaultErrorHandler, "", false));
-  BAT = llvm::make_unique<BoltAddressTranslation>(*BC);
-  CFIRdWrt.reset(nullptr);
-  OLT.reset(nullptr);
-  EFMM.reset();
-  Out.reset(nullptr);
-  EHFrame = nullptr;
-  FailedAddresses.clear();
-  if (opts::UpdateDebugSections) {
-    DebugInfoRewriter = llvm::make_unique<DWARFRewriter>(*BC, SectionPatchers);
-  }
-}
-
 bool RewriteInstance::shouldDisassemble(const BinaryFunction &BF) const {
   // If we have to relocate the code we have to disassemble all functions.
   if (!BF.getBinaryContext().HasRelocations && !opts::shouldProcess(BF)) {
@@ -1079,8 +1035,11 @@ void RewriteInstance::run() {
     return;
   }

-  auto executeRewritePass = [&](const std::set<uint64_t> &NonSimpleFunctions,
-                                bool ShouldSplit) {
+  outs() << "BOLT-INFO: Target architecture: "
+         << Triple::getArchTypeName(
+                (llvm::Triple::ArchType)InputFile->getArch())
+         << "\n";
+
   discoverStorage();
   readSpecialSections();
   adjustCommandLineOptions();
@@ -1114,58 +1073,15 @@ void RewriteInstance::run() {
     return;

   postProcessFunctions();

-    for (uint64_t Address : NonSimpleFunctions) {
-      auto *BF = BC->getBinaryFunctionAtAddress(Address);
-      assert(BF && "bad non-simple function address");
-      if (ShouldSplit)
-        BF->setLarge(true);
-      else
-        BF->setSimple(false);
-    }
-
   if (opts::DiffOnly)
     return;

   runOptimizationPasses();

   emitAndLink();
-  };
-
-  outs() << "BOLT-INFO: Target architecture: "
-         << Triple::getArchTypeName(
-                (llvm::Triple::ArchType)InputFile->getArch())
-         << "\n";
-
-  unsigned PassNumber = 1;
-  executeRewritePass({}, false);
-  if (opts::AggregateOnly || opts::DiffOnly)
-    return;
-
-  if (opts::SplitFunctions == BinaryFunction::ST_LARGE &&
-      checkLargeFunctions()) {
-    ++PassNumber;
-    // Emit again because now some functions have been split
-    outs() << "BOLT: split-functions: starting pass " << PassNumber << "...\n";
-    reset();
-    executeRewritePass(LargeFunctions, true);
-  }
-
-  // Emit functions again ignoring functions which still didn't fit in their
-  // original space, so that we don't generate incorrect debugging information
-  // for them (information that would reflect the optimized version).
-  if (opts::UpdateDebugSections && opts::FixDebugInfoLargeFunctions &&
-      checkLargeFunctions()) {
-    ++PassNumber;
-    outs() << format("BOLT: starting pass %zu (ignoring %zu large functions) ",
-                     PassNumber, LargeFunctions.size())
-           << "...\n";
-    reset();
-    executeRewritePass(LargeFunctions, false);
-  }
-
-  {
-    NamedRegionTimer T("updateDebugInfo", "update debug info", TimerGroupName,
-                       TimerGroupDesc, opts::TimeRewrite);
-    if (opts::UpdateDebugSections)
-      DebugInfoRewriter->updateDebugInfo();
-  }
+
+  updateDebugInfo();

   if (opts::WriteBoltInfoSection)
     addBoltInfoSection();
@@ -3245,6 +3161,15 @@ void RewriteInstance::linkRuntime() {
          << Twine::utohexstr(InstrumentationRuntimeStartAddress) << "\n";
 }

+void RewriteInstance::updateDebugInfo() {
+  if (!opts::UpdateDebugSections)
+    return;
+
+  NamedRegionTimer T("updateDebugInfo", "update debug info", TimerGroupName,
+                     TimerGroupDesc, opts::TimeRewrite);
+  DebugInfoRewriter->updateDebugInfo();
+}
+
 void RewriteInstance::emitFunctions(MCStreamer *Streamer) {
   auto emit = [&](const std::vector<BinaryFunction *> &Functions) {
     for (auto *Function : Functions) {

@@ -51,10 +51,6 @@ public:
                   StringRef ToolPath);
   ~RewriteInstance();

-  /// Reset all state except for split hints. Used to run a second pass with
-  /// function splitting information.
-  void reset();
-
   /// Run all the necessary steps to read, optimize and rewrite the binary.
   void run();