forked from OSchip/llvm-project
[BOLT][non-reloc] Change function splitting in non-relocation mode
Summary:
This diff applies mostly to non-relocation mode. In this mode, we are limited
by the original function boundaries: if a function becomes larger after
optimizations (e.g. because of newly introduced branches), we might not be
able to write the optimized version unless we split the function. At the same
time, we do not benefit from function splitting as we do in relocation mode,
since we are not moving functions/fragments, and the hot code does not become
more compact.

For the reasons described above, we used to execute multiple re-write attempts
to optimize the binary, and we would only split functions that were too large
to fit into their original space. After the first attempt, we knew which
functions did not fit. We would then re-run all our passes, feeding back that
information and forcefully splitting such functions. Some functions still
would not fit even after splitting (mostly because of branch relaxation for
conditional tail calls, which does not happen in non-relocation mode), yet we
had emitted debug info as if they were successfully overwritten. That is why
we had one more stage to write the functions again, marking failed-to-emit
functions non-simple. Sadly, there was a bug in the way the 2nd and 3rd
attempts interacted: we were not splitting the functions correctly and, as a
result, we were emitting less optimized code.

One of the reasons we had the multi-pass rewrite scheme in place was that we
had no way to precisely estimate the code size before the actual code
emission. Recently, BinaryContext gained such functionality, and we can now
use it instead of relying on the multi-pass rewrite. This eliminates the
redundant work of re-running the same function passes multiple times. Because
function splitting runs before a number of optimization passes that operate on
the post-CFG state (those rely on the splitting pass), we cannot estimate the
non-split code size with 100% accuracy. However, it is good enough for over
99% of the cases to extract most of the performance gains for the binary.

As a result of eliminating the multi-pass rewrite, the processing time in
non-relocation mode with `-split-functions=2` is greatly reduced. With debug
info update, it is less than half of what it used to be.

New semantics for `-split-functions=<n>`:

  -split-functions - split functions into hot and cold regions
    =0 - do not split any function
    =1 - in non-relocation mode only split functions too large to fit into
         original code space
    =2 - same as 1 (backwards compatibility)
    =3 - split all functions

(cherry picked from FBD17362607)
This commit is contained in:
parent 615a318b60
commit e9c6c73bb8
@@ -1734,9 +1734,10 @@ BinaryContext::createInjectedBinaryFunction(const std::string &Name,
 }
 
 std::pair<size_t, size_t>
-BinaryContext::calculateEmittedSize(BinaryFunction &BF) {
+BinaryContext::calculateEmittedSize(BinaryFunction &BF, bool FixBranches) {
   // Adjust branch instruction to match the current layout.
-  BF.fixBranches();
+  if (FixBranches)
+    BF.fixBranches();
 
   // Create local MC context to isolate the effect of ephemeral code emission.
   auto MCEInstance = createIndependentMCCodeEmitter();
@@ -942,11 +942,14 @@ public:
   std::vector<BinaryFunction *> getSortedFunctions();
 
   /// Do the best effort to calculate the size of the function by emitting
-  /// its code, and relaxing branch instructions.
+  /// its code, and relaxing branch instructions. By default, branch
+  /// instructions are updated to match the layout. Pass \p FixBranches set to
+  /// false if the branches are known to be up to date with the code layout.
   ///
   /// Return the pair where the first size is for the main part, and the second
   /// size is for the cold one.
-  std::pair<size_t, size_t> calculateEmittedSize(BinaryFunction &BF);
+  std::pair<size_t, size_t>
+  calculateEmittedSize(BinaryFunction &BF, bool FixBranches = true);
 
   /// Calculate the size of the instruction \p Inst optionally using a
   /// user-supplied emitter for lock-free multi-thread work. MCCodeEmitter is
@@ -134,15 +134,6 @@ public:
     PF_MEMEVENT = 4,   /// Profile has mem events.
   };
 
-  /// Settings for splitting function bodies into hot/cold partitions.
-  enum SplittingType : char {
-    ST_NONE = 0,       /// Do not split functions
-    ST_EH,             /// Split blocks comprising landing pads
-    ST_LARGE,          /// Split functions that exceed maximum size in addition
-                       /// to landing pads.
-    ST_ALL,            /// Split all functions
-  };
-
   static constexpr uint64_t COUNT_NO_PROFILE =
     BinaryBasicBlock::COUNT_NO_PROFILE;
@@ -252,9 +243,6 @@ private:
   /// destination.
   bool HasFixedIndirectBranch{false};
 
-  /// Is the function known to exceed its input size?
-  bool IsLarge{false};
-
   /// True if the function is a fragment of another function. This means that
   /// this function could only be entered via its parent or one of its sibling
   /// fragments. It could be entered at any basic block. It can also return
@@ -1263,14 +1251,10 @@ public:
     return HasUnknownControlFlow;
   }
 
-  /// Return true if the function should be split for the output.
-  bool shouldSplit() const {
-    return IsLarge && !getBinaryContext().HasRelocations;
-  }
-
   /// Return true if the function body is non-contiguous.
   bool isSplit() const {
-    return layout_size() &&
+    return isSimple() &&
+           layout_size() &&
            layout_front()->isCold() != layout_back()->isCold();
   }
 
@@ -1654,11 +1638,6 @@ public:
     return *this;
   }
 
-  BinaryFunction &setLarge(bool Large) {
-    IsLarge = Large;
-    return *this;
-  }
-
   BinaryFunction &setUsesGnuArgsSize(bool Uses = true) {
     UsesGnuArgsSize = Uses;
     return *this;
@@ -22,6 +22,7 @@
 #include "Passes/RegReAssign.h"
 #include "Passes/ReorderFunctions.h"
 #include "Passes/ReorderData.h"
+#include "Passes/SplitFunctions.h"
 #include "Passes/StokeInfo.h"
 #include "Passes/RetpolineInsertion.h"
 #include "Passes/ValidateInternalCalls.h"
@@ -193,6 +194,13 @@ PrintSimplifyROLoads("print-simplify-rodata-loads",
   cl::Hidden,
   cl::cat(BoltOptCategory));
 
+static cl::opt<bool>
+PrintSplit("print-split",
+  cl::desc("print functions after code splitting"),
+  cl::ZeroOrMore,
+  cl::Hidden,
+  cl::cat(BoltOptCategory));
+
 static cl::opt<bool>
 PrintUCE("print-uce",
   cl::desc("print functions after unreachable code elimination"),
@@ -428,6 +436,8 @@ void BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
       llvm::make_unique<EliminateUnreachableBlocks>(PrintUCE),
       opts::EliminateUnreachable);
 
+  Manager.registerPass(llvm::make_unique<SplitFunctions>(PrintSplit));
+
   // This pass syncs local branches with CFG. If any of the following
   // passes breaks the sync - they either need to re-run the pass or
   // fix branches consistency internally.
@@ -497,6 +507,12 @@ void BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
   Manager.registerPass(
       llvm::make_unique<InstructionLowering>(PrintAfterLowering));
 
+  // In non-relocation mode, mark functions that do not fit into their original
+  // space as non-simple if we have to (e.g. for correct debug info update).
+  // NOTE: this pass depends on finalized code.
+  if (!BC.HasRelocations)
+    Manager.registerPass(llvm::make_unique<CheckLargeFunctions>(NeverPrint));
+
   Manager.registerPass(llvm::make_unique<LowerAnnotations>(NeverPrint));
 
   Manager.runPasses();
@@ -9,6 +9,7 @@
 //
 //===----------------------------------------------------------------------===//
 
 #include "BinaryFunction.h"
 #include "BinaryPasses.h"
+#include "ParallelUtilities.h"
 #include "Passes/ReorderAlgorithm.h"
@@ -20,6 +21,7 @@
 #define DEBUG_TYPE "bolt-opts"
 
 using namespace llvm;
 using namespace bolt;
 
 namespace {
@@ -56,9 +58,8 @@ extern cl::OptionCategory BoltOptCategory;
 
 extern cl::opt<bolt::MacroFusionType> AlignMacroOpFusion;
 extern cl::opt<unsigned> Verbosity;
-extern cl::opt<bool> SplitEH;
 extern cl::opt<bool> EnableBAT;
-extern cl::opt<bolt::BinaryFunction::SplittingType> SplitFunctions;
+extern cl::opt<bool> UpdateDebugSections;
 extern bool shouldProcess(const bolt::BinaryFunction &Function);
 extern bool isHotTextMover(const bolt::BinaryFunction &Function);
 
@@ -67,12 +68,6 @@ enum DynoStatsSortOrder : char {
   Descending
 };
 
-static cl::opt<bool>
-AggressiveSplitting("split-all-cold",
-  cl::desc("outline as many cold basic blocks as possible"),
-  cl::ZeroOrMore,
-  cl::cat(BoltOptCategory));
-
 static cl::opt<DynoStatsSortOrder>
 DynoStatsSortOrderOpt("print-sorted-by-order",
   cl::desc("use ascending or descending order when printing functions "
@@ -223,27 +218,6 @@ SctcMode("sctc-mode",
   cl::ZeroOrMore,
   cl::cat(BoltOptCategory));
 
-static cl::opt<unsigned>
-SplitAlignThreshold("split-align-threshold",
-  cl::desc("when deciding to split a function, apply this alignment "
-           "while doing the size comparison (see -split-threshold). "
-           "Default value: 2."),
-  cl::init(2),
-  cl::ZeroOrMore,
-  cl::Hidden,
-  cl::cat(BoltOptCategory));
-
-static cl::opt<unsigned>
-SplitThreshold("split-threshold",
-  cl::desc("split function only if its main size is reduced by more than "
-           "given amount of bytes. Default value: 0, i.e. split iff the "
-           "size is reduced. Note that on some architectures the size can "
-           "increase after splitting."),
-  cl::init(0),
-  cl::ZeroOrMore,
-  cl::Hidden,
-  cl::cat(BoltOptCategory));
-
 static cl::opt<unsigned>
 TSPThreshold("tsp-threshold",
   cl::desc("maximum number of hot basic blocks in a function for which to use "
@@ -335,16 +309,10 @@ void ReorderBasicBlocks::runOnFunctions(BinaryContext &BC) {
   if (opts::ReorderBlocks == ReorderBasicBlocks::LT_NONE)
     return;
 
-  IsAArch64 = BC.isAArch64();
   std::atomic<uint64_t> ModifiedFuncCount{0};
 
   ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
-    const bool ShouldSplit =
-        (opts::SplitFunctions == BinaryFunction::ST_ALL) ||
-        (opts::SplitFunctions == BinaryFunction::ST_EH && BF.hasEHRanges()) ||
-        BF.shouldSplit();
-    modifyFunctionLayout(BF, opts::ReorderBlocks, opts::MinBranchClusters,
-                         ShouldSplit);
+    modifyFunctionLayout(BF, opts::ReorderBlocks, opts::MinBranchClusters);
     if (BF.hasLayoutChanged()) {
       ++ModifiedFuncCount;
     }
@@ -400,7 +368,7 @@ void ReorderBasicBlocks::runOnFunctions(BinaryContext &BC) {
 }
 
 void ReorderBasicBlocks::modifyFunctionLayout(BinaryFunction &BF,
-    LayoutType Type, bool MinBranchClusters, bool Split) const {
+    LayoutType Type, bool MinBranchClusters) const {
   if (BF.size() == 0 || Type == LT_NONE)
     return;
 
@@ -455,125 +423,6 @@ void ReorderBasicBlocks::modifyFunctionLayout(BinaryFunction &BF,
   Algo->reorderBasicBlocks(BF, NewLayout);
 
   BF.updateBasicBlockLayout(NewLayout);
-
-  if (Split)
-    splitFunction(BF);
 }
-
-void ReorderBasicBlocks::splitFunction(BinaryFunction &BF) const {
-  if (!BF.size())
-    return;
-
-  bool AllCold = true;
-  for (auto *BB : BF.layout()) {
-    auto ExecCount = BB->getExecutionCount();
-    if (ExecCount == BinaryBasicBlock::COUNT_NO_PROFILE)
-      return;
-    if (ExecCount != 0)
-      AllCold = false;
-  }
-
-  if (AllCold)
-    return;
-
-  auto PreSplitLayout = BF.getLayout();
-
-  auto &BC = BF.getBinaryContext();
-  size_t OriginalHotSize;
-  size_t HotSize;
-  size_t ColdSize;
-  if (BC.isX86())
-    std::tie(OriginalHotSize, ColdSize) = BC.calculateEmittedSize(BF);
-  DEBUG(dbgs() << "Estimated size for function " << BF << " pre-split is <0x"
-               << Twine::utohexstr(OriginalHotSize) << ", 0x"
-               << Twine::utohexstr(ColdSize) << ">\n");
-
-  // Never outline the first basic block.
-  BF.layout_front()->setCanOutline(false);
-  for (auto *BB : BF.layout()) {
-    if (!BB->canOutline())
-      continue;
-    if (BB->getExecutionCount() != 0) {
-      BB->setCanOutline(false);
-      continue;
-    }
-    // Do not split extra entry points in aarch64. They can be referred by
-    // using ADRs and when this happens, these blocks cannot be placed far
-    // away due to the limited range in ADR instruction.
-    if (IsAArch64 && BB->isEntryPoint()) {
-      BB->setCanOutline(false);
-      continue;
-    }
-    if (BF.hasEHRanges() && !opts::SplitEH) {
-      // We cannot move landing pads (or rather entry points for landing
-      // pads).
-      if (BB->isLandingPad()) {
-        BB->setCanOutline(false);
-        continue;
-      }
-      // We cannot move a block that can throw since exception-handling
-      // runtime cannot deal with split functions. However, if we can guarantee
-      // that the block never throws, it is safe to move the block to
-      // decrease the size of the function.
-      for (auto &Instr : *BB) {
-        if (BF.getBinaryContext().MIB->isInvoke(Instr)) {
-          BB->setCanOutline(false);
-          break;
-        }
-      }
-    }
-  }
-
-  if (opts::AggressiveSplitting) {
-    // All blocks with 0 count that we can move go to the end of the function.
-    // Even if they were natural to cluster formation and were seen in-between
-    // hot basic blocks.
-    std::stable_sort(BF.layout_begin(), BF.layout_end(),
-        [&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
-          return A->canOutline() < B->canOutline();
-        });
-  } else if (BF.hasEHRanges() && !opts::SplitEH) {
-    // Typically functions with exception handling have landing pads at the end.
-    // We cannot move beginning of landing pads, but we can move 0-count blocks
-    // comprising landing pads to the end and thus facilitate splitting.
-    auto FirstLP = BF.layout_begin();
-    while ((*FirstLP)->isLandingPad())
-      ++FirstLP;
-
-    std::stable_sort(FirstLP, BF.layout_end(),
-        [&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
-          return A->canOutline() < B->canOutline();
-        });
-  }
-
-  // Separate hot from cold starting from the bottom.
-  for (auto I = BF.layout_rbegin(), E = BF.layout_rend();
-       I != E; ++I) {
-    BinaryBasicBlock *BB = *I;
-    if (!BB->canOutline())
-      break;
-    BB->setIsCold(true);
-  }
-
-  // Check the new size to see if it's worth splitting the function.
-  if (BC.isX86() && BF.isSplit()) {
-    std::tie(HotSize, ColdSize) = BC.calculateEmittedSize(BF);
-    DEBUG(dbgs() << "Estimated size for function " << BF << " post-split is <0x"
-                 << Twine::utohexstr(HotSize) << ", 0x"
-                 << Twine::utohexstr(ColdSize) << ">\n");
-    if (alignTo(OriginalHotSize, opts::SplitAlignThreshold) <=
-        alignTo(HotSize, opts::SplitAlignThreshold) + opts::SplitThreshold) {
-      DEBUG(dbgs() << "Reversing splitting of function " << BF << ":\n  0x"
-                   << Twine::utohexstr(HotSize) << ", 0x"
-                   << Twine::utohexstr(ColdSize) << " -> 0x"
-                   << Twine::utohexstr(OriginalHotSize) << '\n');
-
-      BF.updateBasicBlockLayout(PreSplitLayout);
-      for (auto &BB : BF) {
-        BB.setIsCold(false);
-      }
-    }
-  }
-}
-
 
 void FixupBranches::runOnFunctions(BinaryContext &BC) {
@@ -614,6 +463,37 @@ void FinalizeFunctions::runOnFunctions(BinaryContext &BC) {
       SkipPredicate, "FinalizeFunctions");
 }
 
+void CheckLargeFunctions::runOnFunctions(BinaryContext &BC) {
+  if (BC.HasRelocations)
+    return;
+
+  if (!opts::UpdateDebugSections)
+    return;
+
+  // If the function wouldn't fit, mark it as non-simple. Otherwise, we may emit
+  // incorrect debug info.
+  ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
+    uint64_t HotSize, ColdSize;
+    std::tie(HotSize, ColdSize) =
+        BC.calculateEmittedSize(BF, /*FixBranches=*/false);
+    if (HotSize > BF.getMaxSize())
+      BF.setSimple(false);
+  };
+
+  ParallelUtilities::PredicateTy SkipFunc = [&](const BinaryFunction &BF) {
+    return !shouldOptimize(BF);
+  };
+
+  ParallelUtilities::runOnEachFunction(
+      BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
+      SkipFunc, "CheckLargeFunctions");
+}
+
+bool CheckLargeFunctions::shouldOptimize(const BinaryFunction &BF) const {
+  // Unlike other passes, allow functions in non-CFG state.
+  return BF.isSimple() && opts::shouldProcess(BF) && BF.getSize();
+}
+
 void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
   std::vector<std::pair<MCInst *, uint64_t>> PreservedSDTAnnotations;
   std::vector<std::pair<MCInst *, uint32_t>> PreservedOffsetAnnotations;
@@ -143,15 +143,7 @@ public:
 private:
   void modifyFunctionLayout(BinaryFunction &Function,
                             LayoutType Type,
-                            bool MinBranchClusters,
-                            bool Split) const;
-
-  /// Split function in two: a part with warm or hot BBs and a part with never
-  /// executed BBs. The cold part is moved to a new BinaryFunction.
-  void splitFunction(BinaryFunction &Function) const;
-
-  bool IsAArch64{false};
-
+                            bool MinBranchClusters) const;
 public:
   explicit ReorderBasicBlocks(const cl::opt<bool> &PrintPass)
     : BinaryFunctionPass(PrintPass) { }
@@ -188,6 +180,22 @@ class FinalizeFunctions : public BinaryFunctionPass {
   void runOnFunctions(BinaryContext &BC) override;
 };
 
+/// Perform any necessary adjustments for functions that do not fit into their
+/// original space in non-relocation mode.
+class CheckLargeFunctions : public BinaryFunctionPass {
+public:
+  explicit CheckLargeFunctions(const cl::opt<bool> &PrintPass)
+    : BinaryFunctionPass(PrintPass) { }
+
+  const char *getName() const override {
+    return "check-large-functions";
+  }
+
+  void runOnFunctions(BinaryContext &BC) override;
+
+  bool shouldOptimize(const BinaryFunction &BF) const override;
+};
+
 /// Convert and remove all BOLT-related annotations before LLVM code emission.
 class LowerAnnotations : public BinaryFunctionPass {
 public:
@@ -28,6 +28,7 @@ add_llvm_library(LLVMBOLTPasses
   ReorderFunctions.cpp
   ReorderData.cpp
   ShrinkWrapping.cpp
+  SplitFunctions.cpp
   StackAllocationAnalysis.cpp
   StackAvailableExpressions.cpp
   StackPointerTracking.cpp
@@ -0,0 +1,230 @@
//===--- SplitFunctions.cpp - pass for splitting function code ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//

#include "BinaryFunction.h"
#include "ParallelUtilities.h"
#include "SplitFunctions.h"
#include "llvm/Support/Options.h"

#include <numeric>
#include <vector>

#define DEBUG_TYPE "bolt-opts"

using namespace llvm;
using namespace bolt;

namespace opts {

extern cl::OptionCategory BoltOptCategory;

extern cl::opt<bool> SplitEH;

static cl::opt<bool>
AggressiveSplitting("split-all-cold",
  cl::desc("outline as many cold basic blocks as possible"),
  cl::ZeroOrMore,
  cl::cat(BoltOptCategory));

static cl::opt<unsigned>
SplitAlignThreshold("split-align-threshold",
  cl::desc("when deciding to split a function, apply this alignment "
           "while doing the size comparison (see -split-threshold). "
           "Default value: 2."),
  cl::init(2),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltOptCategory));

static cl::opt<SplitFunctions::SplittingType>
SplitFunctions("split-functions",
  cl::desc("split functions into hot and cold regions"),
  cl::init(SplitFunctions::ST_NONE),
  cl::values(clEnumValN(SplitFunctions::ST_NONE, "0",
                        "do not split any function"),
             clEnumValN(SplitFunctions::ST_LARGE, "1",
                        "in non-relocation mode only split functions too large "
                        "to fit into original code space"),
             clEnumValN(SplitFunctions::ST_LARGE, "2",
                        "same as 1 (backwards compatibility)"),
             clEnumValN(SplitFunctions::ST_ALL, "3",
                        "split all functions")),
  cl::ZeroOrMore,
  cl::cat(BoltOptCategory));

static cl::opt<unsigned>
SplitThreshold("split-threshold",
  cl::desc("split function only if its main size is reduced by more than "
           "given amount of bytes. Default value: 0, i.e. split iff the "
           "size is reduced. Note that on some architectures the size can "
           "increase after splitting."),
  cl::init(0),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltOptCategory));

void syncOptions(BinaryContext &BC) {
  if (!BC.HasRelocations && opts::SplitFunctions == SplitFunctions::ST_LARGE)
    opts::SplitFunctions = SplitFunctions::ST_ALL;
}

} // namespace opts

namespace llvm {
namespace bolt {

void SplitFunctions::runOnFunctions(BinaryContext &BC) {
  opts::syncOptions(BC);

  if (opts::SplitFunctions == SplitFunctions::ST_NONE)
    return;

  ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
    splitFunction(BF);
  };

  ParallelUtilities::PredicateTy SkipFunc = [&](const BinaryFunction &BF) {
    return !shouldOptimize(BF);
  };

  ParallelUtilities::runOnEachFunction(
      BC, ParallelUtilities::SchedulingPolicy::SP_BB_LINEAR, WorkFun, SkipFunc,
      "SplitFunctions");
}

void SplitFunctions::splitFunction(BinaryFunction &BF) const {
  if (!BF.size())
    return;

  if (!BF.hasValidProfile())
    return;

  bool AllCold = true;
  for (auto *BB : BF.layout()) {
    auto ExecCount = BB->getExecutionCount();
    if (ExecCount == BinaryBasicBlock::COUNT_NO_PROFILE)
      return;
    if (ExecCount != 0)
      AllCold = false;
  }

  if (AllCold)
    return;

  auto PreSplitLayout = BF.getLayout();

  auto &BC = BF.getBinaryContext();
  size_t OriginalHotSize;
  size_t HotSize;
  size_t ColdSize;
  if (BC.isX86()) {
    std::tie(OriginalHotSize, ColdSize) = BC.calculateEmittedSize(BF);
    DEBUG(dbgs() << "Estimated size for function " << BF << " pre-split is <0x"
                 << Twine::utohexstr(OriginalHotSize) << ", 0x"
                 << Twine::utohexstr(ColdSize) << ">\n");
  }

  if (opts::SplitFunctions == SplitFunctions::ST_LARGE && !BC.HasRelocations) {
    // Split only if the function wouldn't fit.
    if (OriginalHotSize <= BF.getMaxSize())
      return;
  }

  // Never outline the first basic block.
  BF.layout_front()->setCanOutline(false);
  for (auto *BB : BF.layout()) {
    if (!BB->canOutline())
      continue;
    if (BB->getExecutionCount() != 0) {
      BB->setCanOutline(false);
      continue;
    }
    // Do not split extra entry points in aarch64. They can be referred by
    // using ADRs and when this happens, these blocks cannot be placed far
    // away due to the limited range in ADR instruction.
    if (BC.isAArch64() && BB->isEntryPoint()) {
      BB->setCanOutline(false);
      continue;
    }
    if (BF.hasEHRanges() && !opts::SplitEH) {
      // We cannot move landing pads (or rather entry points for landing
      // pads).
      if (BB->isLandingPad()) {
        BB->setCanOutline(false);
        continue;
      }
      // We cannot move a block that can throw since exception-handling
      // runtime cannot deal with split functions. However, if we can guarantee
      // that the block never throws, it is safe to move the block to
      // decrease the size of the function.
      for (auto &Instr : *BB) {
        if (BF.getBinaryContext().MIB->isInvoke(Instr)) {
          BB->setCanOutline(false);
          break;
        }
      }
    }
  }

  if (opts::AggressiveSplitting) {
    // All blocks with 0 count that we can move go to the end of the function.
    // Even if they were natural to cluster formation and were seen in-between
    // hot basic blocks.
    std::stable_sort(BF.layout_begin(), BF.layout_end(),
        [&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
          return A->canOutline() < B->canOutline();
        });
  } else if (BF.hasEHRanges() && !opts::SplitEH) {
    // Typically functions with exception handling have landing pads at the end.
    // We cannot move beginning of landing pads, but we can move 0-count blocks
    // comprising landing pads to the end and thus facilitate splitting.
    auto FirstLP = BF.layout_begin();
    while ((*FirstLP)->isLandingPad())
      ++FirstLP;

    std::stable_sort(FirstLP, BF.layout_end(),
        [&] (BinaryBasicBlock *A, BinaryBasicBlock *B) {
          return A->canOutline() < B->canOutline();
        });
  }

  // Separate hot from cold starting from the bottom.
  for (auto I = BF.layout_rbegin(), E = BF.layout_rend();
       I != E; ++I) {
    BinaryBasicBlock *BB = *I;
    if (!BB->canOutline())
      break;
    BB->setIsCold(true);
  }

  // Check the new size to see if it's worth splitting the function.
  if (BC.isX86() && BF.isSplit()) {
    std::tie(HotSize, ColdSize) = BC.calculateEmittedSize(BF);
    DEBUG(dbgs() << "Estimated size for function " << BF << " post-split is <0x"
                 << Twine::utohexstr(HotSize) << ", 0x"
                 << Twine::utohexstr(ColdSize) << ">\n");
    if (alignTo(OriginalHotSize, opts::SplitAlignThreshold) <=
        alignTo(HotSize, opts::SplitAlignThreshold) + opts::SplitThreshold) {
      DEBUG(dbgs() << "Reversing splitting of function " << BF << ":\n  0x"
                   << Twine::utohexstr(HotSize) << ", 0x"
                   << Twine::utohexstr(ColdSize) << " -> 0x"
                   << Twine::utohexstr(OriginalHotSize) << '\n');

      BF.updateBasicBlockLayout(PreSplitLayout);
      for (auto &BB : BF) {
        BB.setIsCold(false);
      }
    }
  }
}

} // namespace bolt
} // namespace llvm
@@ -0,0 +1,52 @@
//===--- SplitFunctions.h - pass for splitting function code --------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_BOLT_PASSES_SPLIT_FUNCTIONS_H
#define LLVM_TOOLS_LLVM_BOLT_PASSES_SPLIT_FUNCTIONS_H

#include "BinaryContext.h"
#include "BinaryFunction.h"
#include "Passes/BinaryPasses.h"
#include "llvm/Support/CommandLine.h"

namespace llvm {
namespace bolt {

/// Split function code in multiple parts.
class SplitFunctions : public BinaryFunctionPass {
public:
  /// Settings for splitting function bodies into hot/cold partitions.
  enum SplittingType : char {
    ST_NONE = 0,  /// Do not split functions.
    ST_LARGE,     /// In non-relocation mode, only split functions that
                  /// are too large to fit into the original space.
    ST_ALL,       /// Split all functions.
  };

private:
  /// Split function body into fragments.
  void splitFunction(BinaryFunction &Function) const;

public:
  explicit SplitFunctions(const cl::opt<bool> &PrintPass)
    : BinaryFunctionPass(PrintPass) { }

  const char *getName() const override {
    return "split-functions";
  }

  void runOnFunctions(BinaryContext &BC) override;
};

} // namespace bolt
} // namespace llvm

#endif
@@ -174,15 +174,6 @@ DumpEHFrame("dump-eh-frame",
   cl::Hidden,
   cl::cat(BoltCategory));
 
-static cl::opt<bool>
-FixDebugInfoLargeFunctions("fix-debuginfo-large-functions",
-  cl::init(true),
-  cl::desc("do another pass if we encounter large functions, to correct their "
-           "debug info."),
-  cl::ZeroOrMore,
-  cl::ReallyHidden,
-  cl::cat(BoltCategory));
-
 static cl::list<std::string>
 FunctionNames("funcs",
   cl::CommaSeparated,
@@ -344,21 +335,6 @@ SkipFunctionNamesFile("skip-funcs-file",
   cl::Hidden,
   cl::cat(BoltCategory));
 
-cl::opt<BinaryFunction::SplittingType>
-SplitFunctions("split-functions",
-  cl::desc("split functions into hot and cold regions"),
-  cl::init(BinaryFunction::ST_NONE),
-  cl::values(clEnumValN(BinaryFunction::ST_NONE, "0",
-                        "do not split any function"),
-             clEnumValN(BinaryFunction::ST_EH, "1",
-                        "split all landing pads"),
-             clEnumValN(BinaryFunction::ST_LARGE, "2",
-                        "also split if function too large to fit"),
-             clEnumValN(BinaryFunction::ST_ALL, "3",
-                        "split all functions")),
-  cl::ZeroOrMore,
-  cl::cat(BoltOptCategory));
-
 cl::opt<bool>
 SplitEH("split-eh",
   cl::desc("split C++ exception handling code"),
@@ -783,26 +759,6 @@ RewriteInstance::RewriteInstance(ELFObjectFileBase *File, DataReader &DR,
 
 RewriteInstance::~RewriteInstance() {}
 
-void RewriteInstance::reset() {
-  FileSymRefs.clear();
-  auto &DR = BC->DR;
-  DR.reset();
-  BC = createBinaryContext(
-      InputFile, DR,
-      DWARFContext::create(*InputFile, nullptr,
-                           DWARFContext::defaultErrorHandler, "", false));
-  BAT = llvm::make_unique<BoltAddressTranslation>(*BC);
-  CFIRdWrt.reset(nullptr);
-  OLT.reset(nullptr);
-  EFMM.reset();
-  Out.reset(nullptr);
-  EHFrame = nullptr;
-  FailedAddresses.clear();
-  if (opts::UpdateDebugSections) {
-    DebugInfoRewriter = llvm::make_unique<DWARFRewriter>(*BC, SectionPatchers);
-  }
-}
-
 bool RewriteInstance::shouldDisassemble(const BinaryFunction &BF) const {
   // If we have to relocate the code we have to disassemble all functions.
   if (!BF.getBinaryContext().HasRelocations && !opts::shouldProcess(BF)) {
@@ -1079,93 +1035,53 @@ void RewriteInstance::run() {
    return;
  }

  auto executeRewritePass = [&](const std::set<uint64_t> &NonSimpleFunctions,
                                bool ShouldSplit) {
    discoverStorage();
    readSpecialSections();
    adjustCommandLineOptions();
    discoverFileObjects();

    std::thread PreProcessProfileThread([&]() {
      if (!DA.started())
        return;

      outs() << "BOLT-INFO: spawning thread to pre-process profile\n";
      preprocessProfileData();
    });

    if (opts::NoThreads)
      PreProcessProfileThread.join();

    readDebugInfo();

    // Skip disassembling if we have a translation table and we are running an
    // aggregation job.
    if (!opts::AggregateOnly || !BAT->enabledFor(InputFile)) {
      disassembleFunctions();
    }

    if (PreProcessProfileThread.joinable())
      PreProcessProfileThread.join();

    processProfileData();

    if (opts::AggregateOnly)
      return;

    postProcessFunctions();
    for (uint64_t Address : NonSimpleFunctions) {
      auto *BF = BC->getBinaryFunctionAtAddress(Address);
      assert(BF && "bad non-simple function address");
      if (ShouldSplit)
        BF->setLarge(true);
      else
        BF->setSimple(false);
    }
    if (opts::DiffOnly)
      return;
    runOptimizationPasses();
    emitAndLink();
  };

  outs() << "BOLT-INFO: Target architecture: "
         << Triple::getArchTypeName(
                (llvm::Triple::ArchType)InputFile->getArch())
         << "\n";

  unsigned PassNumber = 1;
  executeRewritePass({}, false);
  if (opts::AggregateOnly || opts::DiffOnly)
    return;
  discoverStorage();
  readSpecialSections();
  adjustCommandLineOptions();
  discoverFileObjects();

  std::thread PreProcessProfileThread([&]() {
    if (!DA.started())
      return;

    outs() << "BOLT-INFO: spawning thread to pre-process profile\n";
    preprocessProfileData();
  });

  if (opts::NoThreads)
    PreProcessProfileThread.join();

  readDebugInfo();

  // Skip disassembling if we have a translation table and we are running an
  // aggregation job.
  if (!opts::AggregateOnly || !BAT->enabledFor(InputFile)) {
    disassembleFunctions();
  }

  if (PreProcessProfileThread.joinable())
    PreProcessProfileThread.join();

  processProfileData();

  if (opts::AggregateOnly)
    return;

  if (opts::SplitFunctions == BinaryFunction::ST_LARGE &&
      checkLargeFunctions()) {
    ++PassNumber;
    // Emit again because now some functions have been split
    outs() << "BOLT: split-functions: starting pass " << PassNumber << "...\n";
    reset();
    executeRewritePass(LargeFunctions, true);
  }
  postProcessFunctions();

  // Emit functions again ignoring functions which still didn't fit in their
  // original space, so that we don't generate incorrect debugging information
  // for them (information that would reflect the optimized version).
  if (opts::UpdateDebugSections && opts::FixDebugInfoLargeFunctions &&
      checkLargeFunctions()) {
    ++PassNumber;
    outs() << format("BOLT: starting pass %zu (ignoring %zu large functions) ",
                     PassNumber, LargeFunctions.size())
           << "...\n";
    reset();
    executeRewritePass(LargeFunctions, false);
  }
  if (opts::DiffOnly)
    return;

  {
    NamedRegionTimer T("updateDebugInfo", "update debug info", TimerGroupName,
                       TimerGroupDesc, opts::TimeRewrite);
    if (opts::UpdateDebugSections)
      DebugInfoRewriter->updateDebugInfo();
  }
  runOptimizationPasses();

  emitAndLink();

  updateDebugInfo();

  if (opts::WriteBoltInfoSection)
    addBoltInfoSection();
@@ -3245,6 +3161,15 @@ void RewriteInstance::linkRuntime() {
         << Twine::utohexstr(InstrumentationRuntimeStartAddress) << "\n";
}

void RewriteInstance::updateDebugInfo() {
  if (!opts::UpdateDebugSections)
    return;

  NamedRegionTimer T("updateDebugInfo", "update debug info", TimerGroupName,
                     TimerGroupDesc, opts::TimeRewrite);
  DebugInfoRewriter->updateDebugInfo();
}

void RewriteInstance::emitFunctions(MCStreamer *Streamer) {
  auto emit = [&](const std::vector<BinaryFunction *> &Functions) {
    for (auto *Function : Functions) {

@@ -51,10 +51,6 @@ public:
                  StringRef ToolPath);
  ~RewriteInstance();

  /// Reset all state except for split hints. Used to run a second pass with
  /// function splitting information.
  void reset();

  /// Run all the necessary steps to read, optimize and rewrite the binary.
  void run();