2017-02-17 06:57:57 +08:00
|
|
|
add_llvm_library(LLVMBOLTPasses
|
2021-08-20 08:07:01 +08:00
|
|
|
ADRRelaxationPass.cpp
|
2017-10-28 06:05:31 +08:00
|
|
|
Aligner.cpp
|
2017-05-02 07:52:54 +08:00
|
|
|
AllocCombiner.cpp
|
2021-09-28 01:51:25 +08:00
|
|
|
AsmDump.cpp
|
2017-02-17 06:57:57 +08:00
|
|
|
BinaryPasses.cpp
|
2017-05-27 06:46:46 +08:00
|
|
|
BinaryFunctionCallGraph.cpp
|
2021-10-09 02:47:10 +08:00
|
|
|
CacheMetrics.cpp
|
2017-05-27 03:53:21 +08:00
|
|
|
CallGraph.cpp
|
2017-06-03 07:57:22 +08:00
|
|
|
CallGraphWalker.cpp
|
2017-05-02 07:51:27 +08:00
|
|
|
DataflowAnalysis.cpp
|
|
|
|
DataflowInfoManager.cpp
|
2019-11-01 04:32:25 +08:00
|
|
|
ExtTSPReorderAlgorithm.cpp
|
2017-05-02 07:51:27 +08:00
|
|
|
FrameAnalysis.cpp
|
2017-02-17 06:57:57 +08:00
|
|
|
FrameOptimizer.cpp
|
2017-03-04 03:35:41 +08:00
|
|
|
HFSort.cpp
|
|
|
|
HFSortPlus.cpp
|
2018-05-23 06:52:21 +08:00
|
|
|
IdenticalCodeFolding.cpp
|
2017-03-09 11:58:33 +08:00
|
|
|
IndirectCallPromotion.cpp
|
2017-02-17 06:57:57 +08:00
|
|
|
Inliner.cpp
|
2019-06-20 11:10:49 +08:00
|
|
|
Instrumentation.cpp
|
2017-11-02 15:30:11 +08:00
|
|
|
JTFootprintReduction.cpp
|
2017-09-01 02:45:37 +08:00
|
|
|
LongJmp.cpp
|
2021-05-12 01:59:13 +08:00
|
|
|
LoopInversionPass.cpp
|
|
|
|
LivenessAnalysis.cpp
|
2017-08-03 01:59:33 +08:00
|
|
|
MCF.cpp
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
The '-skip=<func1,func2,...>' option can now be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, these options are incompatible with the '-lite' option,
and the '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 15:15:47 +08:00
|
|
|
PatchEntries.cpp
|
2017-05-27 03:53:21 +08:00
|
|
|
PettisAndHansen.cpp
|
2017-08-03 01:59:33 +08:00
|
|
|
PLTCall.cpp
|
2017-06-03 07:57:22 +08:00
|
|
|
RegAnalysis.cpp
|
2017-11-15 10:20:40 +08:00
|
|
|
RegReAssign.cpp
|
2017-02-17 06:57:57 +08:00
|
|
|
ReorderAlgorithm.cpp
|
2017-05-27 03:53:21 +08:00
|
|
|
ReorderFunctions.cpp
|
2018-04-21 11:03:31 +08:00
|
|
|
ReorderData.cpp
|
2017-05-02 07:52:54 +08:00
|
|
|
ShrinkWrapping.cpp
|
[BOLT][non-reloc] Change function splitting in non-relocation mode
Summary:
This diff applies to non-relocation mode mostly. In this mode, we are
limited by original function boundaries, i.e. if a function becomes
larger after optimizations (e.g. because of the newly introduced
branches) then we might not be able to write the optimized version,
unless we split the function. At the same time, we do not benefit from
function splitting as we do in the relocation mode since we are not
moving functions/fragments, and the hot code does not become more
compact.
For the reasons described above, we used to execute multiple re-write
attempts to optimize the binary and we would only split functions that
were too large to fit into their original space.
After the first attempt, we would know functions that did not fit
into their original space. Then we would re-run all our passes again
feeding back the function information and forcefully splitting
such functions. Some functions still wouldn't fit even after the
splitting (mostly because of the branch relaxation for conditional tail
calls that does not happen in non-relocation mode). Yet we emitted
debug info as if they were successfully overwritten. That's why we had
one more stage to write the functions again, marking failed-to-emit
functions non-simple. Sadly, there was a bug in the way 2nd and 3rd
attempts interacted, and we were not splitting the functions correctly
and as a result we were emitting less optimized code.
One of the reasons we had the multi-pass rewrite scheme in place was
that we did not have the ability to precisely estimate the code size
before the actual code emission. Recently, BinaryContext obtained such
functionality, and now we can use it instead of relying on the
multi-pass rewrite. This eliminates redundant work of re-running
the same function passes multiple times.
Because function splitting runs before a number of optimization passes
that run on post-CFG state (those rely on the splitting pass), we
cannot estimate the non-split code size with 100% accuracy. However,
it is good enough for over 99% of the cases to extract most of the
performance gains for the binary.
As a result of eliminating the multi-pass rewrite, the processing time
in non-relocation mode with `-split-functions=2` is greatly reduced.
With debug info update, it is less than half of what it used to be.
New semantics for `-split-functions=<n>`:
-split-functions - split functions into hot and cold regions
=0 - do not split any function
=1 - in non-relocation mode only split functions too large to fit
into original code space
=2 - same as 1 (backwards compatibility)
=3 - split all functions
(cherry picked from FBD17362607)
2019-09-12 06:42:22 +08:00
|
|
|
SplitFunctions.cpp
|
2017-05-02 07:52:54 +08:00
|
|
|
StackAllocationAnalysis.cpp
|
|
|
|
StackAvailableExpressions.cpp
|
2017-05-02 07:51:27 +08:00
|
|
|
StackPointerTracking.cpp
|
2017-05-02 07:52:54 +08:00
|
|
|
StackReachingUses.cpp
|
2017-06-14 08:24:27 +08:00
|
|
|
StokeInfo.cpp
|
2021-07-01 22:11:26 +08:00
|
|
|
TailDuplication.cpp
|
2021-08-18 01:15:21 +08:00
|
|
|
ThreeWayBranch.cpp
|
2018-06-12 04:18:44 +08:00
|
|
|
ValidateInternalCalls.cpp
|
2018-06-08 02:10:37 +08:00
|
|
|
VeneerElimination.cpp
|
2018-07-26 10:07:41 +08:00
|
|
|
RetpolineInsertion.cpp
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction they apply to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-07 07:00:23 +08:00
|
|
|
|
2021-10-09 02:47:10 +08:00
|
|
|
LINK_LIBS
|
|
|
|
${LLVM_PTHREAD_LIB}
|
2017-02-17 06:57:57 +08:00
|
|
|
|
2021-10-09 02:47:10 +08:00
|
|
|
LINK_COMPONENTS
|
2021-11-16 12:01:48 +08:00
|
|
|
AsmPrinter
|
2021-10-09 02:47:10 +08:00
|
|
|
BOLTCore
|
|
|
|
BOLTUtils
|
|
|
|
MC
|
|
|
|
Support
|
|
|
|
)
|