Summary:
Preserve original layout for basic blocks that have 0 execution
count. Since we don't optimize for size, it's better to rely on
the original input order.
(cherry picked from FBD2875335)
Summary:
We should never outline the first basic block.
Also add an option to accept a file with the list of
functions to optimize.
(cherry picked from FBD2868184)
Summary:
We could split functions with exceptions even without creating
a new exception handling table. This limits us to only move
basic blocks that never throw, and are not a start of a
landing pad.
(cherry picked from FBD2862937)
Summary:
Some basic blocks were created empty because they only contained
alignment nop's. Ignore such nop's before basic block gets created.
Fixed intermittent aborts related to CFI update.
(cherry picked from FBD2844465)
Summary:
* Update CFI state for larger range of functions to increase coverage.
* Issue more warnings indicating reasons for skipping functions.
* Print top called functions in the binary.
(cherry picked from FBD2839734)
Summary:
Modified processing of "-reorder-blocks=" option and added an option
to reverse original basic blocks order for testing purposes.
(cherry picked from FBD2829862)
Summary:
Fixes some issues discovered after hhvm switched to gcc 4.9.
Add support for DW_CFA_GNU_args_size instruction.
Allow CFI instruction after the last instruction in a function.
Reverse conditions of assert for DW_CFA_set_loc.
(cherry picked from FBD28110096)
Summary:
Binary code could be weird. It could include calls to address 0 and
reference data at 0 (e.g. with lea on x86). LLVM JIT fatals
while resolving relocations against symbols at address 0x0. For now
we will stop emitting such code, i.e. we'll skip functions.
(cherry picked from FBD28109837)
Summary:
In a test binary, we found 8 cases where code in a function A would jump to the
middle of another function B. In this case, we cannot reorder function B because
this would change instruction offsets and break the program. This is pretty rare
but can happen in code written in assembly.
(cherry picked from FBD2719850)
Summary:
We found out that the insertion of extra nops to preserve alignment of
some loop bodies do not pay off the increased function size, since this extra
size may inhibit us from rewriting a reordered version of this function.
(cherry picked from FBD2718466)
Summary:
Our CFI parser in the LLVM library was giving up on parsing all CFI
instructions when finding a single instruction with expression operands. Yet,
all gcc-4.9 binaries seem to have at least one CFI instruction with expression
operands (DW_CFA_def_cfa_expression). This patch fixes this and makes DebugInfo
continue to parse other instructions, even though it does not completely parse
DWARF expressions yet. However, this seems to be enough to allow llvm-flo to
process gcc-4.9 binaries because the FDEs with DWARF expressions are linked to
the PLT region, and not to functions that we process.
If we ever try to read a function whose CFI depends on DWARF expression, which
is unlikely, llvm-flo will assert.
(cherry picked from FBD2693088)
Summary:
This patch builds upon the previous patch to create a two-pass process
to function splitting. We first perform the full rewriting pipeline to discover
which functions need splitting. Afterwards, we restart the pipeline with those
functions annotated to be split.
(cherry picked from FBD2691709)
Summary:
Previously, llvm-flo.cpp contained a long function doing lots of
different tasks. This patch refactors this logic into a separate class with
different member functions, exposing the relationship between each step of
the rewritting process and making it easier to coordinate/change it.
(cherry picked from FBD2691674)
Summary:
After basic block reordering, it may be possible that the reordered
function is now larger than the original because of the following reasons:
- jump offsets may change, forcing some jump instructions to use 4-byte
immediate operand instead of the 1-byte, shorter version.
- fall-throughs change, forcing us to emit an extra jump instruction to jump
to the original fall-through at the end of a basic block.
Since we currently do not change function addresses, we need to rewrite the
function back in the binary in the original location. If it doesn't fit, we were
dropping the function.
This patch adds a flag -split-functions that tells llvm-flo to split hot
functions into hot and cold separate regions. The hot region is written back
in the original function location, while the cold region is written in a
separate, far-away region reserved to flo via a linker script.
This patch also adds the logic to create and extra FDE to supply unwinding
information to the cold part of the function. Owing to this, we now need to
rewrite .eh_frame_hdr to another location and patch the EH_FRAME ELF segment
to point to this new .eh_frame_hdr.
(cherry picked from FBD2677996)
Summary:
This is an attempt at determining the hotness of functions we are
rewriting and help detect if we are discarding hot functions. This patch
introduces logic to estimate the number of instructions executed in each
function by using the profile data for branches. It sums the products of
BB frequency and size. Since we can only do this for functions we have
successfully disassembled, created the CFG and annotated with profiling
data, all complex functions that were not disassembled are left out from
this analysis.
(cherry picked from FBD2654985)
Summary:
Previously, we were marking functions with indirect calls as too
complex to be disassembled, but this was unnecessarily conservative. This patch
removes this restriction.
(cherry picked from FBD2669627)
Summary:
Teach llvm-flo to drop on function with LSDA information until we know
how to update them after block reordering.
(cherry picked from FBD2640806)
Summary:
This patch adds logic to detect when the binary has extra space
reserved for us via the __flo_storage symbol. If this symbol is present,
it means we have extra space in the binary to write extraneous information.
When we write a new .eh_frame, we cannot discard the old .eh_frame because
it may still contain relevant information for functions we do not reorder.
Thus, we write the new .eh_frame into __flo_storage and patch the current
.eh_frame_hdr to point to the new .eh_frame only for the functions we touched,
generating a binary that works with a bi-.eh_frame model.
(cherry picked from FBD2639326)
Summary:
This patch is an intermediary step towards updating the CFI in the
optimized binary. It adds the logic necessary to output our CFI annotations to
a new .eh_frame in the temporary object file we create to hold rewritten
functions. The next step will be to fully integrate this new .eh_frame into the
optimized binary.
(cherry picked from FBD2633728)
Summary:
This patch introduces logic to check how the CFI instructions define a
table to help during stack unwinding at exception run time and attempts to fix
any problem in this table that may have been introduced by reordering the basic
blocks. If it fails to fix this problem, the function is marked as not simple
and not eligible for rewriting.
(cherry picked from FBD2633696)
Summary:
Regenerate exception handling information after optimizations.
Use '-print-eh-ranges' to see CFG with updated ranges.
(cherry picked from FBD2660982)
Summary:
There were two issues: we were trying to process non-simple functions,
i.e. function that we don't fully understand, and then we failed to stop
iterating if EH closing label was after the last instruction in a
function.
(cherry picked from FBD2664460)
Summary:
Read .gcc_except_table and add information to CFG. Calls have extra operands
indicating there's a possible handler for exceptions and an action. Landing
pad information is recorded in BinaryFunction.
Also convert JMP instructions that are calls into tail calls pseudo
instructions so that they don't miss call instruction analysis.
(cherry picked from FBD2652775)
Summary: Reverting this commit until we better investigate why
it is necessary to change local symbol names with a prefix.
(cherry picked from FBD28109521)
Summary: After discussion with Maksim, we decided to drop the lines
that add the PG prefix if the symbol is already local, since they
wouldn't be impacted by the way LLVM handles these symbols.
(cherry picked from FBD28109400)
Summary:
This bug would cause llvm-flo to fail to disambiguate two local symbols
with the same file name, causing two different addresses to compete in the
symbol table for the resolution of a given name, causing unpredicted behavior in
the linker.
(cherry picked from FBD2646626)
Summary:
In order to represent CFI information in our BinaryFunction class, this
patch adds a map of Offsets to CFI instructions. In this way, we make it easy to
check exactly where DWARF CFI information is annotated in the disassembled
function.
(cherry picked from FBD2619216)
Summary:
We need to parse the whole contents of .gcc_except_table even if we are
not printing exceptions. Otherwise we are missing type index table and
miscalculate the size of the current table.
(cherry picked from FBD2632965)
Summary: In order to reorder binaries with C++ exceptions, we first need to
read DWARF CFI (call frame info) from binaries in a table in the .eh_frame
ELF section. This table contains unwinding information we need to be aware of
when reordering basic blocks, so as to avoid corrupting it. This patch also
cleans up some code from Exceptions.cpp due to a refactoring where we moved
some functions to the LLVM's libSupport.
(cherry picked from FBD2614464)
Summary:
Print actions for exception ranges from .gcc_except_table.
Types are printed as names if the name is available from symbol table.
(cherry picked from FBD2612631)
Summary:
Previously, we inferred all non-taken branch frequencies with the
information we had for taken branches. This patch teaches perf2flo and llvm-flo
how to read and incorporate non-taken branch frequencies directly from the
traces available in LBR data and by disassembling the binary. It still leaves
the inference engine untouched in case we need it to fill out other
fall-throughs.
(cherry picked from FBD2589212)
Summary:
Pettis' paper on block layout (PLDI'90) suggests we should order
clusters (or chains, using the paper terminology) using a specific criterion.
This patch implements two distinct ideas for cluster layout that can be
activated using different command-line flags. The first one reflects Pettis'
ideas on minimizing branch mispredictions and the second one is targeted at
reducing I-cache misses, described in the Ispike paper (CGO'04).
(cherry picked from FBD2588693)
Summary:
Fixes a bug which caused the block reordering heuristic to put in the
same cluster hot basic blocks and cold basic blocks, increasing I-cache misses.
(cherry picked from FBD2588203)
Summary:
When the ignore-nops patch landed, it exposed a bug in fixBranches()
where it ignored empty BBs. However, we cannot ignore empty BBs when it is
reordered and its fall-through changes. We must update it with a jump to the
original fall-through. This patch fixes this.
(cherry picked from FBD2568244)
Summary:
It is important to remove dead blocks to free up space in functions
and allow us to reorder blocks or align branch targets with more
freedom. This patch implements a simple algorithm to delete all basic
blocks that are not reachable from the entry point. Note that C++
exceptions may create "unreachable" blocks, so this option must be
used with care.
(cherry picked from FBD2562637)
Summary:
SPEC CPU2006 perlbench triggered a bug in our heuristic block
reordering algorithm where a hot edge that targets the entry point (as in a
recursive tail call) would make us try to allocate the call site before the
function entry point. Since we don't update function addresses yet, moving the
entry point will corrupt the program. This patch fixes this.
(cherry picked from FBD2562528)
Summary:
If we have two consecutive JMP instructions and no branches to the
second one, the second one is dead code, but llvm-flo does not handle these
cases properly and put two JMPs in the same BB. This patch fixes this, putting
the extraneous JMP in a separate block, making it easy for us to detect it is
dead code and remove it later in a separate step.
(cherry picked from FBD2562465)
Summary:
Nop instructions are primarily used for alignment purposes on the input.
We remove all nops when we build CFG and derive alignment of basic blocks
based on existing alignment and a presence of nops before it. This
will not always work as some basic blocks will be naturally aligned
without necessity for nops. However, it's better than random alignment.
We would also add heuristics for BB alignment based on execution profile.
(cherry picked from FBD2561740)
Summary:
Adds logic in BinaryFunction to be able to fix branches (invert
its condition, delete or add a branch), making the new function work with the
new layout proposed by the layout pass. All the architecture-specific content
was designed to live in the LLVM Target library, in the MCInstrAnalysis pass.
For now, we only introduce such logic to the X86 backend.
(cherry picked from FBD2551479)
Summary:
Tests with SPEC CPU2006 400.perlbench exposed a bug in the block reordering
heuristic that happened when two blocks are both successor and predecessor of
each other. This patch fixes this.
(cherry picked from FBD2555835)
Summary:
SPEC CPU2006 perlbench exposed a bug in BinaryFunction::optimizeLayout()
where it would try to optimize the layout even though the function had zero
basic blocks. This patch simply checks if the function has zero basic blocks and
bails out.
(cherry picked from FBD2556831)
Summary:
In a recent commit, we changed local symbols to be specially tagged
with the number 2 (local sym) instead of 1 (sym). This patch modifies the reader
to don't choke when seeing a 2 in the symbol id field.
(cherry picked from FBD2552776)
Summary:
This patch implements a dynamic programming approach to solve reorder
basic blocks with profiling information in an optimal way. Since this is
analogous to TSP, it is NP-hard and the algorithm is exponential in time and
memory consumption. Therefore, we only use the optimal algorithm to decide the
layout of small functions (with less than 11 basic blocks).
(cherry picked from FBD2544124)
Summary:
This patch introduces a first approach to reorder basic blocks based on
profiling data that gives us the execution frequency for each edge. Our strategy
is to layout basic blocks in a order that maximizes the weight (hotness) of
branches that will be deleted. We can delete branches when src comes right
before dst in the new layout order. This can be reduced to the TSP problem. This
patch uses a greedy heuristic to solve the problem: we start with a graph with
no edges and progressively add edges by choosing the hottest edges first,
building a layout order that attempts to put BBs with hot edges together.
(cherry picked from FBD2544076)
Summary:
The LBR only has information about taken branches and does not record
information when a branch is not taken. In our CFG, we call these edges
"fall-through" edges. This patch teaches llvm-flo how to infer fall-through
edge frequencies.
(cherry picked from FBD2536633)
Summary:
Changes DataReader to organize branch perf data per function name and
sets up logistics to bring this data to BinaryFunction::buildCFG(). To do this,
we expand BinaryContext with a const reference to DataReader. This patch also
adds the "-dump-functions" flag to force llvm-flo to dump the current state of
BinaryFunctions once they are disassembled and their CFG built, allowing us to
test whether the builder is sane with LLVM LIT tests.
(cherry picked from FBD2534675)
Summary:
This patch introduces DataReader, a module responsible for
parsing llvm flo data files into in-memory data structures.
(cherry picked from FBD2515754)