Commit Graph

832 Commits

Author SHA1 Message Date
Maksim Panchenko 21cc191ea8 Added function to parse and dump .gcc_except_table
Summary: Use '-print-exceptions' option to dump contents of .gcc_except_table.

(cherry picked from FBD2609925)
2015-11-02 11:50:53 -07:00
Rafael Auler 0e8998713c Extract non-taken branch frequencies from LBR
Summary:
Previously, we inferred all non-taken branch frequencies with the
information we had for taken branches. This patch teaches perf2flo and llvm-flo
how to read and incorporate non-taken branch frequencies directly from the
traces available in LBR data and by disassembling the binary. It still leaves
the inference engine untouched in case we need it to fill out other
fall-throughs.

(cherry picked from FBD2589212)
2015-10-26 15:00:56 -07:00
Rafael Auler 13a520ab30 Implement two cluster layout heuristics
Summary:
Pettis' paper on block layout (PLDI'90) suggests we should order
clusters (or chains, using the paper terminology) using a specific criterion.
This patch implements two distinct ideas for cluster layout that can be
activated using different command-line flags. The first one reflects Pettis'
ideas on minimizing branch mispredictions and the second one is targeted at
reducing I-cache misses, described in the Ispike paper (CGO'04).

(cherry picked from FBD2588693)
2015-10-23 09:38:26 -07:00
Rafael Auler 2539539bde Fixes priority queue ordering in llvm-flo block reordering
Summary:
Fixes a bug which caused the block reordering heuristic to put hot and
cold basic blocks in the same cluster, increasing I-cache misses.

(cherry picked from FBD2588203)
2015-10-27 03:04:58 -07:00
Maksim Panchenko d4d773458c More control over function printing.
Summary:
Use a '-print-*' option to print a function at a specific stage.
Use '-print-all' to print at every stage.

(cherry picked from FBD2578196)
2015-10-23 15:52:59 -07:00
Maksim Panchenko 7f44331773 Issue warning when relaxed tail call is seen on input.
Summary:
Issue warning when we see a 2-byte tail call. Currently we
will increase the size of these instructions.

(cherry picked from FBD2575520)
2015-10-20 10:51:17 -07:00
Rafael Auler 546c4e6e84 Fix bug in BinaryFunction::fixBranches() in llvm-flo
Summary:
When the ignore-nops patch landed, it exposed a bug in fixBranches()
where it ignored empty BBs. However, we cannot ignore an empty BB when it is
reordered and its fall-through changes. We must update it with a jump to the
original fall-through. This patch fixes this.

(cherry picked from FBD2568244)
2015-10-21 16:25:16 -07:00
Rafael Auler dc848b5376 Fix entry BB execution count in llvm-flo
Summary:
When we have tailcalls, the execution count for the entry point is
wrongly computed. Fix this.

(cherry picked from FBD2563112)
2015-10-20 16:48:54 -07:00
Rafael Auler ab63ca9afb Implement unreachable BB elimination in llvm-flo
Summary:
It is important to remove dead blocks to free up space in functions
and allow us to reorder blocks or align branch targets with more
freedom. This patch implements a simple algorithm to delete all basic
blocks that are not reachable from the entry point. Note that C++
exceptions may create "unreachable" blocks, so this option must be
used with care.

(cherry picked from FBD2562637)
2015-10-20 12:47:37 -07:00
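
The reachability pass described in ab63ca9afb can be sketched as a worklist traversal from the entry block. This is a minimal Python illustration, not llvm-flo's code: the function name `remove_unreachable_blocks` and the dictionary-based CFG are assumptions made for the example.

```python
from collections import deque


def remove_unreachable_blocks(successors, entry):
    """Return a copy of the CFG that keeps only blocks reachable from entry.

    successors maps each block name to a list of successor block names
    (a hypothetical representation chosen for this sketch).
    """
    reachable = {entry}
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        for succ in successors.get(block, ()):
            if succ not in reachable:
                reachable.add(succ)
                worklist.append(succ)
    # Every successor of a reachable block is itself reachable, so the
    # successor lists can be copied unchanged.
    return {b: list(s) for b, s in successors.items() if b in reachable}


cfg = {
    "entry": ["a", "b"],
    "a": ["exit"],
    "b": ["exit"],
    "exit": [],
    "landing_pad": ["exit"],  # entered only through exception unwinding
}
pruned = remove_unreachable_blocks(cfg, "entry")
```

In this example `landing_pad` has no incoming CFG edge, so it is dropped. That is the situation the commit message warns about: a block reached only through C++ exception handling looks unreachable to this kind of traversal.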
Rafael Auler 9f41a0d263 Do not schedule BBs before the entry point
Summary:
SPEC CPU2006 perlbench triggered a bug in our heuristic block
reordering algorithm where a hot edge that targets the entry point (as in a
recursive tail call) would make us try to allocate the call site before the
function entry point. Since we don't update function addresses yet, moving the
entry point will corrupt the program. This patch fixes this.

(cherry picked from FBD2562528)
2015-10-20 12:30:22 -07:00
Rafael Auler b0115a4536 Teach llvm-flo how to handle two back-to-back JMPs
Summary:
If we have two consecutive JMP instructions and no branches to the
second one, the second one is dead code, but llvm-flo does not handle these
cases properly and puts both JMPs in the same BB. This patch fixes this, putting
the extraneous JMP in a separate block, making it easy for us to detect it is
dead code and remove it later in a separate step.

(cherry picked from FBD2562465)
2015-10-20 10:17:38 -07:00
Maksim Panchenko 85b99eb7b7 Eliminate nop instruction in input and derive alignment.
Summary:
Nop instructions are primarily used for alignment purposes on the input.
We remove all nops when we build CFG and derive alignment of basic blocks
based on existing alignment and the presence of nops before them. This
will not always work as some basic blocks will be naturally aligned
without the need for nops. However, it's better than random alignment.
We would also add heuristics for BB alignment based on execution profile.

(cherry picked from FBD2561740)
2015-10-20 10:51:17 -07:00
Rafael Auler cd6250d1e3 Fixes branches after reordering basic blocks in a binary function
Summary:
Adds logic in BinaryFunction to be able to fix branches (invert a
branch's condition, or delete or add a branch), making the new function work with the
new layout proposed by the layout pass. All the architecture-specific content
was designed to live in the LLVM Target library, in the MCInstrAnalysis pass.
For now, we only introduce such logic to the X86 backend.

(cherry picked from FBD2551479)
2015-10-16 09:49:04 -07:00
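
The three repairs named in cd6250d1e3 (invert a condition, delete a branch, add a branch) can be illustrated with a small decision table. The Python sketch below is an assumption-laden illustration, not BinaryFunction's logic: the name `fix_branches`, the `(taken, fallthrough)` tuples, and the `jcc` / `jcc_inverted` / `jmp` labels are all hypothetical.

```python
def fix_branches(layout, cfg):
    """Compute the terminator each block needs under a new layout.

    cfg maps a block to (taken, fallthrough): the conditional-branch target
    (or None) and the successor reached otherwise (or None for a return).
    """
    terminators = {}
    for index, block in enumerate(layout):
        taken, fallthrough = cfg[block]
        following = layout[index + 1] if index + 1 < len(layout) else None
        if taken is None:
            # No conditional branch: a jump is needed only when the sole
            # successor no longer follows the block in memory.
            if fallthrough is None or fallthrough == following:
                terminators[block] = []  # delete the branch
            else:
                terminators[block] = [("jmp", fallthrough)]  # add a branch
        elif fallthrough == following:
            terminators[block] = [("jcc", taken)]  # keep as is
        elif taken == following:
            # Invert the condition so the old taken target falls through.
            terminators[block] = [("jcc_inverted", fallthrough)]
        else:
            # Neither successor follows: keep the conditional branch and
            # add an unconditional jump to the old fall-through.
            terminators[block] = [("jcc", taken), ("jmp", fallthrough)]
    return terminators


cfg = {"A": ("C", "B"), "B": (None, "D"), "C": (None, "D"), "D": (None, None)}
original = fix_branches(["A", "B", "C", "D"], cfg)
reordered = fix_branches(["A", "C", "D", "B"], cfg)
```

Moving `C` directly after `A` turns A's branch into an inverted one that targets `B`, and `B`, now last in the layout, needs an explicit jump to `D`.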
Rafael Auler ef059af3d1 Fix bug in block reorder heuristic
Summary:
Tests with SPEC CPU2006 400.perlbench exposed a bug in the block reordering
heuristic that happened when two blocks are both successor and predecessor of
each other. This patch fixes this.

(cherry picked from FBD2555835)
2015-10-19 10:43:54 -07:00
Rafael Auler 31e6bd1226 Fix missing sanity check in BinaryFunction::optimizeLayout()
Summary:
SPEC CPU2006 perlbench exposed a bug in BinaryFunction::optimizeLayout()
where it would try to optimize the layout even though the function had zero
basic blocks. This patch simply checks if the function has zero basic blocks and
bails out.

(cherry picked from FBD2556831)
2015-10-19 13:23:03 -07:00
Maksim Panchenko b4ed5cc942 Make FLO work on hhvm binary.
Summary: Fixes several issues that prevented us from running hhvm binary.

(cherry picked from FBD2543057)
2015-10-14 15:35:14 -07:00
Rafael Auler ec22caff1e Fix comments. NFC.
Summary: Updated comments in BinaryFunction class.

(cherry picked from FBD28108888)
2015-10-16 17:15:00 -07:00
Rafael Auler 9a8d357d0b Fix DataReader to work with new local sym perf2flo format
Summary:
In a recent commit, we changed local symbols to be specially tagged
with the number 2 (local sym) instead of 1 (sym). This patch modifies the reader
so that it does not choke when seeing a 2 in the symbol id field.

(cherry picked from FBD2552776)
2015-10-16 17:00:36 -07:00
Rafael Auler f9ed45893b Teach llvm-flo how to reorder blocks in an optimal way
Summary:
This patch implements a dynamic programming approach to reordering
basic blocks with profiling information in an optimal way. Since this is
analogous to TSP, it is NP-hard and the algorithm is exponential in time and
memory consumption. Therefore, we only use the optimal algorithm to decide the
layout of small functions (with fewer than 11 basic blocks).

(cherry picked from FBD2544124)
2015-10-14 16:58:55 -07:00
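
To make the exponential cost in f9ed45893b concrete, here is a minimal Python sketch of a subset dynamic program in the style of Held-Karp. It is an illustration under assumptions rather than the patch's code: the name `optimal_layout`, the integer block ids, the `(src, dst) -> weight` dictionary, and the choice to pin the entry block first are all hypothetical. It maximizes the total weight of edges whose source lands immediately before their destination.

```python
def optimal_layout(num_blocks, edge_weight, entry=0):
    """Return (order, score) maximizing the weight of adjacent src->dst pairs.

    Uses O(2^n * n) states, which is why such an approach is only practical
    for small functions. Edge weights are assumed to be non-negative.
    """
    full = (1 << num_blocks) - 1
    start = 1 << entry
    # best[(mask, last)]: best score for a layout that starts at entry,
    # covers exactly the blocks in mask, and ends at block last.
    best = {(start, entry): 0}
    parent = {}
    for mask in range(1, full + 1):  # supersets are numerically larger
        if not mask & start:
            continue
        for last in range(num_blocks):
            score = best.get((mask, last))
            if score is None:
                continue
            for nxt in range(num_blocks):
                if mask & (1 << nxt):
                    continue
                new_mask = mask | (1 << nxt)
                new_score = score + edge_weight.get((last, nxt), 0)
                if new_score > best.get((new_mask, nxt), -1):
                    best[(new_mask, nxt)] = new_score
                    parent[(new_mask, nxt)] = last
    last = max(range(num_blocks), key=lambda b: best.get((full, b), -1))
    score = best[(full, last)]
    # Walk the parent links backwards to recover the layout.
    order, mask = [], full
    while True:
        order.append(last)
        if (mask, last) not in parent:
            break
        prev = parent[(mask, last)]
        mask ^= 1 << last
        last = prev
    order.reverse()
    return order, score


weights = {(0, 1): 10, (0, 2): 90, (2, 1): 80, (1, 3): 100, (2, 3): 5}
order, score = optimal_layout(4, weights)
```

For these weights the best layout is `[0, 2, 1, 3]` with a score of 270, since the edges 0->2, 2->1 and 1->3 all become fall-throughs.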
Rafael Auler 34f7085503 Teach llvm-flo how to reorder basic blocks with a heuristic
Summary:
This patch introduces a first approach to reorder basic blocks based on
profiling data that gives us the execution frequency for each edge. Our strategy
is to lay out basic blocks in an order that maximizes the weight (hotness) of
branches that will be deleted. We can delete branches when src comes right
before dst in the new layout order. This can be reduced to TSP. This
patch uses a greedy heuristic to solve the problem: we start with a graph with
no edges and progressively add edges by choosing the hottest edges first,
building a layout order that attempts to put BBs with hot edges together.

(cherry picked from FBD2544076)
2015-10-13 12:18:54 -07:00
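
The greedy strategy in 34f7085503 can be sketched as chain merging: every block starts in its own chain, and edges are visited hottest first, joining two chains whenever the edge's source ends one and its destination starts the other. The Python below is a hedged illustration, not the patch itself; `greedy_layout`, the list-based chains, and the rule that nothing is placed before the entry block are assumptions of the sketch.

```python
def greedy_layout(blocks, edge_weight):
    """Order blocks so that hot edges tend to become fall-throughs.

    blocks lists block ids with the entry block first; edge_weight maps
    (src, dst) to an execution frequency.
    """
    entry = blocks[0]
    chain_of = {b: [b] for b in blocks}  # each block starts alone
    hottest_first = sorted(edge_weight.items(), key=lambda kv: kv[1],
                           reverse=True)
    for (src, dst), _weight in hottest_first:
        src_chain, dst_chain = chain_of[src], chain_of[dst]
        if src_chain is dst_chain:
            continue  # already placed relative to each other
        if src_chain[-1] != src or dst_chain[0] != dst:
            continue  # src must end its chain and dst must start its chain
        if dst == entry:
            continue  # assumption: keep the entry block first
        src_chain.extend(dst_chain)
        for block in dst_chain:
            chain_of[block] = src_chain
    # Emit chains in order of first appearance; the entry chain comes first.
    layout, emitted = [], set()
    for block in blocks:
        chain = chain_of[block]
        if id(chain) not in emitted:
            emitted.add(id(chain))
            layout.extend(chain)
    return layout


weights = {(0, 1): 10, (0, 2): 90, (2, 1): 80, (1, 3): 100, (2, 3): 5}
layout = greedy_layout([0, 1, 2, 3], weights)
```

Here the edges 1->3, 0->2 and 2->1 are accepted in that order, producing the single chain `[0, 2, 1, 3]`. A greedy pass like this does not guarantee an optimal layout in general.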
Rafael Auler 9b58b2e64b Make llvm-flo infer branch count data for fall-through edges
Summary:
The LBR only has information about taken branches and does not record
information when a branch is not taken. In our CFG, we call these edges
"fall-through" edges. This patch teaches llvm-flo how to infer fall-through
edge frequencies.

(cherry picked from FBD2536633)
2015-10-13 10:25:45 -07:00
Maksim Panchenko f79f6302c1 Converted local offsets from uint64_t to uint32_t. Refactoring.
(cherry picked from FBD2543557)
2015-10-14 16:46:59 -07:00
Rafael Auler 4c1da22ae9 Add branch count information to binary CFG
Summary:
Changes DataReader to organize branch perf data per function name and
sets up logistics to bring this data to BinaryFunction::buildCFG(). To do this,
we expand BinaryContext with a const reference to DataReader. This patch also
adds the "-dump-functions" flag to force llvm-flo to dump the current state of
BinaryFunctions once they are disassembled and their CFG built, allowing us to
test whether the builder is sane with LLVM LIT tests.

(cherry picked from FBD2534675)
2015-10-12 12:30:47 -07:00
Maksim Panchenko d30423f872 Don't bail out if there's no input data file specified.
Summary:
Don't attempt to read data file if it was not
specified by the user.

(cherry picked from FBD2533440)
2015-10-12 14:46:18 -07:00
Maksim Panchenko ffcc2be7fa FLO: added support for rip-relative operands.
Summary: Detect and replace rip-relative operands with relocations.

(cherry picked from FBD2529818)
2015-10-09 21:47:18 -07:00
Maksim Panchenko f166c4ab2b Fix CFG building issue.
Summary:
Fixed getBasicBlockContainingOffset() to return correct basic
block.

(cherry picked from FBD2532514)
2015-10-12 12:12:16 -07:00
Rafael Auler e1a539b0ec Add initial implementation of DataReader
Summary:
This patch introduces DataReader, a module responsible for
parsing llvm flo data files into in-memory data structures.

(cherry picked from FBD2515754)
2015-10-05 18:31:25 -07:00
Maksim Panchenko 9a2fe7ebe4 Commit FLO with control flow graph.
Summary:
llvm-flo disassembles, builds control flow graph, and re-writes
simple functions.

(cherry picked from FBD2524024)
2015-10-09 17:21:14 -07:00
Maksim Panchenko 7927c14ff5 Fixed cmake.
(cherry picked from FBD28108725)
2015-10-02 12:38:07 -07:00
Maksim Panchenko a89c417357 Removed remote .arcconfig + comment change.
(cherry picked from FBD2503821)
2015-10-02 12:06:31 -07:00
Maksim Panchenko 575b24d719 Initial FLO commit.
Summary:
Directory created.

(cherry picked from FBD28105260)
2015-10-02 11:55:15 -07:00
Maksim Panchenko 25b976aa12 BOLT root commit
2022-01-10 17:58:05 -08:00