151 Commits

Author SHA1 Message Date
Theodoros Kasampalis 32739247eb More aggressive inlining pass
Summary:
This adds functionality for a more aggressive inlining pass, that can
inline tail calls and functions with more than one basic block.

(cherry picked from FBD3677856)
2016-07-29 14:17:06 -07:00
Bill Nell 82d76ae18b Add MCInst annotation mechanism to MCInstrAnalysis class.
Summary:
Add three new MCOperand types: Annotation, LandingPad and GnuArgsSize.

Annotation is used for associating arbitrary data with MCInsts.  Clients can
construct their own annotation types (subclassed from MCAnnotation) and
associate them with instructions.  Annotations are looked up by string keys.

Annotations can be added, removed and queried using an instance of the
MCInstrAnalysis class.

The LandingPad operand is a MCSymbol, uint64_t pair used to encode exception
handling information for call instructions.

GnuArgsSize is used to annotate calls with the DW_CFA_GNU_args_size attribute.

(cherry picked from FBD3597877)
2016-07-28 10:34:50 -07:00
Theodoros Kasampalis 713e361f36 Fix for correct disassembling of conditional tail calls.
Summary:
BOLT attempts to convert jumps that serve as tail calls to dedicated tail call
instructions, but this is impossible when the jump is conditional because there is
no corresponding tail call instruction. This was causing the creation of a duplicate
fall-through edge for basic blocks terminated with a conditional jump serving as
a tail call when there is profile data available for the non-taken branch. In this
case, the first fall-through edge had a count taken from the profile data, while
the second has a count computed (incorrectly) by
BinaryFunction::inferFallThroughCounts.

(cherry picked from FBD3560504)
2016-07-13 18:57:40 -07:00
Maksim Panchenko 486ab273c7 Add printing support for indirect tail calls.
Summary:
LLVM was missing an assembler print string for indirect tail
calls, which are synthetic instructions created by us.

(cherry picked from FBD3640197)
2016-07-28 18:49:48 -07:00
Bill Nell 50e011f4e5 CFG editing functions
Summary:
This diff adds a number of methods to BinaryFunction that can be used to edit the CFG after it is created.

The basic public functions are:
  - createBasicBlock - create a new block that is not inserted into the CFG.
  - insertBasicBlocks - insert a range of blocks (made with createBasicBlock) into the CFG.
  - updateLayout - update the CFG layout (either by inserting new blocks at a certain point or recomputing the entire layout).
  - fixFallthroughBranch - add a direct jump to the fallthrough successor for a given block.

There are a number of private helper functions used to implement the above.

This was split off the ICP diff to simplify it a bit.

(cherry picked from FBD3611313)
2016-07-23 12:50:34 -07:00
Theodoros Kasampalis ab599fe71a Basic block clustering algorithm for minimizing branches.
Summary:
This algorithm is similar to our main clustering algorithm but uses
a different heuristic for selecting edges to become fall-throughs.
The weight of an edge is calculated as the number of branches saved if we
choose to lay out this edge as a fall-through. For example, the edges A -> B with
execution count 100 and A -> C with execution count 500 (where B and C
are the only successors of A) have weights -400 and +400 respectively.

(cherry picked from FBD3606591)
2016-07-15 16:11:30 -07:00
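
The edge weights described in ab599fe71a can be illustrated with a small sketch. It assumes the weight of an edge is its own count minus the combined count of the source block's other outgoing edges. That formula reproduces the two-successor example in the summary, but the generalization to more successors and the name `fallthrough_weights` are assumptions, not BOLT's code.

```python
def fallthrough_weights(successor_counts):
    """Return a weight for each outgoing edge of one basic block.

    successor_counts maps successor name -> execution count of the edge.
    Assumed formula: count of this edge minus the counts of all other
    outgoing edges (branches avoided minus branches still taken).
    """
    total = sum(successor_counts.values())
    return {succ: count - (total - count)
            for succ, count in successor_counts.items()}


# The example from the commit summary: A -> B (100) and A -> C (500).
weights = fallthrough_weights({"B": 100, "C": 500})
```

With these inputs the sketch gives -400 for A -> B and +400 for A -> C, matching the summary.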
Theodoros Kasampalis a9bb3320ad Identical Code Folding (ICF) pass
Summary:
Added an ICF pass to BOLT, that can recognize identical functions
and replace references to these functions with references to just one
representative.

(cherry picked from FBD3460297)
2016-06-09 11:36:55 -07:00
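
As a rough illustration of the folding idea in a9bb3320ad, the sketch below groups functions by body and picks one representative per group. It assumes two functions are identical when their instruction sequences compare equal. A real pass would also have to compare control flow and referenced symbols, and the names here are hypothetical.

```python
def fold_identical(functions):
    """Map every function name to a representative with an identical body.

    functions maps name -> tuple of instruction strings (a stand-in for a
    real function body). The first name seen for each body becomes the
    representative for that body.
    """
    representative_for_body = {}
    replacement = {}
    for name, body in functions.items():
        replacement[name] = representative_for_body.setdefault(body, name)
    return replacement


funcs = {
    "foo": ("push rbp", "mov eax, 1", "pop rbp", "ret"),
    "bar": ("push rbp", "mov eax, 1", "pop rbp", "ret"),
    "baz": ("xor eax, eax", "ret"),
}
folded = fold_identical(funcs)
```

References to `bar` would then be rewritten to `foo`, while `baz` keeps its own body.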
Bill Nell 82401630a2 Factor out instruction printing and size computation.
Summary:
I've factored out the instruction printing and size computation routines to
methods on BinaryContext.  I've also added some more debug print functions.

This was split off the ICP diff to simplify it a bit.

(cherry picked from FBD3610690)
2016-07-23 08:01:53 -07:00
Theodoros Kasampalis 156a55209c Simplification of loads from read-only data sections.
Summary:
Instructions that load data from a read-only data section and whose
target address can be computed statically (e.g. RIP-relative addressing)
are converted to corresponding instructions that use immediate operands.
We apply the transformation only when the resulting instruction will have
smaller or equal size.

(cherry picked from FBD3397112)
2016-06-03 00:58:11 -07:00
Theodoros Kasampalis 17b846586c Loop detection for BOLT's CFG.
Summary:
Loop detection for the CFG data structure. Added a GraphTraits
specialization for BOLT's CFG that allows us to use LLVM's loop
detection interface.

(cherry picked from FBD3604837)
2016-05-26 10:58:01 -07:00
Bill Nell ea53cffb2d Add movabs -> mov shortening optimization. Add peephole optimization pass that does instruction shortening.
Summary:
When a mov instruction has a 64-bit immediate that can be represented as
a sign-extended 32-bit number, use the smaller mov instruction (MOV64ri -> MOV64ri32).

Add peephole optimization pass that does instruction shortening.

(cherry picked from FBD3603099)
2016-07-21 16:40:06 -07:00
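
The shortening condition in ea53cffb2d comes down to a range check on the immediate. The sketch below shows that check in isolation; the helper names are hypothetical and it does not model the instruction rewrite itself.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed64(value):
    """Interpret a 64-bit pattern as a signed two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def fits_sign_extended_imm32(imm64):
    """True if the 64-bit immediate equals some 32-bit value sign-extended."""
    return INT32_MIN <= to_signed64(imm64) <= INT32_MAX
```

Note that 0x80000000 does not qualify: sign-extending its low 32 bits would produce a negative 64-bit value, not the original one.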
Maksim Panchenko c6d0c568d4 Add BinaryContext::getSectionForAddress()
Summary: Interface for accessing section from BinaryContext.

(cherry picked from FBD3600854)
2016-07-21 12:45:35 -07:00
Maksim Panchenko f2d82919d0 Move debug-handling code into DWARFRewriter (NFC).
Summary: RewriteInstance.cpp is getting too big. Split the code.

(cherry picked from FBD3596103)
2016-05-31 19:12:26 -07:00
Maksim Panchenko bf46263eed Shorten instructions if possible.
Summary:
Generate short versions of branch instructions by default and rely on
relaxation to produce longer versions when needed.

Also produce short versions of arithmetic instructions if the immediate
fits into one byte. This was only triggered once on the HHVM binary.

(cherry picked from FBD3591466)
2016-07-19 11:19:18 -07:00
Bill Nell 674dbcc0de Fix crash in patchELFPHDRTable when no functions are modified.
Summary:
patchELFPHDRTable was asserting that it could not find an entry
for .eh_frame_hdr in SectionMapInfo when no functions were modified
by BOLT.

This just changes the code to skip modifying GNU_EH_FRAME program headers
when SectionMapInfo is empty.  The existing header is copied and written
instead.

(cherry picked from FBD3557481)
2016-07-12 16:43:53 -07:00
Maksim Panchenko 84b5b9e462 Create alternative name for local symbols.
Summary:
If profile data was collected on a stripped binary but the input
to BOLT is unstripped, we would use a different mangling scheme for
local functions and ignore their profiles. To solve the issue this
diff adds an alternative name for all local functions such that one
of the names would match the name in the profile.

If the input binary was stripped, we reject it, unless "-allow-stripped"
option was passed. It's more complicated to do a matching in this case
since we have less information than at the time of profile collection.
It's also not that simple to tell if the profile was gathered on a
stripped binary (in which case we would have no issue matching data).

(cherry picked from FBD3548012)
2016-07-11 18:51:13 -07:00
Bill Nell bdd4af2134 Store index inside BinaryBasicBlock instead of in map on BinaryFunction.
Summary:
Store the basic block index inside the BinaryBasicBlock instead of a map in BinaryFunction.
This cut another 15-20 sec. from the processing time for hhvm.

(cherry picked from FBD3533606)
2016-07-07 21:43:43 -07:00
Bill Nell 90c9323511 Use unordered_map instead of map in ReorderAlgorithm and BinaryFunction::BasicBlockIndices.
Summary:
Use unordered_map instead of map in ReorderAlgorithm and BinaryFunction::BasicBlockIndices.
Cuts about 30 sec off the processing time for the hhvm binary (~8.5 min to ~8 min).

(cherry picked from FBD3530910)
2016-07-07 11:48:50 -07:00
Theodoros Kasampalis c20506c570 Fix in inferFallthroughCounts
Summary:
This fixes the initialization of basic block execution counts, where
we should skip edges to the first basic block but we were not
skipping the corresponding profile info.

Also, I removed a check that was done twice.

(cherry picked from FBD3519265)
2016-07-03 21:30:35 -07:00
Bill Nell 260f6fbdb6 Add option to dump CFGs in (simple) graphviz format during all passes.
Summary:
I noticed the BinaryFunction::viewGraph() method that hadn't been implemented
and decided I could use a simple DOT dumper for CFGs while working on the indirect
call optimization.

I've implemented the bare minimum for the dumper.  It's just nodes+BB labels with
edges. We can add more detailed information as needed/desired.

(cherry picked from FBD3509326)
2016-07-01 08:40:56 -07:00
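
For readers unfamiliar with DOT, a "nodes plus edges" dump of the kind described in 260f6fbdb6 might look like the sketch below. The function name and the input shape (a list of label pairs) are assumptions for illustration. This is not the output format of BOLT's dumper.

```python
def cfg_to_dot(function_name, edges):
    """Render a CFG as Graphviz DOT text: one node per block, one line per edge.

    edges is a list of (source_label, target_label) pairs.
    """
    labels = []
    for src, dst in edges:
        for label in (src, dst):
            if label not in labels:
                labels.append(label)
    lines = [f'digraph "{function_name}" {{']
    lines += [f'  "{label}";' for label in labels]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


dot = cfg_to_dot("main", [("BB0", "BB1"), ("BB0", "BB2"), ("BB1", "BB2")])
```

The resulting text can be passed to the Graphviz `dot` tool to render the graph.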
Theodoros Kasampalis 6eb4e5b687 perf2bolt can extract branch records with histories
Summary:
Added perf2bolt functionality for extracting branch records
with histories of previous branches. The length of the histories
is user defined, and the default is 0 (previous functionality). Also,
DataReader can parse perf2bolt output with histories.
Note: creating profile data with long histories can increase their
size significantly (2x for history of length 1, 3x for length 2 etc).

(cherry picked from FBD3473983)
2016-06-21 18:44:42 -07:00
Theodoros Kasampalis 287fa51324 Fix for ignoring fall-through profile data when jump is followed by no-op
Summary:
When a conditional jump is followed by one or more no-ops, the
destination of fall-through branch was recorded as the first no-op in
FuncBranchInfo. However the fall-through basic block after the jump
starts after the no-ops, so the profile data could not match the CFG
and was ignored.

(cherry picked from FBD3496084)
2016-06-27 14:51:38 -07:00
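
The mismatch fixed in 287fa51324 can be pictured with a toy instruction list. The sketch assumes the fix amounts to treating the first non-no-op instruction after the jump as the fall-through destination. The data layout and the name `fallthrough_target` are hypothetical.

```python
def fallthrough_target(instructions, jump_index):
    """Return the offset of the first non-no-op instruction after a jump.

    instructions is a list of (offset, mnemonic) pairs in address order.
    Returns None if only no-ops (or nothing) follow the jump.
    """
    for offset, mnemonic in instructions[jump_index + 1:]:
        if mnemonic != "nop":
            return offset
    return None


code = [(0x10, "jne"), (0x12, "nop"), (0x13, "nop"), (0x14, "mov"), (0x17, "ret")]
target = fallthrough_target(code, 0)
```

Here the recorded destination would be 0x14, where the fall-through basic block starts, and not 0x12, the first no-op.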
Theodoros Kasampalis d09b00ebff Refactoring of the reordering algorithms
Summary:
The various reorder and clustering algorithms have been refactored
into separate classes, so that it is easier to add new algorithms and/or
change the logic of algorithm selection.

(cherry picked from FBD3473656)
2016-06-16 18:47:57 -07:00
Maksim Panchenko f1192a7118 Support for multiple function names.
Summary:
With ICF optimization in the linker we were getting mismatches of
function names in .fdata and BinaryFunction name. This diff adds
support for multiple function names for BinaryFunction and
does a match against all possible names for the profile.

(cherry picked from FBD3466215)
2016-06-10 17:13:05 -07:00
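
Matching against all names, as described in f1192a7118, can be sketched as a lookup that tries each alias in turn. The dictionary-based profile store and the symbol names below are made up for illustration.

```python
def find_profile(function_names, profiles):
    """Return the profile for the first of a function's names that has one.

    function_names lists every name the function is known by; profiles maps
    a name as it appears in the profile file to its (opaque) profile data.
    """
    for name in function_names:
        if name in profiles:
            return profiles[name]
    return None


profiles = {"_ZN3Foo3barEv": {"exec_count": 42}}
# After linker ICF, one function body may carry several symbol names.
match = find_profile(["_ZN3Baz3quxEv", "_ZN3Foo3barEv"], profiles)
```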
Maksim Panchenko 70f82d9371 Reject profile data for functions that do not match.
Summary:
Verify profile data for a function and reject if there are branches
that don't correspond to any branches in the function CFG. Note that
we have to ignore branches resulting from recursive calls.

Fix printing instruction offsets in disassembled state.

Allow function to have non-zero execution count even if we don't
have branch information.

(cherry picked from FBD3451596)
2016-06-15 18:36:16 -07:00
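
The verification step in 70f82d9371 can be sketched as a membership check of profiled branches against CFG edges. The sketch assumes a recursive call shows up in the profile as a branch to the function's entry offset. That detection rule, the data layout, and the function name are assumptions, not BOLT's logic.

```python
def profile_matches_cfg(profile_branches, cfg_branches, entry_offset=0):
    """True if every profiled branch exists in the CFG.

    Branches are (from_offset, to_offset) pairs relative to the function
    start. A branch to the entry offset is assumed to be a recursive call
    and is ignored.
    """
    for src, dst in profile_branches:
        if dst == entry_offset:
            continue  # assumed recursive call, not a CFG edge
        if (src, dst) not in cfg_branches:
            return False
    return True


cfg = {(0x4, 0x10), (0x18, 0x4)}
ok = profile_matches_cfg([(0x4, 0x10), (0x20, 0x0)], cfg)
bad = profile_matches_cfg([(0x4, 0x30)], cfg)
```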
Maksim Panchenko 88ac5d9d0e [merge-fdata] Add option to print function list.
Summary:
Print total number of functions/objects that have profile
and add new options:

  -print      - print the list of objects with count to stderr
    =none     -   do not print objects/functions
    =exec     -   print functions sorted by execution count
    =branches -   print functions sorted by total branch count
  -q          - do not print merged data to stdout

(cherry picked from FBD3442288)
2016-06-09 17:45:15 -07:00
Bill Nell 980a06265a Revert "Indirect call optimization."
This reverts commit 33966090e18545b64013614e7929ff1bdcdf10d5.

(cherry picked from FBD28110782)
2016-06-08 17:38:13 -07:00
Bill Nell 8bcfd9a392 Indirect call optimization.
(cherry picked from FBD28110629)
2016-06-07 16:27:52 -07:00
Bill Nell 45e2219ae4 Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector.
Summary: This will help optimization passes that need to modify the CFG after it is constructed.  Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created.

(cherry picked from FBD3403372)
2016-06-07 16:27:52 -07:00
Maksim Panchenko 6da0d95326 Fix large functions debug info by default.
Summary:
Turn on -fix-debuginfo-large-functions by default.

In the process of testing I've discovered that we output cold code
for functions that were too large to be emitted. Fixed that.

(cherry picked from FBD3372697)
2016-05-31 19:29:34 -07:00
Maksim Panchenko 4460da0d81 Improvements for debug info.
Summary:
Assembly functions could have no corresponding DW_TAG_subprogram
entries, yet they are represented in module ranges (and .debug_aranges)
and will have line number information. Make sure we update those.

Eliminated unnecessary data structures and optimized some passes.

For .debug_loc unused location entries are no longer processed
resulting in smaller output files.

Overall, this is a small improvement in processing time and memory use.

(cherry picked from FBD3362540)
2016-05-27 20:19:19 -07:00
Theodoros Kasampalis 65ac8bbdf2 Better edge counts for fall through blocks in presence of C++ exceptions.
Summary: The inference algorithm for counts of fall through edges takes possible jumps to landing pad blocks into account. Also, the landing pad block execution counts are updated using profile data.

(cherry picked from FBD3350727)
2016-05-26 15:10:09 -07:00
Theodoros Kasampalis 485f9220b7 Taking LP counts into account for FT count inference
(cherry picked from FBD28110493)
2016-05-24 09:26:25 -07:00
Theodoros Kasampalis fb5f18b2dc Correctly updating landing pad exec counts.
(cherry picked from FBD28110316)
2016-05-23 16:16:25 -07:00
Maksim Panchenko 06b9c5b342 Better .debug_line for non-simple functions.
Summary:
Generate .debug_line info for non-simple functions in a way
that is preferable for 'objdump -S'.

(cherry picked from FBD3345485)
2016-05-24 20:50:36 -07:00
Maksim Panchenko 7b97793b94 Fix for clang .debug_info.
Summary:
Clang uses a different attribute for high_pc, which
was incompatible with the way we were updating
ranges. This diff fixes it.

(cherry picked from FBD3345537)
2016-05-24 14:54:23 -07:00
Maksim Panchenko cfa5d753eb Miscellaneous fixes for debug info.
Summary:
* Fix several cases for handling debug info:
  - properly update CU DW_AT_ranges for function with folded body
    due to ICF optimization
  - convert ranges to DW_AT_ranges from hi/low PC for all DIEs
  - add support for [a, a) range
  - update CU ranges even when there are no functions registered
* Overwrite .debug_ranges section instead of appending.
* Convert assertions in debug info handling part into warnings.

(cherry picked from FBD3339383)
2016-05-23 19:36:38 -07:00
Maksim Panchenko 7ab3db129b Create DW_AT_ranges for compile units.
Summary:
Some compile unit DIEs might be missing DW_AT_ranges because they were
compiled without "-ffunction-sections" option. This diff adds the
attribute to all compile units.

If the section is not present, we need to create it. Will do it in a
separate diff.

(cherry picked from FBD3314984)
2016-05-17 18:10:14 -07:00
Maksim Panchenko f047b9d43a Overwrite contents of .debug_line section.
Summary:
Overwrite contents of .debug_line section since we don't reference
the original contents anymore. This saves ~100MB in the HHVM binary.

(cherry picked from FBD3314917)
2016-05-16 17:02:17 -07:00
Bill Nell e63984f325 Patch forward jumping tail calls to prevent branch mispredictions.
Summary:
A simple optimization to prevent branch misprediction for tail calls.
Convert the sequence:

        j<cc> L1
        ...
    L1: jmp foo # tail call

into:

        j<cc> foo

but only if 'j<cc> foo' turns out to be a forward branch.

(cherry picked from FBD3234207)
2016-05-02 12:47:18 -07:00
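
The rewrite rule in e63984f325 can be expressed as a small decision procedure. This is a sketch over a toy representation, with addresses as plain integers and hypothetical field names. It only models the "forward branch" condition from the summary.

```python
def patch_conditional_tail_calls(cond_jumps, tail_calls):
    """Retarget conditional jumps that land on a tail-call jmp.

    cond_jumps: list of dicts with 'address' and 'target' (a label).
    tail_calls: label -> (callee name, callee address) for blocks that
    consist of just 'jmp callee'.
    Returns the (jump address, callee name) pairs that were patched.
    """
    patched = []
    for jump in cond_jumps:
        if jump["target"] not in tail_calls:
            continue
        callee, callee_address = tail_calls[jump["target"]]
        if callee_address > jump["address"]:  # forward branch only
            jump["target"] = callee
            patched.append((jump["address"], callee))
    return patched


jumps = [{"address": 0x400, "target": "L1"}, {"address": 0x900, "target": "L1"}]
result = patch_conditional_tail_calls(jumps, {"L1": ("foo", 0x800)})
```

The jump at 0x400 is retargeted to `foo` because `foo` lies at a higher address. The jump at 0x900 is left alone since 'j<cc> foo' would be a backward branch.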
Maksim Panchenko b445f5eb7b Fix issue with garbage address in .debug_line.
Summary:
While emitting debug lines for a function we don't overwrite, we
don't have a code section context that is needed by default
writing routine. Hence we have to emit end_sequence after the
last address, not at the end of section.

(cherry picked from FBD3291533)
2016-05-11 19:13:38 -07:00
Bill Nell f7e7e25b88 Put all optimization passes under the pass manager.
Summary:
Move eliminate unreachable code, block reordering, and CFI/exception fixup
into official optimization passes.

(cherry picked from FBD3248991)
2016-05-02 12:47:18 -07:00
Gabriel Poesia 5fa128e748 Inlining of small functions.
Summary:
Added an optimization pass of inlining calls to small functions (with only one
basic block). Inlining is done in a very simple way, inserting instructions to
simulate the changes to the stack pointer that call/ret would make before/after the
inlined function executes. Also, the heuristic prefers to inline calls that happen
in the hottest blocks (by looking at their execution count). Calls in cold blocks are
ignored.

(cherry picked from FBD3233516)
2016-04-25 14:25:58 -07:00
Gabriel Poesia d1f525499e Optimize calls to functions that are a single unconditional jump
Summary:
Many functions (around 600) in the HHVM binary are simply
a single unconditional jump instruction to another function. These can
be trivially optimized by modifying the call sites to directly call the
branch target instead (because it also happens with more than one jump
in sequence, we do it iteratively).

This diff also adds a very simple analysis/optimization pass system in
which this pass is the first one to be implemented. A follow-up to this
could be to move the current optimizations to other passes.

(cherry picked from FBD3211138)
2016-04-15 15:59:52 -07:00
Gabriel Poesia e6acc7bb53 Optimize calls to functions that are a single unconditional jump
Summary:
Many functions (around 600) in the HHVM binary are simply
a single unconditional jump instruction to another function. These can
be trivially optimized by modifying the call sites to directly call the
branch target instead (because it also happens with more than one jump
in sequence, we do it iteratively).

This diff also adds a very simple analysis/optimization pass system in
which this pass is the first one to be implemented. A follow-up to this
could be to move the current optimizations to other passes.

(cherry picked from FBD3211138)
2016-04-15 15:59:52 -07:00
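
The iterative part of the jump-only-function optimization in d1f525499e and e6acc7bb53 can be sketched as chasing a chain of targets. The mapping and function name are hypothetical, and the cycle guard is an addition of this sketch that the summary does not mention.

```python
def resolve_call_target(callee, single_jump_targets):
    """Follow functions that are a single unconditional jump to their final target.

    single_jump_targets maps the name of each jump-only function to the
    function it jumps to. The visited set guards against cycles.
    """
    visited = set()
    while callee in single_jump_targets and callee not in visited:
        visited.add(callee)
        callee = single_jump_targets[callee]
    return callee


thunks = {"a": "b", "b": "c"}
final = resolve_call_target("a", thunks)
```

A call to `a` would be rewritten to call `c` directly, skipping both intermediate jumps.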
Gabriel Poesia 459eb8c230 Fix "Cannot update ranges for DIE at offset" error messages.
Summary:
Fix the error message by not printing it :)

Explanation: a previous diff accidentally removed this error message from within
the DEBUG macro, and it's expected that we'll have a bunch of them since a lot
of the DIEs we try to update are empty or meaningless. For instance (and mainly), there
is a huge number of lexical block DIEs with no attributes in .debug_info.
In the first phase of collecting debugging info, we store the offsets of all
these DIEs, only later to realize that we cannot update their address
ranges because they have none.

A better fix would be to check this earlier and not store offsets of DIEs
we cannot update to begin with.

(cherry picked from FBD3236923)
2016-04-28 12:55:35 -07:00
Maksim Panchenko de95a5b6a4 Make merge-fdata generate smaller .fdata files.
Summary:
A lot of the space in the merged .fdata is taken by branches
to and from [heap], which is jitted code. On different machines,
or during different runs, jitted addresses are all different.
We don't use these addresses, but we need branch info to get
accurate function call counts.

This diff treats all [heap] addresses the same, resulting in a
simplified merged file. The size of the compressed file decreased
from 70MB to 8MB.

(cherry picked from FBD3233943)
2016-04-27 18:06:18 -07:00
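
The effect of treating all [heap] addresses the same, as in de95a5b6a4, can be shown on simplified branch records. The tuple layout below is a stand-in invented for this sketch and is not the .fdata format. Collapsing heap offsets to 0 is likewise an assumption about how "the same" is realized.

```python
from collections import Counter


def merge_branches(records):
    """Sum branch counts, collapsing every [heap] address to offset 0.

    records is an iterable of ((from_name, from_offset),
    (to_name, to_offset), count) tuples.
    """
    merged = Counter()
    for (src, src_off), (dst, dst_off), count in records:
        if src == "[heap]":
            src_off = 0
        if dst == "[heap]":
            dst_off = 0
        merged[(src, src_off), (dst, dst_off)] += count
    return merged


merged = merge_branches([
    (("[heap]", 0x7F00), ("foo", 0), 3),
    (("[heap]", 0x8A10), ("foo", 0), 5),
    (("foo", 0x20), ("[heap]", 0x9000), 2),
])
```

Three input records shrink to two merged entries, while the call count into `foo` (3 + 5) is preserved.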
Maksim Panchenko 1258903b54 Fix for functions in different segments.
Summary:
In a test binary some functions are placed in a segment
preceding the segment containing .text section. As a result,
we were miscalculating maximum function size as the calculation
was based on addresses only.

This diff fixes the calculation by checking if symbol after function
belongs to the same section.  If it does not, then we set the maximum
function size based on the size of the containing section and not
on the address distance to the next symbol.

(cherry picked from FBD3229205)
2016-04-26 23:42:39 -07:00
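
The corrected calculation from 1258903b54 can be sketched as follows. The sketch reads "based on the size of the containing section" as the distance from the function start to the end of its section. That reading, the dictionary layout, and the names are assumptions.

```python
def max_function_size(func, next_symbol, section):
    """Upper bound on a function's size.

    func and next_symbol are dicts with 'address' and 'section' keys;
    next_symbol may be None. section is a dict with 'address' and 'size'
    describing the section that contains func.
    """
    if next_symbol is not None and next_symbol["section"] == func["section"]:
        return next_symbol["address"] - func["address"]
    # Next symbol lives elsewhere: bound by the end of the containing section.
    section_end = section["address"] + section["size"]
    return section_end - func["address"]


init = {"address": 0x1000, "size": 0x40}
func = {"address": 0x1010, "section": ".init"}
same = {"address": 0x1020, "section": ".init"}
other = {"address": 0x5000, "section": ".text"}
```

With `same` as the next symbol the bound is 0x10. With `other`, which sits in a different section, the bound is 0x30, not the misleading address distance of 0x3FF0.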
Maksim Panchenko 3811673a0c Option to break in given functions.
Summary:
Added option "-break-funcs=func1,func2,...." to coredump in any
given function by introducing a ud2 sequence at the beginning of the
function. Useful for debugging and validating stack traces.

Also renamed options containing "_" to use "-" instead.

Also run hhvm test with "-update-debug-sections".

(cherry picked from FBD3210248)
2016-04-21 09:54:33 -07:00
Maksim Panchenko 87a90ae133 Fix ninja install-* for BOLT utilities.
Summary:
Make sure we can install all tools needed for processing
BOLT .fdata files such as perf2bolt, merge-fdata, etc.

(cherry picked from FBD3223477)
2016-04-25 22:13:12 -07:00