Commit Graph

403 Commits

Author SHA1 Message Date
Maksim Panchenko 96adec51eb [BOLT] Rework debug info processing.
Summary:
Multiple improvements to debug info handling:
  * Add support for relocation mode.
  * Speed-up processing.
  * Reduce memory consumption.
  * Bug fixes.

The high-level idea behind the new debug handling is that we don't save
intermediate state for ranges and location lists. Instead we depend
on function and basic block address transformations to update the info
as a final post-processing step.

For HHVM in non-relocation mode the peak memory went down from 55GB to 35GB. Processing time went from over 6 minutes to under 5 minutes.

(cherry picked from FBD5113431)
2017-05-16 09:27:34 -07:00
Rafael Auler 511a1c78b2 [BOLT] Add dataflow infrastructure
Summary:
This diff introduces a common infrastructure for performing
dataflow analyses in BinaryFunctions as well as a few analyses that are
useful in a variety of scenarios. The largest user of this
infrastructure so far is shrink wrapping, which will be added in a
separate diff.

(cherry picked from FBD4983671)
2017-05-01 16:51:27 -07:00
Maksim Panchenko 457b7f14b9 [BOLT] Fix debug info for input with continuous range.
Summary:
When we see a compilation unit with continuous range on input,
it has two attributes: DW_AT_low_pc and DW_AT_high_pc. We convert the
range to a non-continuous one and change the attributes to
DW_AT_ranges and DW_AT_producer. However, gdb seems to expect
every compilation unit to have a base address specified via
DW_AT_low_pc, even when its value is always 0. Otherwise gdb will
not show proper debug info for such modules.

With this diff we produce DW_AT_ranges followed by DW_AT_low_pc.
The problem is that the first attribute takes DW_FORM_sec_offset
which is exactly 4 bytes, and in many cases we are left with
12 bytes to fill in. We used to fill this space with DW_AT_producer,
which took an arbitrary-length field. For DW_AT_low_pc we can
use a trick of using DW_FORM_udata (unsigned ULEB128 encoded
integer) which can take up to 12 bytes, even when the value is 0.
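
The padding trick relies on ULEB128 allowing redundant continuation bytes. A minimal sketch of such an encoder follows; the helper names encodePaddedULEB128 and decodeULEB128 are hypothetical and are not taken from this diff:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode Value as ULEB128 using at least PadTo bytes.
// Padding is done by setting the continuation bit on bytes that carry no
// further value bits, so the decoded value stays the same.
std::vector<uint8_t> encodePaddedULEB128(uint64_t Value, std::size_t PadTo) {
  assert(PadTo > 0 && "need room for at least one byte");
  std::vector<uint8_t> Bytes;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    // Keep the continuation bit set while value bits or padding remain.
    if (Value != 0 || Bytes.size() + 1 < PadTo)
      Byte |= 0x80;
    Bytes.push_back(Byte);
  } while (Value != 0 || Bytes.size() < PadTo);
  return Bytes;
}

// Hypothetical helper: decode a ULEB128 byte sequence.
uint64_t decodeULEB128(const std::vector<uint8_t> &Bytes) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  for (uint8_t Byte : Bytes) {
    if (Shift < 64)
      Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break;
  }
  return Value;
}
```

With this sketch, encodePaddedULEB128(0, 12) yields eleven 0x80 bytes followed by a single 0x00 byte, which still decodes to 0.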

(cherry picked from FBD5109798)
2017-05-22 17:17:04 -07:00
Bill Nell 4806b13835 [BOLT] Add jump table support to ICP
Summary:
Add jump table support to ICP.  The optimization is basically the same
as ICP for tail calls.  The big difference is that the profiling data
comes from the jump table and the targets are local symbols rather than
global.

I've removed an instruction from ICP for tail calls.  The code used to
have a conditional jump to a block with a direct jump to the target, i.e.

  B1: cmp foo,(%rax)
      jne B3
  B2: jmp foo
  B3: ...

this code is now:

  B1: cmp foo,(%rax)
      je  foo
  B2: ...

The other changes in this diff:
- Move ICP + new jump table support to separate file in Passes.
- Improve the CFG validation to handle jump tables.
- Fix the double jump peephole so that the successor of the modified
  block is updated properly.  Also make sure that any existing branches
  in the block are modified to properly reflect the new CFG.
- Add an invocation of the double jump peephole to SCTC.  This allows
  us to remove a call to peepholes/UCE occurring after fixBranches() in
  the pass manager.
- Miscellaneous cleanups to BOLT output.

(cherry picked from FBD4727757)
2017-03-08 19:58:33 -08:00
Maksim Panchenko c789d5137b [BOLT] Add option to keep/generate .debug_aranges.
Summary:
GOLD linker removes .debug_aranges while generating .gdb_index.
Some tools however rely on the presence of this section.
Add an option to generate .debug_aranges if it was removed,
or keep it in the file if it was present.

Generally speaking .debug_aranges duplicates information present
in .gdb_index addresses table.

(cherry picked from FBD5084808)
2017-05-17 18:35:00 -07:00
Maksim Panchenko 69b586326c [BOLT] Support adding new non-allocatable sections.
Summary:
We had the ability to add allocatable sections before. This diff
expands this capability to non-allocatable sections.

(cherry picked from FBD5082018)
2017-05-16 17:29:31 -07:00
Maksim Panchenko 3adb52d80e [BOLT] Update .gdb_index section.
Summary: Update address table in .gdb_index section.

(cherry picked from FBD5068255)
2017-05-15 15:21:59 -07:00
Maksim Panchenko 3f42fdf7da [BOLT] Update function address and size in relocation mode.
Summary:
Set function addresses after code emission but before we update
debug info and symbol table entries.

(cherry picked from FBD5029609)
2017-05-08 22:51:36 -07:00
Maksim Panchenko 13c89e6ef1 [BOLT] Fix branch data for __builtin_unreachable().
Summary:
When we have a conditional branch past the end of function (a result
of a call to __builtin_unreachable()), we replace the branch with nop,
but keep branch information for validation purposes. If that branch
has a recorded profile we mistakenly create an additional successor
to a containing basic block (a 3rd successor).

Instead of adding the branch to FTBranches list we should be adding
to IgnoredBranches.

(cherry picked from FBD4912840)
2017-04-18 23:32:11 -07:00
Maksim Panchenko 075f076503 [BOLT] Don't abort on processing binaries with .gdb_index section
Summary:
While writing non-allocatable sections we had an assumption that the
size of such a section is a multiple of the alignment, as typically
such sections are collections of fixed-sized elements. .gdb_index
breaks this assumption.

This diff removes the assertion that was triggered by the presence of
the .gdb_index section, and makes sure that we insert padding if we are
appending to a section whose size is not a multiple of the section alignment.
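
The amount of padding needed before appending can be computed as in the sketch below. The function name paddingToAlign is hypothetical and only illustrates the arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of padding bytes to add so that a section of
// the given Size ends on a multiple of Alignment.
uint64_t paddingToAlign(uint64_t Size, uint64_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  const uint64_t Remainder = Size % Alignment;
  return Remainder == 0 ? 0 : Alignment - Remainder;
}
```

For example, a 10-byte section with 4-byte alignment needs 2 bytes of padding before new contents are appended.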

(cherry picked from FBD4844553)
2017-04-06 10:49:59 -07:00
Bill Nell c7cccacc4f [BOLT] Enable SCTC by default.
(cherry picked from FBD4837849)
2017-04-05 13:23:58 -07:00
Maksim Panchenko 34c8a7c21b [BOLT] Relocation support for non-allocatable sections.
Summary:
Relocations can be created for non-allocatable (aka Note) sections.

To start using this for debug info, the emission has to be moved
earlier in the pipeline for relocation processing to kick in.

(cherry picked from FBD4835204)
2017-04-05 09:29:24 -07:00
Maksim Panchenko a99005397f [BOLT] Fix branch count in removeDuplicateConditionalSuccessor().
Summary:
When we merge the original branch counts we have to make sure
both of them have a profile. Otherwise set the count to COUNT_NO_PROFILE.

The misprediction count should be 0.
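
A minimal sketch of the merging rule follows. The BranchInfo struct, the function name, the sentinel value chosen for COUNT_NO_PROFILE, and the use of a sum for the merged count are all assumptions made for illustration:

```cpp
#include <cstdint>
#include <limits>

// Assumed sentinel meaning "no profile data for this branch".
constexpr uint64_t COUNT_NO_PROFILE = std::numeric_limits<uint64_t>::max();

// Hypothetical stand-in for the per-branch profile record.
struct BranchInfo {
  uint64_t Count;
  uint64_t MispredictedCount;
};

// Merge the profiles of two branches that lead to the same successor.
// The counts are added only if both branches have a profile; the
// misprediction count of the merged branch is always 0.
BranchInfo mergeDuplicateSuccessorCounts(const BranchInfo &A,
                                         const BranchInfo &B) {
  if (A.Count == COUNT_NO_PROFILE || B.Count == COUNT_NO_PROFILE)
    return {COUNT_NO_PROFILE, 0};
  return {A.Count + B.Count, 0};
}
```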

(cherry picked from FBD4837774)
2017-04-05 13:00:20 -07:00
Bill Nell 6c5c65e3a3 [BOLT] Fix double jump peephole, remove useless conditional branches.
Summary:
I split some of this out from the jumptable diff since it fixes the
double jump peephole.

I've changed the pass manager so that UCE and peepholes are not called
after SCTC.  I've incorporated a call to the double jump fixer to SCTC
since it is needed to fix things up afterwards.

While working on fixing the double jump peephole I discovered a few
useless conditional branches that could be removed as well.  I highly
doubt that removing them will improve perf at all but it does seem
odd to leave in useless conditional branches.

There are also some minor logging improvements.

(cherry picked from FBD4751875)
2017-03-20 22:44:25 -07:00
Maksim Panchenko f7d32f7e7d [BOLT] Detect and reject binaries built for coverage.
Summary: Don't attempt to optimize binaries built with coverage support.

(cherry picked from FBD4810330)
2017-03-31 07:51:30 -07:00
Maksim Panchenko c166a8c1a7 [BOLT] Fix debug info update for inlining.
Summary:
When inlining, if a callee has debug info and a caller does not
(i.e. a containing compilation unit was compiled without "-g"), we try
to update a nonexistent compilation unit. Instead we should skip
updating debug info in such cases.

Minor refactoring of line number emitting code.

(cherry picked from FBD4823982)
2017-04-03 16:24:26 -07:00
Maksim Panchenko 0bde796e50 [BOLT] Organize options in categories for pretty printing (near NFC).
Summary:
Each BOLT-specific option now belongs to BoltCategory or BoltOptCategory.

Use alphabetical order for options in source code (does not affect
output).

The result is a cleaner output of "llvm-bolt -help" which does not
include any unrelated llvm options and is close to the following:

.....

BOLT generic options:

  -data=<string>                                       - <data file>
  -dyno-stats                                          - print execution info based on profile
  -hot-text                                            - hot text symbols support (relocation mode)
  -o=<string>                                          - <output file>
  -relocs                                              - relocation mode - use relocations to move functions in the binary
  -update-debug-sections                               - update DWARF debug sections of the executable
  -use-gnu-stack                                       - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy)
  -use-old-text                                        - re-use space in old .text if possible (relocation mode)
  -v=<uint>                                            - set verbosity level for diagnostic output

BOLT optimization options:

  -align-blocks                                        - try to align BBs inserting nops
  -align-functions=<uint>                              - align functions at a given value (relocation mode)
  -align-functions-max-bytes=<uint>                    - maximum number of bytes to use to align functions
  -boost-macroops                                      - try to boost macro-op fusions by avoiding the cache-line boundary
  -eliminate-unreachable                               - eliminate unreachable code
  -frame-opt                                           - optimize stack frame accesses
  ......

(cherry picked from FBD4793684)
2017-03-28 14:40:20 -07:00
Maksim Panchenko d5a0264a9e [BOLT] Issue error in relocs mode if input is lacking relocations.
Summary:
If we specify the "-relocs" flag and the input has no relocations, we
proceed on the assumption that relocations are there and break the
binary.

Detect the condition above, and reject the input.

(cherry picked from FBD4761239)
2017-03-22 22:05:50 -07:00
Rafael Auler ad81bd6779 Change dynostats dynamic instruction count policy
Summary:
Also add LOAD/STORE counters.

(cherry picked from FBD4732284)
2017-03-17 10:32:56 -07:00
Bill Nell b1ef186ca9 [BOLT] Don't allow non-symbol targets in ICP
Summary:
ICP was letting through call targets that weren't symbols.  This diff
filters out the non-symbol targets before running ICP.

(cherry picked from FBD4735358)
2017-03-18 11:55:45 -07:00
Maksim Panchenko e6f96de4d0 [BOLT] Add option to print only specific functions.
Summary:
Add option '-print-only=func1,func2,...' to print only functions
of interest. The rest of the functions are still processed and
optimized (e.g. inlined), but only the ones on the list are printed.

(cherry picked from FBD4734610)
2017-03-17 19:05:11 -07:00
Maksim Panchenko 6cfd7ac2d5 [BOLT] Do not overwrite starting address in non-relocation mode.
Summary:
In non-relocation mode we shouldn't attempt to change the ELF
entry point.

What made matters worse - it broke '-max-funcs=' and '-funcs=' options
since an entry function more often than not was excluded from the list
of processed functions, and we were setting entry point to 0.

(cherry picked from FBD4720044)
2017-03-15 19:31:20 -07:00
Maksim Panchenko 559a57a181 [BOLT] Improve dynostats output.
Summary:
Reduce verbosity of dynostats to make them more readable.

  * Don't print "before" dynostats twice.
  * Detect if dynostats have changed after optimization and print
    before/after only if at least one metric has changed. Otherwise
    just print dynostats once and indicate "no change".
  * If any given metric hasn't changed, then print the difference as
    "(=)" as opposed to (+0.0%).

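The last rule can be sketched as follows. The function name formatChange, the exact percentage format, and the handling of a zero "before" value are assumptions for illustration, not the code from this diff:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: render the change of one metric.
// An unchanged metric prints "(=)" rather than "(+0.0%)".
std::string formatChange(uint64_t Before, uint64_t After) {
  if (Before == After)
    return "(=)";
  if (Before == 0)
    return "(n/a)"; // Assumed placeholder: no meaningful percentage.
  const double Delta =
      static_cast<double>(After) - static_cast<double>(Before);
  char Buffer[32];
  std::snprintf(Buffer, sizeof(Buffer), "(%+.1f%%)",
                100.0 * Delta / static_cast<double>(Before));
  return Buffer;
}
```
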
(cherry picked from FBD4705920)
2017-03-14 09:03:23 -07:00
Maksim Panchenko 351af0c895 [BOLT] Do not process empty functions.
Summary:
While running on a recent test binary BOLT failed with an error. We were
trying to process '__hot_end' (which is not really a function), and asserted
that it had no basic blocks.

This diff marks functions with empty basic blocks list as non-simple since
there's no need to process them.

(cherry picked from FBD4696517)
2017-03-12 11:30:05 -07:00
Bill Nell 2e5c2e689f Fix hfsort callgraph stats, add hfsort test.
Summary:
The stats for call sites that are not included in the call graph were broken.
The intention is to count the total number of call sites vs. the number of call sites that are ignored because they have targets that are not BinaryFunctions.

Also add a new test for hfsort.

(cherry picked from FBD4668631)
2017-03-07 11:45:07 -08:00
Maksim Panchenko f4825ea417 [BOLT] Fix gcc5 build.
Summary: A <numeric> include is required for gcc5 build.

(cherry picked from FBD4671953)
2017-03-07 18:09:09 -08:00
Maksim Panchenko 98737b34bb [BOLT] Fix verbose output.
Summary:
Inadvertently, output of BOLT became way too verbose. Discovered while
building HHVM on master.

(cherry picked from FBD4669881)
2017-03-07 14:22:15 -08:00
Bill Nell fed0980139 [BOLT] Update tests
Summary:
Fix validateCFG to handle BBs that were generated from code that used
__builtin_unreachable().
Add -verify-cfg option to run CFG validation after every optimization
pass.

(cherry picked from FBD4641174)
2017-02-27 21:44:38 -08:00
Maksim Panchenko 0acba2bcf0 [BOLT] Detect unmarked data in text.
Summary:
Sometimes code written in assembly will have unmarked data (such as
constants) embedded into text.

Typically such data falls into a "padding" address space of a function.

This diff detects such references, and adjusts the padding space to
prevent overwriting of code in data.

Note that in relocation mode we prefer to overwrite the original code
(-use-old-text) and thus cannot simply ignore data in text.

(cherry picked from FBD4662780)
2017-02-21 14:18:09 -08:00
Maksim Panchenko f241e252fc [BOLT] Detect and handle __builtin_unreachable().
Summary:
Calls to __builtin_unreachable() can result in an inconsistent CFG.
It was possible for a basic block to end with a conditional branch
and have a single successor. Or there could exist a non-terminated
basic block without successors.

We also often treated conditional jumps with destination past the end
of a function as conditional tail calls. This can be prevented
reliably at least when the byte past the end of the function does
not belong to the next function.

This diff includes several changes:
  * At disassembly stage jumps past the end of a function are converted
    into 'nops'. This is done only for cases when we can guarantee that
    the jump is not a tail call. Conversion to nop is required since the
    instruction could be referenced either by exception handling
    tables and/or debug info. Nops are later removed.
  * In CFG insert 'ret' into non-terminated basic blocks without
    successors (this almost never happens).
  * Conditional jumps at the end of the function are removed from
    CFG. The block will still have a single successor.
  * Cases where a destination of a jump instruction is the start
    of the next function, are still conservatively handled as
    (conditional) tail calls.

(cherry picked from FBD4655046)
2017-03-03 11:35:41 -08:00
Maksim Panchenko 6dc2351505 [BOLT] New CFI handling policy.
Summary:
The new interface for handling Call Frame Information:

  * CFI state at any point in a function (in CFG state) is defined by
    CFI state at basic block entry and CFI instructions inside the
    block. The state is independent of basic blocks layout order
    (this is implied by CFG state but wasn't always true in the past).
  * Use BinaryBasicBlock::getCFIStateAtInstr(const MCInst *Inst) to
    get CFI state at any given instruction in the program.
  * No need to call fixCFIState() after any given pass. fixCFIState()
    is called only once during function finalization, and any function
    transformations after that point are prohibited.
  * When introducing new basic blocks, make sure CFI state at entry
    is set correctly and matches CFI instructions in the basic block
    (if any).
  * When splitting basic blocks, use getCFIStateAtInstr() to get
    a state at the split point, and set the new basic block's CFI
    state to this value.

Introduce CFG_Finalized state to indicate that no further optimizations
are allowed on the function. This state is reached after we have synced
CFI instructions and updated EH info.

Rename "-print-after-fixup" option to "-print-finalized".

This diff fixes CFI for cases when we split conditional tail calls,
and for indirect call promotion optimization.

(cherry picked from FBD4629307)
2017-02-24 21:59:33 -08:00
Rafael Auler 965a373dc4 Fix warnings when compiling with clang (NFC)
Summary:
Fix inconsistent override keyword usage and initialize a
missing field of a Relocation object when using braced initializers.

(cherry picked from FBD4622856)
2017-02-27 13:09:27 -08:00
Maksim Panchenko 2029458f34 [BOLT] Strip 'repz' prefix from 'repz retq'.
Summary:
Add pass to strip 'repz' prefix from 'repz retq' sequence. The prefix
is not used in Intel CPUs afaik. The pass is on by default.

(cherry picked from FBD4610329)
2017-02-23 18:09:10 -08:00
Maksim Panchenko 88a461014b [BOLT] Don't set code skew in relocations mode.
Summary:
We use code skew in non-relocation mode since functions have fixed
addresses, and internal alignment has to be adjusted wrt the skew.
However in relocation mode it interferes with effective code
alignment, and has to be disabled. I missed it when I was re-basing
the relocation diff.

(cherry picked from FBD4599670)
2017-02-22 11:29:52 -08:00
Maksim Panchenko d3e33b6edc [BOLT] Fix -jump-tables=basic in relocation mode.
Summary:
In a prev diff I added an option to update jump tables in-place (on by default)
and accidentally broke the default handling of jump tables in relocation
mode. The update should be happening semi-automatically, but because
we ignore relocations for jump tables it wasn't happening (derp).

Since we mostly use '-jump-tables=move' this hasn't been noticed for
some time.

This diff gets rid of IgnoredRelocations and removes relocations
from a relocation set when they are no longer needed. If relocations
are created later for jump tables they are no longer ignored.

(cherry picked from FBD4595159)
2017-02-21 16:15:15 -08:00
Maksim Panchenko 88244a10bb [BOLT] Move BOLT passes under Passes subdirectory (NFC).
Summary:
Move passes under Passes subdirectory.

Move inlining passes under Passes/Inliner.*

(cherry picked from FBD4575832)
2017-02-16 14:57:57 -08:00
Maksim Panchenko f06a1455ea [BOLT] Add support for *GOTPCRELX relocation type.
Summary:
gcc5 can generate new types of relocations that give the linker freedom
to substitute instructions. These relocations are PC-relative, and
since we manually process such relocations they don't present
much of a problem.

Additionally, detect non-pc-relative access from code into the middle of
a function. Occasionally I've seen such code, but don't know exactly
how to trigger its generation. Just issue a warning for now.

(cherry picked from FBD4566473)
2017-02-14 22:55:10 -08:00
Maksim Panchenko 82965b963f [BOLT] Emit short tail calls in relocation mode.
Summary:
To minimize size of the output code we should emit tail calls
that are as short as possible. For this we have to convert a synthetic
TAILJMPd into JMP_1 instruction. This should be one of the last passes
as most of analysis passes could break since tail calls will no longer
be marked as such.

The total size of the code is smaller, but not by much - hot text was
reduced by 192 bytes.

(cherry picked from FBD4557804)
2017-02-13 23:05:12 -08:00
Maksim Panchenko 734a7a5437 [BOLT] Skip disassembly of padding at function end.
Summary:
Some functions coming from assembly may not have been marked
with size. We assume the size to include all bytes up to
the next function/object in the file. As a result, the
function body will include any padding inserted by the linker.
If the linker inserts 0-value bytes, this could be misinterpreted
as an invalid instruction, and BOLT will bail out on such functions
in non-relocation mode, and give up on the binary in relocation
mode.

This diff detects zero-padding, ignores it, and continues processing
as normal.
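
A minimal sketch of such a check, assuming the function body is available as a byte vector; the name isZeroPadding is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if every byte from Offset to the end of
// the function body is zero, i.e. the remainder looks like linker padding.
bool isZeroPadding(const std::vector<uint8_t> &FunctionBytes,
                   std::size_t Offset) {
  if (Offset >= FunctionBytes.size())
    return false; // Nothing left to inspect.
  return std::all_of(
      FunctionBytes.begin() + static_cast<std::ptrdiff_t>(Offset),
      FunctionBytes.end(), [](uint8_t Byte) { return Byte == 0; });
}
```

A disassembler loop could call a check like this when it hits an undecodable instruction and stop at Offset instead of rejecting the function.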

(cherry picked from FBD4528893)
2017-02-08 09:14:10 -08:00
Maksim Panchenko 6b0b5bbae7 [BOLT] Reject sanitized binaries.
Summary:
Whenever the input binary is suspected to have been sanitized we print an error
message and exit. I've checked that "__asan_init*" symbol
presence is the most conservative way to detect "sanitization".
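
The symbol check can be sketched as a prefix match over symbol names. The function name looksSanitized and the plain list of names are assumptions for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if any symbol name starts with
// "__asan_init", which is treated as a sign of a sanitized binary.
bool looksSanitized(const std::vector<std::string> &SymbolNames) {
  const std::string Prefix = "__asan_init";
  for (const std::string &Name : SymbolNames)
    if (Name.compare(0, Prefix.size(), Prefix) == 0)
      return true;
  return false;
}
```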

(cherry picked from FBD4525478)
2017-02-07 15:56:00 -08:00
Maksim Panchenko c89821cee3 [BOLT] Detect and prevent re-optimization attempts.
Summary:
Whenever we try to re-optimize a binary with BOLT we should
issue an error and exit.

(cherry picked from FBD4525228)
2017-02-07 15:31:14 -08:00
Maksim Panchenko e212805ea6 [BOLT] Update section names in output file.
Summary:
Re-write section header string table to reflect new names
given to sections. Old sections get ".bolt.org" prefix.

E.g. when we write ".eh_frame" section, we keep the old copy
but rename it to ".bolt.org.eh_frame".

Note: the new code section is named ".bolt.text" - it contains split
function bodies, while original ".text" name is left unchanged.
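
The renaming scheme for old sections amounts to prepending a fixed prefix, as in this sketch; the function name getOrgSectionName is hypothetical:

```cpp
#include <string>

// Hypothetical helper: name given to the preserved copy of an old section.
std::string getOrgSectionName(const std::string &SectionName) {
  return ".bolt.org" + SectionName;
}
```

For ".eh_frame" this produces ".bolt.org.eh_frame", matching the example above.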

(cherry picked from FBD4524935)
2017-02-07 12:20:46 -08:00
Bill Nell d74997c3cc Indirect call promotion optimization.
Summary:
Perform indirect call promotion optimization in BOLT.

The code scans the instructions during CFG creation for all
indirect calls.  Right now indirect tail calls are not handled
since the functions are marked not simple.  The offsets of the
indirect calls are stored for later use by the ICP pass.

The indirect call promotion pass visits each indirect call and
examines the BranchData for each.  If the most frequent targets
from that callsite exceed the specified threshold (default 90%),
the call is promoted.  Otherwise, it is ignored.  By default,
only one target is considered at each callsite.

When a candidate callsite is processed, we modify the callsite
to test for the most common call targets before calling through
the original generic call mechanism.

The CFG and layout are modified by ICP.

A few new command line options have been added:
-indirect-call-promotion
-indirect-call-promotion-threshold=<percentage>
-indirect-call-promotion-topn=<int>

The threshold is the minimum frequency of a call target needed
before ICP is triggered.

The topn option controls the number of targets to consider for
each callsite, e.g. ICP is triggered if topn=2 and the total
frequency of the top two call targets exceeds the threshold.
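
The threshold and topn rules can be sketched as a single decision function. The name shouldPromote, the use of raw per-target counts, and treating the threshold as an inclusive minimum are assumptions for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: decide whether a callsite qualifies for promotion.
// Counts holds the profiled call count of each target seen at the callsite.
bool shouldPromote(std::vector<uint64_t> Counts, std::size_t TopN,
                   double ThresholdPercent) {
  const uint64_t Total =
      std::accumulate(Counts.begin(), Counts.end(), uint64_t{0});
  if (Total == 0 || TopN == 0)
    return false;
  // Most frequent targets first.
  std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
  const std::size_t N = std::min(TopN, Counts.size());
  const uint64_t Top = std::accumulate(
      Counts.begin(), Counts.begin() + static_cast<std::ptrdiff_t>(N),
      uint64_t{0});
  return 100.0 * static_cast<double>(Top) / static_cast<double>(Total) >=
         ThresholdPercent;
}
```

With counts of 60, 35 and 5 and a 90% threshold, the callsite is rejected for topn=1 (60%) and accepted for topn=2 (95%).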

Example of ICP:

C++ code:

  int B_count = 0;
  int C_count = 0;

  struct A { virtual void foo() = 0; };
  struct B : public A { virtual void foo() { ++B_count; }; };
  struct C : public A { virtual void foo() { ++C_count; }; };

  A* a = ...
  a->foo();
  ...

original:
  400863:	49 8b 07             	mov    (%r15),%rax
  400866:	4c 89 ff             	mov    %r15,%rdi
  400869:	ff 10                	callq  *(%rax)
  40086b:	41 83 e6 01          	and    $0x1,%r14d
  40086f:	4d 89 e6             	mov    %r12,%r14
  400872:	4c 0f 44 f5          	cmove  %rbp,%r14
  400876:	4c 89 f7             	mov    %r14,%rdi
  ...

after ICP:
  40085e:	49 8b 07             	mov    (%r15),%rax
  400861:	4c 89 ff             	mov    %r15,%rdi
  400864:	49 ba e0 0b 40 00 00 	movabs $0x400be0,%r10
  40086b:	00 00 00
  40086e:	4c 3b 10             	cmp    (%rax),%r10
  400871:	75 29                	jne    40089c <main+0x9c>
  400873:	41 ff d2             	callq  *%r10
  400876:	41 83 e6 01          	and    $0x1,%r14d
  40087a:	4d 89 e6             	mov    %r12,%r14
  40087d:	4c 0f 44 f5          	cmove  %rbp,%r14
  400881:	4c 89 f7             	mov    %r14,%rdi
  ...

  40089c:	ff 10                	callq  *(%rax)
  40089e:	eb d6                	jmp    400876 <main+0x76>

(cherry picked from FBD3612218)
2016-09-07 18:59:23 -07:00
Maksim Panchenko 6ff1795d96 [BOLT] Support overwriting jump tables in-place.
Summary:
Add an option to overwrite jump tables without moving them and make it
the default:

  -jump-tables   - jump tables support (default=basic)
    =none        -   do not optimize functions with jump tables
    =basic       -   optimize functions with jump tables
    =move        -   move jump tables to a separate section
    =split       -   split jump tables section into hot and cold based on
                     function execution frequency
    =aggressive  -   aggressively split jump tables section based on usage of
                     the tables

(cherry picked from FBD4448499)
2017-01-17 15:49:59 -08:00
Rafael Auler 6dfd16cb4c Cover RSP-indexed accesses in frame optimization
Summary:
Add a new dataflow analysis to recover the value of RSP at a
given point of the program. This value is expressed as an offset from
the CFA. Use this information to detect redundant load in memory
accesses performed via RSP as well, not only RBP as done previously.
Bail when RSP value (as an offset of the CFA) can't be reliably
determined with a simple dataflow analysis.

(cherry picked from FBD4372261)
2016-12-28 17:09:52 -08:00
Maksim Panchenko 503c741d43 [BOLT] Report stale functions' percentage wrt all profiled functions.
Summary:
Report stale functions percentage with respect to all profiled
functions instead of all simple functions in the binary.
The new reporting format should make it more apparent if the
profile is out-of-date. Compare:

  BOLT-INFO: 341 (16.7% of all profiled) functions have invalid (possibly stale) profile.

vs old:

  BOLT-INFO: 341 (0.3%)  functions have invalid (possibly stale) profile.

(cherry picked from FBD4451746)
2017-01-23 13:08:40 -08:00
Maksim Panchenko 19859377f8 [BOLT] Fix debug info update for zero-length ranges.
Summary:
Due to a clowntown on my part we were generating wrong ranges
when an empty range was seen on input. We were basically expanding
the range to include all basic blocks following such range and setting
wrong sizes at the same time.

Add "-dump-cu" option to llvm-dwarfdump that allows looking at the debug
info of a single compile unit only. Saves time if we are only interested
in a subset of information.

(cherry picked from FBD4430989)
2017-01-18 10:09:54 -08:00
Maksim Panchenko 0894905373 [ICF] Don't re-fold functions in non-relocation mode.
Summary:
In non-relocation mode, when we run ICF the second time,
we fold the same functions again since they were not
removed from the function set. This diff marks them as
folded and ignores them during ICF optimization. Note
that we still want to optimize such functions since they
are potentially called from the code not covered by BOLT
in non-relocation mode.

Folded functions are also excluded from dyno stats with
this diff.

Also print the number of times folded functions were called.
When 2 functions -  f1() and f2() are folded, that number
would be min(call_frequency(f1), call_frequency(f2)).

(cherry picked from FBD4399993)
2017-01-10 11:20:56 -08:00
Maksim Panchenko bc8a456309 ICF improvements.
Summary:
Re-worked the way ICF operates. The pass now checks for more than just
call instructions, but also for all references including function
pointers. Jump tables are handled too.

(cherry picked from FBD4372491)
2016-12-21 17:13:56 -08:00
Maksim Panchenko 55fc5417f8 Relocations support for BOLT.
Summary: Read relocation from linker and relocate all functions.

(cherry picked from FBD4223901)
2016-09-27 19:09:38 -07:00