1011 Commits

Author SHA1 Message Date
Alexander Shaposhnikov 1b258b8908 Refactor syscall wrappers for OSX
Summary: Refactor syscall wrappers for OSX.

(cherry picked from FBD25084642)
2020-11-19 14:56:45 -08:00
Amir Ayupov f9d00d418b [BOLT] Handle insertion of updated CFI at the first basic block
Summary:
Fix corner case of insertion of updated CFI with unset `PrevBB`.
Handle it in the same way as inserting past hot-cold split point.

(cherry picked from FBD24943911)
2020-11-17 18:40:19 -08:00
Alexander Shaposhnikov 1cf23e5ee8 Link the instrumentation runtime on OSX
Summary: Link the instrumentation runtime on OSX.

(cherry picked from FBD24390019)
2020-11-17 13:57:29 -08:00
Maksim Panchenko 7eaf63a118 [BOLT] Fix data race while running split functions pass
Summary:
In BinaryContext::calculateEmittedSize(), after the temporary code
emission, we have to perform a cleanup and mark all symbols used
during the emission as undefined and unregistered (so that we can emit
them again later). The cleanup is happening even for symbols that were
referenced and not defined by emitted code.

If all emitted symbols are local, there is no risk that one thread will
define a symbol while some other thread will undefine it in its cleanup
code. Such behavior is expected as local symbols can only be referenced
within the containing function and each function is processed in one
thread. However, secondary entry points have associated global symbols
and if we emit them, then it is possible for a thread to undefine
a symbol while the other thread had defined it and was in the process of
emitting the fragment with it. In such case, a data race may happen and
the thread that contains the definition of the symbol may define it
twice causing a redefinition error.

To avoid the data race, we skip the emission of secondary entry global
symbols when emitting code used only for the size estimation.

(cherry picked from FBD24986007)
2020-11-16 14:34:51 -08:00
Sergey Pupyrev 1e9b733008 a new version of hfsort+
Summary:
A faster and better version of function reordering:
- fixed a bug where some computed probabilities were negative;
- replaced an O(n^2) loop with a priority queue for finding candidate chains to merge.
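
The priority-queue idea above can be sketched roughly as follows. This is not the hfsort+ code: `merge_chains`, the `sizes` mapping, and the `gain` callback are hypothetical stand-ins, and the only point illustrated is that the best candidate pair is popped from a heap (with lazy deletion of stale entries) instead of being found by rescanning every pair on each iteration.

```python
import heapq


def merge_chains(sizes, gain):
    """Greedily merge chains, always taking the pair with the highest gain.

    sizes: dict mapping chain id -> size (hypothetical stand-in for a chain)
    gain:  function (size_a, size_b) -> number, a hypothetical merge score
    Returns the merges performed as (id_a, id_b, new_id) tuples.
    """
    chains = dict(sizes)
    next_id = max(chains) + 1
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-gain(chains[a], chains[b]), a, b)
            for a in chains for b in chains if a < b]
    heapq.heapify(heap)
    merges = []
    while heap:
        neg_gain, a, b = heapq.heappop(heap)
        # Lazy deletion: skip entries that mention an already-merged chain.
        if a not in chains or b not in chains:
            continue
        if -neg_gain <= 0:
            break  # no remaining merge is profitable
        merged = chains.pop(a) + chains.pop(b)
        # Only pairs involving the new chain need fresh heap entries.
        for other, size in chains.items():
            heapq.heappush(heap, (-gain(merged, size), other, next_id))
        chains[next_id] = merged
        merges.append((a, b, next_id))
        next_id += 1
    return merges
```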

(cherry picked from FBD24571208)
2020-11-14 13:18:58 -08:00
Amir Ayupov 6401af89c7 [BOLT] Support jump tables in split fragments with entries pointing back to parent functions
Summary:
Support jump tables belonging to split fragments with entries
pointing back to parent functions.
While skipping such families of functions, make sure to use the
topmost fragment to ignore its fragments.

(cherry picked from FBD24907438)
2020-11-12 11:54:51 -08:00
Amir Ayupov e8234b3b98 [BOLT] Add invalid offset for a JT entry pointing to a fragment
Summary:
During jump table identification, register an invalid offset for jump table
entries pointing to function fragments.
These invalid offsets have no effect other than padding the jump
table size, calculated as `max(OffsetEntries, Entries)`.
The correct jump table size is required in strict mode (enabled by default
in aggregation mode by `perf2bolt`) to account for all PC-relative
relocations in data.
Functions containing these jump tables with invalid offsets are
marked to be ignored immediately afterwards in
`populateJumpTables`.

(cherry picked from FBD24897464)
2020-11-12 11:54:44 -08:00
Amir Ayupov 157129b751 [BOLT] Debug logging in analyzeJumpTable
Summary:
Added debug logging in/around `analyzeJumpTable`:
- Dump jump table entries as they are being processed:
```
BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2)
  * Checking 0x428ff40 -> OK: real entry
  * Checking 0x428ff44 -> OK: real entry
  * Checking 0x428ff48 -> OK: real entry
  * Checking 0x428ff4c -> OK: real entry
  * Checking 0x428ff50 -> OK: real entry
  * Checking 0x428ff54 -> OK: address in split fragment
  * Checking 0x428ff58 -> OK: address in split fragment
  * Checking 0x428ff5c -> OK: address in split fragment
  * Checking 0x428ff60 -> OK: address in split fragment
  * Checking 0x428ff64 -> OK: real entry
  * Checking 0x428ff68 -> OK: real entry
  * Checking 0x428ff6c -> OK: real entry
  * Checking 0x428ff70 -> OK: real entry
BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2)
  * Checking 0x428ff74 -> OK: real entry
  ...
```
- Dump skipped functions:
```
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2)
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2)
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2)
```
- Dump values of unclaimed PC-relative relocations in data.

(cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
Amir Ayupov c0cb550536 Minimize X86/shrinkwrapping-critedge test case
Summary: Minimized test case while preserving the CFG subgraph with an issue

(cherry picked from FBD24871063)
2020-11-10 21:22:57 -08:00
Amir Ayupov e54d389799 [BOLT] Disable DynoStats printing after SCTC
Summary:
Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets
unset by SCTC pass. Make DynoStats collection conditional on this
new flag.
SCTC leaves CFG in a state where branch counters of BBs with tail
calls/conditional tail calls are not available (except via annotations,
which get stripped by `lower-annotations`). Without branch
counters, DynoStats are invalid.

(cherry picked from FBD24558050)
2020-11-10 10:51:23 -08:00
Amir Ayupov c36b71686c Improve cold fragment name matching
Summary:
Fix cold fragment name matching regex by replacing existing
regexes `.*\.cold\..*` and `.*\.cold`
and combining them into `.*\.cold(\.\d)?`,
applied to the restored name (with BOLT-added suffixes stripped).

This allows matching names like "execute_stack_op.cold/1", which
previously weren't recognized.
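
A minimal sketch of the matching change, using the three patterns quoted above. The helper `restore_name` and the suffix shapes it strips (`/1`, `(*2)`) are assumptions made for illustration, as is the use of whole-string matching; the actual BOLT code may differ.

```python
import re

# Patterns quoted in the commit message.
OLD_PATTERNS = [re.compile(r".*\.cold\..*"), re.compile(r".*\.cold")]
NEW_PATTERN = re.compile(r".*\.cold(\.\d)?")


def restore_name(name):
    """Strip BOLT-added suffixes such as "/1" and "(*2)" (assumed shapes)."""
    return re.sub(r"(/\d+)?(\(\*\d+\))?$", "", name, count=1)


def is_cold_old(name):
    """Old behaviour: either pattern must match the whole, unrestored name."""
    return any(p.fullmatch(name) for p in OLD_PATTERNS)


def is_cold_new(name):
    """New behaviour: one pattern, applied to the restored name."""
    return NEW_PATTERN.fullmatch(restore_name(name)) is not None
```

Under these assumptions, `is_cold_old("execute_stack_op.cold/1")` is false because of the trailing `/1`, while `is_cold_new` accepts the same name.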

(cherry picked from FBD24804880)
2020-11-09 12:38:51 -08:00
Amir Ayupov f86a78a4e7 Lost in rebase: call registerFragment with a reference to TargetBF
Summary: Fixes broken build due to a lost dereferencing

(cherry picked from FBD24799948)
2020-11-06 12:22:22 -08:00
Amir Ayupov 2b09d672ce Conservatively handle jump tables in split functions
Summary:
- Allow jump table entries to point to locations inside the function and its fragments.
The reasoning is that jump table identification stops at the first entry that belongs to a function other than the one originally referencing the jump table. This assumption is invalid for jump tables with entries pointing to both the parent function and cold fragments, and leads to the "unclaimed PC-relative relocations" assertion.

- Add fragment identification heuristic based on function name regex and contiguous jump table entries.
Currently, the parent-to-fragment relationship is set up based on interprocedural references, i.e. direct references from the parent function. These do not include references through jump tables.
Additionally, some fragments are only reachable through a jump table. In that case, in order to fully consume the jump table, add the parent-to-fragment relationship during `analyzeJumpTable` using the following heuristics:
  1. Fragment is identified as such based on name (contains `.cold.` part), but
  2. Parent function is not set – no direct interprocedural references to that fragment, and
  3. Fragment has the name of the form <parent>.cold(.\d+)

- For split functions with jump table entries spanning parent and fragments, mark parent and all fragments as ignored.
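
The name-based part of the heuristic can be sketched as below. `parent_name` and `link_fragments` are hypothetical helpers, names are assumed to already have BOLT-added suffixes stripped, and the numeric suffix is treated as optional, which is an assumption rather than something the commit message states.

```python
import re

# "<parent>.cold" optionally followed by ".<digits>" (optional part assumed).
FRAGMENT_RE = re.compile(r"(?P<parent>.+)\.cold(\.\d+)?")


def parent_name(fragment_name):
    """Return the parent name encoded in a fragment name, or None."""
    m = FRAGMENT_RE.fullmatch(fragment_name)
    return m.group("parent") if m else None


def link_fragments(function_names):
    """Map each known parent to the fragments whose names point at it."""
    known = set(function_names)
    links = {}
    for name in function_names:
        parent = parent_name(name)
        if parent in known:
            links.setdefault(parent, []).append(name)
    return links
```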

(cherry picked from FBD24456904)
2020-11-06 11:19:03 -08:00
Amir Ayupov dc48354f71 processInterproceduralReferences: record references to cold fragments as entry points
Summary:
For interprocedural references to fragments, record them as
fragment entry points. Not registering these entry points leads to
UCE removing the blocks and "Undefined temporary symbol"
assertion.

(cherry picked from FBD24511281)
2020-11-06 10:57:47 -08:00
Amir Ayupov 5452287710 Extract BinaryContext::registerFragment
Summary: registerFragment to be reused in adding fragments reachable only through jump tables.

(cherry picked from FBD24656651)
2020-11-06 10:27:33 -08:00
Vladislav Khmelevsky 58460460d9 [BOLT][PR] Handle TLS relocations on AArch64
Summary:
Some TLS relocations, such as R_AARCH64_TLSDESC_ADR_PAGE21, must be
handled by BOLT and should not be skipped by the removed condition. Others,
such as R_AARCH64_TLS_TPREL64, could indeed be skipped
here, but AFAIU this condition was added as part of BOLT's self-optimization, so
to prevent future problems my suggestion is not to add another condition
like "isTLS(RType) && isTLSRelocatable(RType)", but to just remove it, since the
absence of this condition should not break any other TLS relocation.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/103
GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com>

(cherry picked from FBD24745928)
2020-11-04 16:45:58 -08:00
Maksim Panchenko 4f4239ceba [BOLT] Fix C++ exceptions for shared objects
Summary:
Fix several issues to make C++ exceptions work in shared objects:
  * Set MCObjectFileInfo PIC type based on the input binary type.
  * Support indirect (DW_EH_PE_indirect) encoding while writing
    exception Type Table.
  * Use different LPStart value and landing pad encoding for .so's.
  * Disable splitting of exception-handling code for .so's because of
    the new encoding.

(cherry picked from FBD24698765)
2020-11-04 11:44:02 -08:00
Rafael Auler c1bb4dcb2b [BOLT] Remove threaded EliminateUnreachableBlock version
Summary:
EliminateUnreachableBlocks has a data race because it depends
on BinaryContext::computeCodeSize. computeCodeSize supports independent
Emitters, enabling a lock-free execution. Unfortunately, that is almost
as expensive as the lock. Removing the boilerplate code for
parallelization of this pass turned out to be the best alternative: no
races and slightly better execution time for HHVM.

(cherry picked from FBD24716250)
2020-11-03 11:28:59 -08:00
Rafael Auler 37921b489a [BOLT] Please sanitizers
Summary:
In BinaryContext, we had StringRef holding a reference to
an r-value std::string. This triggers clang's address sanitizer
warnings. In MCPlusBuilder we had a left shift overflowing a type,
which is undefined behavior. Similarly, in CallGraph, we had a hash
function shifting a negative value, which is also UB. The last two
trigger the UB sanitizer.

(cherry picked from FBD24661045)
2020-10-30 15:11:52 -07:00
Rafael Auler 3e78082c1b [DOCS] Add instrumentation instructions to README
Summary: Add basic instructions on how to instrument a binary.

(cherry picked from FBD24660183)
2020-10-30 14:45:30 -07:00
Rafael Auler eb12d719ac [BOLT] Fix no-asserts build
Summary: Only use dump() method under DEBUG() macro.

(cherry picked from FBD24666481)
2020-10-30 19:59:07 -07:00
Maksim Panchenko 6b185cccf4 [BOLT] Always keep dynamic symbols defined
Summary:
Some symbols in .dynsym will be erroneously marked as belonging to a
non-allocatable section that BOLT can remove. In that case, keep the
original invalid index for such symbols instead of setting the UNDEF
index.

(cherry picked from FBD24488677)
2020-10-22 16:35:29 -07:00
Amir Ayupov 5f2f96c4c9 Add pass number to dot dump filename
Summary:
Change .dot dumps filename format from
  <function>-<passname>.dot
to
  <function>-<passidx>_<passname>.dot
This change helps navigate dumps by making the pass order explicit.
Example:
  execute_stack_op.cold.6-1(*2)-00_build-cfg.dot
  execute_stack_op.cold.6-1(*2)-01_validate-internal-calls.dot
  execute_stack_op.cold.6-1(*2)-02_strip-rep-ret.dot
  ...
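
A small sketch of the new naming scheme. The function `dot_dump_filename` is hypothetical; the two-digit zero padding and the replacement of `/` with `-` in the function name are inferred from the example filenames above, not confirmed by the commit message.

```python
def dot_dump_filename(function_name, pass_index, pass_name):
    """Build "<function>-<passidx>_<passname>.dot" for a CFG dump."""
    # Assumption: "/" in BOLT function names is not usable in a filename.
    safe_name = function_name.replace("/", "-")
    # Assumption: the index is zero-padded so that names sort in pass order.
    return f"{safe_name}-{pass_index:02d}_{pass_name}.dot"
```

With the padding, a plain lexicographic directory listing shows the dumps of one function in the order the passes ran.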

(cherry picked from FBD24452903)
2020-10-21 17:08:32 -07:00
Maksim Panchenko d91add0bfe [BOLT] Fix PatchEntries pass
Summary:
While refactoring the pass, I removed the important transactional
property of the patching process. Restore it.

(cherry picked from FBD24440214)
2020-10-21 12:31:09 -07:00
Maksim Panchenko d6d88399fc [BOLT] Enable lite mode by default with relocations
Summary:
When optimizing input with relocations, make it faster and less
memory-hungry with lite mode.

(cherry picked from FBD24374241)
2020-10-17 15:09:06 -07:00
Rafael Auler e4396c41da [BOLT] Ignore __hot_start, __hot_end from input
Summary:
When -hot-text is on, do not read __hot_start and __hot_end
from input (inserted by a linker script with the intent of ordering
functions). This can confuse BOLT into creating a function with this
name, depending on the address where the symbol lands, and we will assert
on symbol redefinition when trying to emit our own
__hot_start/__hot_end.

(cherry picked from FBD24366636)
2020-10-17 00:50:27 -07:00
Alexander Shaposhnikov 6133d2598b Inject a hook into the entry point on MachO
Summary:
This diff is a preparation for loading the runtime on MachO.
The proposed schema is the following:

1/ Function "bolt_instr_setup" is placed into the predefined section "setup" (in the final setting this function will be coming from the instrumentation runtime but we still will be placing it into this section).

2/ In the instrumentation pass we create a symbol "bolt_instr_setup" and inject the corresponding call into the beginning of the function representing the entry point of the binary.

(cherry picked from FBD24329530)
2020-10-15 01:39:35 -07:00
Maksim Panchenko f15532c2aa [BOLT][DWARF] Streamline processing of DWARF unit DIEs
Summary:
Do not store processed DWARF DIEs, but instead process them while
reading one at a time.

Reduces memory consumption when updating debug info by 10%-25%.

(cherry picked from FBD24327029)
2020-10-16 00:11:24 -07:00
Alexander Shaposhnikov bbd9d610fe Add first bits to cross-compile the runtime for OSX
Summary: Add first bits to cross-compile the runtime for OSX.

(cherry picked from FBD24330977)
2020-10-15 03:51:56 -07:00
Rafael Auler 0b6df06e04 [BOLT] In shrinkwrap, do not split prefix/instr
Summary:
When placing restore instructions in the shrink wrapping pass,
we typically put them right before the last instruction of a block at
the dominance frontier. If this instruction happened to have a prefix,
because the MC lib separates prefix into separate MCInsts, we would
accidentally put a load between a prefix and another instruction. Fix
this.
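
The idea of the fix can be illustrated with a toy model. `Inst` and `restore_insertion_index` are hypothetical and do not mirror BOLT's data structures; the sketch only shows stepping back over prefix entries so that an inserted instruction never lands between a prefix and the instruction it belongs to.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    text: str
    is_prefix: bool = False  # True for a prefix emitted as its own entry


def restore_insertion_index(block):
    """Index at which to insert a restore before the last instruction."""
    if not block:
        return 0
    idx = len(block) - 1
    # Step back over any prefixes attached to the last instruction.
    while idx > 0 and block[idx - 1].is_prefix:
        idx -= 1
    return idx
```

For a block `mov; rep; ret`, where `rep` is a prefix entry, the index is 1, so the restore goes before `rep` rather than between `rep` and `ret`.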

(cherry picked from FBD24295324)
2020-10-14 12:40:33 -07:00
Maksim Panchenko 53bd88c7fe [BOLT] Refactor reading of debug line info
Summary:
Match BinaryFunction to a DWARFUnit based on the unit's address ranges
skipping the parsing of DIEs.

(cherry picked from FBD24269325)
2020-10-12 21:04:42 -07:00
Maksim Panchenko 9f15b9f3c2 [BOLT] Fix debug line info in lite relocation mode
Summary: Emit line info for functions that were not emitted in relocation mode.

(cherry picked from FBD24267650)
2020-10-12 20:16:59 -07:00
Alexander Shaposhnikov 473a6199ab Add first bits to support emitting instrumented code on MachO
Summary:
Add first bits to support emitting instrumented code on MachO.
This diff enables us to instrument branches / emit counters.

(cherry picked from FBD24255164)
2020-10-12 10:11:17 -07:00
Maksim Panchenko 247b4181a3 [BOLT] Emit symbol size for functions
Summary:
On targets that support it, emit size of the emitted function symbol.

At the moment there's no use for the size except that it is visible in a
temporary .o file symbol table.

(cherry picked from FBD24246177)
2020-10-12 13:02:50 -07:00
Alexander Shaposhnikov 528da5d795 Fix handling of _end symbol on MachO
Summary: _end is "defined" but its address doesn't belong to any section. This diff adds special handling for this symbol.

(cherry picked from FBD24249120)
2020-10-12 03:56:50 -07:00
Maksim Panchenko c27e254056 [BOLT] Change label name for cold fragments
Summary:
Append ".cold.0" suffix to the original part of the name, such that
"foo/1" becomes "foo.cold.0/1" instead of "foo/1.cold.0".
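
As a rough illustration of the renaming, the suffix is inserted before the `/N` part instead of being appended after it. `cold_label` is a hypothetical helper, not BOLT's implementation.

```python
def cold_label(name, index=0):
    """Insert ".cold.<index>" before the "/N" part of a name, if any."""
    base, sep, suffix = name.partition("/")
    return f"{base}.cold.{index}{sep}{suffix}"
```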

(cherry picked from FBD24246112)
2020-10-12 11:26:07 -07:00
Alexander Shaposhnikov 7f1fd80762 Add support for emitting code into a new segment on MachO
Summary: Add support for emitting code into a new segment on MachO (in the instrumentation mode).

(cherry picked from FBD24097547)
2020-10-02 19:25:17 -07:00
Maksim Panchenko 843309c075 [BOLT] Disable PatchEntries in non-relocation mode on ELF
Summary:
At the moment we are not using PatchEntries pass in non-relocation mode
on ELF. However, we will use it on MachO.

(cherry picked from FBD24235271)
2020-10-09 19:37:12 -07:00
Maksim Panchenko 0465d952cc [BOLT] Refactor PatchEntries pass
Summary:
Use injected functions with fixed addresses to patch original function
entries.

(cherry picked from FBD24133890)
2020-10-09 16:06:27 -07:00
Alexander Shaposhnikov 0376abe252 Add ToolPath field to MachORewriteInstance
Summary: Add ToolPath field to MachORewriteInstance. This will enable us to locate the runtime library relative to the tool's location.

(cherry picked from FBD24183448)
2020-10-07 17:52:47 -07:00
Rafael Auler 35632d4828 [BOLT] Refactor relocations class impl per arch, NFC
Summary:
Do not mix relocation codes from different archs. Even though
they do not intersect at the moment, this could easily introduce bugs
once new relocations are supported (for example, ILP32 for AArch64).

(cherry picked from FBD24169425)
2020-10-07 15:40:51 -07:00
Alexander Shaposhnikov 59c21b42da Precompute symbol section indices on MachO
Summary: Precompute symbol section indices on Mach-O.

(cherry picked from FBD24133810)
2020-10-06 01:30:55 -07:00
Alexander Shaposhnikov 71e185f2da Add -check-overlapping-elements option
Summary:
This diff adds a command line option to disable the check of overlapping elements in Mach-O parsing. This check in its current form is prohibitively expensive for large binaries.
A long-term fix would be to reimplement the check in a more efficient manner (and contribute it to the upstream).

(cherry picked from FBD24109468)
2020-10-05 02:35:26 -07:00
Rafael Auler d7fb998637 [BOLT] Fix sign issue when validating X86 relocations
Summary:
In analyzeRelocations, we extract the result of the relocation
from binary code to recreate the target of it in a few special cases.
For R_X86_64_32S relocations, however, we were neglecting the
possibility of the encoded value in the instruction to be negative.
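
The sign issue can be shown with a short sketch: the same four bytes read from an instruction yield different values depending on whether they are interpreted as unsigned or as a sign-extended 32-bit field, which is what R_X86_64_32S denotes. The helper names are hypothetical.

```python
def decode_32(raw, signed):
    """Decode a 4-byte little-endian field as signed or unsigned."""
    return int.from_bytes(raw, byteorder="little", signed=signed)


def sign_extend_32(value):
    """Sign-extend an unsigned 32-bit value to a Python int."""
    return (value ^ 0x80000000) - 0x80000000


encoded = bytes.fromhex("fcffffff")      # the bytes of -4 in two's complement
unsigned_value = decode_32(encoded, signed=False)
signed_value = decode_32(encoded, signed=True)
```

Here `unsigned_value` is `0xFFFFFFFC` while `signed_value` is `-4`; recreating a relocation target from the unsigned reading would be off by 2^32.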

(cherry picked from FBD24096347)
2020-10-05 12:41:03 -07:00
Alexander Shaposhnikov 2808c800e8 Read the entry point address on MachO
Summary: Read the entry point address on MachO.

(cherry picked from FBD24039370)
2020-09-30 19:10:24 -07:00
Amir Ayupov d1ec11b28f postProcessEntryPoints: return after setIgnored and setSimple are set
Summary:
This patch fixes the assertion failure during instrumentation.

The assertion is raised by `getInstructionAtOffset`, which expects `CurrentState` to be either `Disassembled` or `CFG`.

The function is called from `postProcessEntryPoints`, which goes over Labels and performs a series of checks. The checks call BinaryFunction methods `setSimple(false)` or `setIgnored()`.
However, if `setIgnored` is invoked, it resets the state to `Empty`. Thus, a subsequent call to `getInstructionAtOffset` will fail.
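
The failure mode and the fix can be modelled with a toy state machine. `ToyFunction` and its `labels` mapping are hypothetical stand-ins; only the method names echo the ones mentioned above, and the real checks in `postProcessEntryPoints` are more involved.

```python
from enum import Enum, auto


class State(Enum):
    EMPTY = auto()
    DISASSEMBLED = auto()
    CFG = auto()


class ToyFunction:
    def __init__(self, labels):
        self.state = State.DISASSEMBLED
        self.labels = labels  # offset -> whether the label passes the checks
        self.ignored = False

    def set_ignored(self):
        # Mirrors the described behaviour: ignoring resets the state.
        self.ignored = True
        self.state = State.EMPTY

    def get_instruction_at_offset(self, offset):
        assert self.state in (State.DISASSEMBLED, State.CFG)
        return offset

    def post_process_entry_points(self):
        for offset, passes_checks in self.labels.items():
            if not passes_checks:
                self.set_ignored()
                return  # the fix: stop before querying instructions again
            self.get_instruction_at_offset(offset)
```

Without the early `return`, the loop would reach the next label and trip the assertion in `get_instruction_at_offset`, because the state is already `EMPTY`.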

(cherry picked from FBD24005197)
2020-09-29 19:37:47 -07:00
Alexander Shaposhnikov 0601ae6438 Set InputFileOffset for MachO sections
Summary: Set InputFileOffset for MachO sections.

(cherry picked from FBD23903542)
2020-09-24 03:22:31 -07:00
Maksim Panchenko a10f799290 [BOLT][Linux] Initial support for special Linux Kernel sections
Summary:
Enable initial support for reading and patching special Linux kernel sections.

Author: Tanvir Ahmed Khan <takh@fb.com>

GitHub Author: takhandipu

(cherry picked from FBD22998869)
2020-09-15 11:42:03 -07:00
Maksim Panchenko a82cff0f52 [BOLT] Eliminate "shallow" function lookup
Summary:
Whenever we search for a function based on its address in the input
binary, we now always return a corresponding fragment for split
functions. If the user needs access to the main fragment, they can
call getTopmostFragment().

(cherry picked from FBD23670311)
2020-09-14 15:48:32 -07:00
Maksim Panchenko 62469b5036 [BOLT] Do not map sections with zero address
Summary:
Sections that do not originate from the input binary will have an
input address set to zero and thus do not have to be mapped.

Mapping such sections caused a build time regression in non-relocation
mode.

(cherry picked from FBD23670334)
2020-09-14 14:31:50 -07:00