Commit Graph

906 Commits

Author SHA1 Message Date
Vasily Leonenko 519cbbaa9a [PR] Instrumentation: Introduce instrumentation-binpath argument
Summary:
This commit introduces -instrumentation-binpath argument used
to point instuqmented binary in runtime in case if /proc/self/map_files
path is not accessible due to access restriction issues.

Vasily Leonenko
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092681)
2021-07-30 18:07:53 +03:00
Vasily Leonenko 285ac26d16 [PR] README: remove note about experimental status of instrumentation
Summary:
Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092666)
2021-06-25 16:27:47 +08:00
Vladislav Khmelevsky 361f3b5576 [PR] Instrumentation: Fix runtime handlers for PIE files
Summary:
This commit fixes runtime instrumentation handlers for PIE
binaries case.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092522)
2021-06-23 18:24:09 +00:00
Vasily Leonenko 9b39a823ea [PR] Instrumentation: Initial support for static executables
Summary:
This commit introduces static binaries instrumentation
support.  Note that current implementation does not support profile
output on the instrumented binary finalization. So it requires to use
-instrumentation-sleep-time=N (N>0) option usage.  Note: There is
unhandled case with static PIE executable which might have dynamic
header.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092471)
2021-06-21 01:59:38 +08:00
Elvina Yakubova 2ffd6e2b43 [PR] Instrumentation: Add support for opening libs based on links /proc/self/map_files
Summary:
This commit adds support for opening libs based on links
/proc/self/map_files.  For this we're getting current virtual address
and searching the lib in the directory with such address range. After
that, we're getting full path to the binary by using readlink
function. Direct read from link in /proc/self/map_files entries is not
possible because of lack of permissions.

Elvina Yakubova,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092422)
2021-01-19 02:08:55 +08:00
Elvina Yakubova 6665c628ea [PR] Instrumentation: Add readlink and getdents support
Summary:
This commit adds support for getting directory entries and
reading value of a symbolic link in instrumentation runtime library

Elvina Yakubova,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092362)
2021-01-18 22:08:10 +08:00
Vasily Leonenko ad79d51778 [PR] Instrumentation: Generate and use _start and _fini trampolines
Summary:
This commit implements new method for _start & _fini functions hooking
which allows to use relative jumps for future PIE & .so library support.
Instead of using absolute address of _start & _fini functions known on
linking stage - we'll use dynamically created trampoline functions and
use corresponding symbols in instrumentation runtime library.

As we would like to use instrumentation for dynamically loaded binaries
(with PIE & .so), thus we need to compile instrumentation library with
"-fPIC" flag to support relative address resolution for functions and
data.

For shared libraries we need to handle initialization of instrumentation
library case by using DT_INIT section entry point.

Also this commit adds detection if the binary is executable or shared
library based on existence of PT_INTERP header. In case of shared
library we save information about real library init function address
for further usage for instrumentation library init trampoline function
creation and also update DT_INIT to point instrumentation library init
function.

Functions called from init/fini functions should be called with forced
stack alignment to avoid issues with instructions which relies on it.
E.g. optimized string operations.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092316)
2021-06-19 04:08:35 +08:00
Amir Ayupov 60b10a8ead [BOLT][NFC] Unify isTailCall interface across X86 and AArch64
Summary:
Move the common code into MCPlusBuilder.h.
Use group 1 `kTailCall` MCAnnotation instead of dynamically allocated
annotation.
This diff reduces the processing time overhead to 1.5% vs using
TAILJMP opcode.

(cherry picked from FBD30055585)
2021-07-29 17:28:51 -07:00
Maksim Panchenko 89a2e16037 [BOLT] Support PLT sections with variable entry sizes
Summary:
The linker can generate 8- or 16-byte entries in .plt.got and .plt.sec
sections. On X86, the main differentiator is the presence of endbr64
instruction at the beginning of the entry. Detect the instruction and
adjust the size accordingly.

(cherry picked from FBD29847639)
2021-07-14 01:35:34 -07:00
Amir Ayupov c33f08e7df [BOLT] Update build instructions in README
Summary: Remove llvm.patch from build instructions.

(cherry picked from FBD29973395)
2021-07-28 14:45:10 -07:00
Joey Thaman a7e2a8f946 [BOLT] Tail Duplication active pass
Summary:
Amended the Tail Duplication
analysis pass to do the tail duplication in question

(cherry picked from FBD29833794)
2021-07-16 11:45:44 -07:00
Vasily Leonenko 68be8caf3f RewriteInstance: account .stab and .stabstr as debug sections
Summary:
.stab and .stabstr are special sections containing debugging
information and strings associated with the debugging information.
This commit adds them to the list of debugging sections, so
these sections can be removed for output binary.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD29746153)
2021-06-25 15:42:56 +08:00
James Luo 0df7bf7b8b [BOLT][CSSPGO] Handle indirect call promotion in Pseudo Probe Integration
Summary:
Match new direct call generated during ICP to correct pseudo probe

New call is matched to the probes of original call instruction.

(cherry picked from FBD29591662)
2021-07-16 16:05:18 -07:00
James Luo 3e55dea4dd [BOLT][CSSPGO] Encode pseudo probe section to binary
Summary:
Update .pseudo_probe section in input binary

DFS inline tree and emit pseudo probes with updated addresses

(cherry picked from FBD29522142)
2021-07-15 14:58:32 -07:00
Joey Thaman 2f46660559 [BOLT] Tail duplication analysis pass
Summary:
Created a binary pass that records how many
times tail duplication would be used and how many cache
misses it would theoretically stop

(cherry picked from FBD29619858)
2021-07-01 07:11:26 -07:00
Zino Benaissa 60b15062e1 [BOLT] Dump dynamic execution per instruction opcode
Summary:
We extended DynoStats to dump the histogram per instruction opcode. By
default the dump is turned off. Use '-print-dyno-opcode-stats' to enable
the dump.

BOLT also dumps for each instruction opcode the maximum execution count and
corresponding function name and basic block offsets where the instruction
occurs. Below is a sample of the dump:

                   Opcode,    Execution Count,      Max Exec Count, Function Name:Offset
                  SHR8rCL,                232,                 232, _ZNK5folly14AsyncSSLSocket4goodEv:53
                VPADDDYrr,              13956,                 388, chacha20_encrypt_bytes.part.0/3:736
               PMOVSXBWrr,                  4,                   2, ares_expand_name/1:264
                VMOVAPSmr,               1082,                  43, chacha20_encrypt_bytes.part.0/3:2864
                VPSHUFBrr,               9540,                1667, chacha20_encrypt_bytes.part.0/3:4416
            VPUNPCKLDQYrr,               1102,                 188, jsimd_ycc_rgb_convert_avx2/1:125
          VPBROADCASTQYrm,                 39,                  39, chacha20_encrypt_bytes.part.0/3:400
               PMOVSXWDrr,                  8,                   2, ares_expand_name/1:264
                   VPORrr,                817,                 129, jsimd_idct_islow_avx2/1:41
                  PSLLDri,            8690752,               65644, blockmix_salsa8_xor/1:1424

(cherry picked from FBD28859624)
2021-05-24 21:33:43 -07:00
Maksim Panchenko c9f5f47b51 [BOLT] Add support for .plt.sec and refactor PLT-reading code
Summary:
A binary can contain multiple PLT sections with different name and
attributes (such as an entry size). Extend the support to .plt.sec and
refactor the code to make future extensions simpler.

(cherry picked from FBD29502107)
2021-06-30 14:41:41 -07:00
Joey Thaman 4c12afc1f4 [BOLT][NFC] Resolved all clang-12 warnings for bolt
Summary:
clang-12 now compiles bolt without warnings.
Some warnings were fixed if possible while others were suppressed by
doing (void)variable for unused variable warnings or moving code inside
assert statements of LLVM_DEBUG blocks.

(cherry picked from FBD29469054)
2021-06-29 12:11:56 -07:00
Maksim Panchenko 1de0746790 [BOLT] Read all dynamic relocations and refactor code
Summary:
Add code to read more dynamic relocations (DT_JMPREL) and enforce strict
checks that corresponding sections sizes match .dynamic entry
description.

(cherry picked from FBD29502109)
2021-06-30 14:38:50 -07:00
Alexander Yermolovich f7499c6711 [BOLT][DWARF] Fix writing out dwo with DWP as input
Summary:
The code for writing out dwo files wasn't handling case where DWP is an input.
Because all the sections are part of the same binary.

One note with current implementation. .debug-str.dwo will have strings for all the dwo objects.
This is because llvm-dwp de-duplicates strings and combines them in to one section. It then re-writes .debug-str-offsets.dwo to point to new .debug-str.dwo section.

(cherry picked from FBD29244835)
2021-06-18 15:57:34 -07:00
Maksim Panchenko 3e5ce1f282 [BOLT][TESTS] Remove dynamic relocations from YAML tests
Summary:
Our YAML objects contain references to dynamic relocations via .dynamic,
but there are no corresponding relocation sections. Change .dynamic
contents to specify no dynamic relocations.

(cherry picked from FBD29502108)
2021-06-30 14:33:59 -07:00
Amir Ayupov a07d24cc4b [BOLT][NFC] Un-inline checking AArch64 linker veneers out of disassemble loop
Summary:
Move the AArch64 `matchLinkerVeneer` check out of a for-loop
in `BinaryFunction::disassemble`

(cherry picked from FBD29411348)
2021-06-25 17:52:51 -07:00
Amir Ayupov c7c0803b59 [BOLT][NFC] Un-inline indirect branch handling out of disassemble loop
Summary:
Move the `processIndirectBranch` switch statement out of a for-loop
in `BinaryFunction::disassemble`

(cherry picked from FBD29411346)
2021-06-25 17:49:43 -07:00
Amir Ayupov 8f751bc058 [BOLT][NFC] Un-inline adding external references out of disassemble loop
Summary:
Move the code that handles true external references (non-unreachable)
out of a for-loop in `BinaryFunction::disassemble`.

(cherry picked from FBD29411345)
2021-06-25 17:32:25 -07:00
Amir Ayupov 8f7a400629 [BOLT][NFC] Delete MoveRelocations entirely
Summary: MoveRelocations are unused. Remove interfaces and emission part.

(cherry picked from FBD29468409)
2021-06-25 17:06:21 -07:00
Maksim Panchenko 38c5887992 [BOLT][NFC] Always process runtime relocations
Summary:
Dynamic relocations applied at runtime should be processed even in
non-relocation mode.

(cherry picked from FBD29311906)
2021-06-22 13:46:06 -07:00
Amir Ayupov ef1b1e7184 [BOLT][NFC] Refactor handlePCRelOperand
Summary: Move error logging to handlePCRelOperand, reduce code duplication

(cherry picked from FBD29309702)
2021-06-17 17:41:28 -07:00
Amir Ayupov b964e852d5 [BOLT][NFC] Readability improvements in X86,Aarch64 MCPlusBuilder
Summary: Minor refactorings in target specific MCPlusBuilders to improve readability

(cherry picked from FBD29309701)
2021-06-17 18:22:32 -07:00
James Luo dea6c247d9 [BOLT][CSSPGO] Relate decoded pseudo probe basic blocks
Summary:
Assign decoded pseudo probe to correlated output block

Pseudo probes can then be encoded to a proper address

(cherry picked from FBD29211688)
2021-06-25 11:42:58 -07:00
Amir Ayupov 521a61b056 [BOLT][NFC] Use MCPlusBuilder::isPseudo
Summary: Consistently use this interface across BOLT codebase

(cherry picked from FBD29171718)
2021-06-16 12:10:20 -07:00
Amir Ayupov da276d73c7 [BOLT] Handle R_X86_64_64 in flushPendingRelocations
Summary:
Handle R_X86_64_64 the same way as R_X86_64_32;
`getSizeForType` takes care of the size:

```x86_64 ABI relocation types
Name        Value Field  Calculation
R_X86_64_64 1     word64 S + A
R_X86_64_32 10    word32 S + A
```

(cherry picked from FBD29370417)
2021-06-24 12:18:16 -07:00
Maksim Panchenko f46af9e9bc [BOLT][TESTS] Fix ICF test case
Summary:
Host compiler may generate duplicate functions and as a result BOLT can
fold more than 1 function.

(cherry picked from FBD29347302)
2021-06-23 16:13:30 -07:00
Joey Thaman be0da0fac2 Throw an error in instrument for dynamic libs
Summary:
In InstrumentatonRuntimeLibrary, throw an error

if the program uses dynamic libraries

(cherry picked from FBD29265147)
2021-06-21 07:45:52 -07:00
Maksim Panchenko bbbd159ccb [BOLT] Fix undefined symbol warnings/errors
Summary:
When we fold a function in relocation mode, make sure to clear its state
to avoid emitting relocations against undefined symbols.

(cherry picked from FBD29245320)
2021-06-18 14:35:39 -07:00
Sameeran joshi ba915af1cd [PR][BOLT] Print revision in perf2bolt and bolt-diff modes"
Summary:
Fix issue facebookincubator/BOLT#160
PR facebookincubator/BOLT#172

(cherry picked from FBD29139522)
2021-06-08 23:28:37 +05:30
Rafael Auler e485a9830b Rebase: [BOLT][DebugFission] Fix reading support for DWP
Summary:
Dived more in to DWARF APIs and llvm-symbolizer this is a more streamline way of doing it, and address base gets set properly.
Writing out dwo files with dwp input will be separate patch.

(cherry picked from FBD31361529)
2021-06-16 09:52:03 -07:00
Vladislav Khmelevsky a8b9319536 [PR] Patch allocatable relocations for AArch64
Summary: PR facebookincubator/BOLT#166

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28910060)
2021-06-02 00:03:56 +03:00
Vladislav Khmelevsky 2cf9008a60 [PR] Instrumentation: Disable signals on mutex lock
Summary:
When indirect call is instrmented it locks SimpleHashTable's mutex on get() call.
If while locked we we receive a signal and signal handler also will call
indirect function we will end up with deadlock.

PR facebookincubator/BOLT#167

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28909921)
2021-06-04 19:51:06 +03:00
Maksim Panchenko 1efadeedf2 [BOLT] Fix rodata load simplification pass
Summary:
If the target address has a runtime relocation against it, do not
perform the load simplification.

(cherry picked from FBD29091939)
2021-06-13 15:37:31 -07:00
Amir Ayupov f7f0a571d7 [BOLT][NFC] Suppress addList override warning
Summary:
Suppresses the warning
```
src/DebugData.h:338:20: warning: 'addList' overrides a member function but is not marked 'override' [-Wsuggest-override]
```

(cherry picked from FBD28858201)
2021-06-02 19:12:13 -07:00
James Luo 8a919593c7 [BOLT][CSSPGO] Pseudo probe decoding
Summary:
Make bolt decode pseudo probe section in binary

For more detail of pseudo probe, check https://reviews.llvm.org/D86490.

(cherry picked from FBD28856316)
2021-06-11 13:06:12 -07:00
Alexander Yermolovich 226d1c3b0b [BOLT] Change how DF DWO logging is handled
Summary: Changing assert to a warning when DWO debug information can't be retrieved. Usually due to invalid path.

(cherry picked from FBD29005217)
2021-06-09 12:55:09 -07:00
Amir Ayupov 2da5b12a3d [BOLT] Hugify: check for THP support via sysfs
Summary:
Remove dependence on kernel version check, query sysfs directly
instead.

(cherry picked from FBD28858208)
2021-06-02 19:11:52 -07:00
Maksim Panchenko 7bccf8d25d [BOLT][NFC] Fix debug info printouts for inlined functions
Summary:
While printing debug info for instructions, we should use line tables
from the corresponding DWARF CU which could be different from the
containing function CU in case of inlined instructions.

(cherry picked from FBD28908324)
2021-06-04 12:31:31 -07:00
Amir Ayupov 65d227c035 [BOLT][TEST] Fix test case to conform to analyzePICJumpTable pattern matching
Summary:
Make sure that jump table is properly recognized in
`split_func_jump_table_fragment.s`.

(cherry picked from FBD28839976)
2021-06-02 10:50:47 -07:00
James Luo 1c06193d0f [BOLT] Resolve JumpTable namespace issue in pseudo probe decoder migration
Summary: This diff fixes the JumpTable namespace conflicts during the migration of pseudo probe decoder.

(cherry picked from FBD28859927)
2021-06-02 22:46:57 -07:00
Maksim Panchenko a26370389a [BOLT][NFC] Disable ProcessAllSections in RuntimeDyld
Summary:
FBD55943 changed the way ProcessAllSections works in RuntimeDyld. After
the change, all sections, including symbol table, section table, etc.
are loaded into memory whenever ProcessAllSections is enabled.

In BOLT we rely on RuntimeDyld for processing sections with relocations.
These include most allocatable sections and additionally .debug_line.
The latter is skipped by RuntimeDyld without ProcessAllSections flag.
If we enable ProcessAllSections, we will have to deal with allocating
memory for more sections than we need (see above) and later to filter
them out.

The alternative is to mark all sections that we actually plan to use as
"required for execution" (using RuntimeDyld terminology). For
.debug_line section on ELF it means adding SHF_ALLOC flag. On MachO,
RuntimeDyld currently treats all sections as required.

(cherry picked from FBD28729398)
2021-05-26 16:23:34 -07:00
Vladislav Khmelevsky 5a6c379f5b [PR] Instrumentation: Emit paddings to preserve data alignment
Summary:
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

facebookincubator/BOLT#156

(cherry picked from FBD28521843)
2021-05-14 14:09:05 +03:00
Vladislav Khmelevsky 79807d99fe [PR] Introduce loop inversion pass
Summary:
This patch introduces LoopInversionPass. Its main purpose is to ensure
that the loop layout is optimal depending on the profile information. So
if profile information shows that the loop is used, the unconditional
jump instruction must be executed only once and vice-versa. Please take
a look to the pass header file and test for more details.

Also change link_fdata script a bit, to be able to change FDATA prefix,
like FileCheck does.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

PR facebookincubator/BOLT#153

(cherry picked from FBD28391811)
2021-05-11 20:59:13 +03:00
Amir Ayupov 12e9fec697 Rebase: [BOLT] DebugFission Support
Summary:
Implemented support for Debug Fission.
For the most part it doesn't impact Monolithic execution path.
One area that was changed is the DW_AT_low_pc/DW_AT_high_pc conversion. Before it was to DW_AT_ranges/DW_AT_low_pc, now DW_AT_low_pc is kept in same place.
Another more visible impact is in Skeleton CU the DW_AT_low_pc is replaced with DW_AT_ranges_base if it's not originally present and bolt converted ranges conversion inside the dwo units.

Output of this are multiple .dwo files with updated debug information.

(cherry picked from FBD29569788)
2021-04-01 11:43:00 -07:00