Commit Graph

1129 Commits

Author SHA1 Message Date
Alexey Moksyakov 85ffa8e4ba [PR][BOLT][Instrumentation] Optimize eflags load/store
Summary:
This commit uses reviews.llvm.org/D6629 as a reference to optimize
X86::EFLAGS load/store in the instrumentation snippet by using lahf/sahf
instructions instead of pushf/popf.

(cherry picked from FBD31662303)
2021-10-11 16:10:06 +00:00
Rafael Auler 443f1b4ff4 Rebase: [BOLT] AsmDump: dump function assembly and profile info
Summary:
Added new functionality of dumping simple functions into assembly.
This includes:
- function control flow (basic blocks, instructions),
- profile information as `FDATA` directives, to be consumed by link_fdata,
- data labels,
- CFI directives,
- symbols for callee functions,
- jump table symbols.

Envisioned usage:
1. Find a function that triggers BOLT crash (e.g. with `bughunter.sh`).
2. Generate reproducer asm source for that function (using `-funcs`).
3. Attach it to an issue.
4. Reduce and include as a test case.

Current limitations:
1. Emitted assembly won't match input file relocations.
2. No DWARF support.
3. Data is not emitted.

(cherry picked from FBD32746857)
2021-09-27 10:51:25 -07:00
Maksim Panchenko 60b0999723 [BOLT][NFC] Do not pass BinaryContext alongside BinaryFunction
Summary:
BinaryContext is available via BinaryFunction::getBinaryContext(),
hence there's no reason to pass both as arguments to a function.

In a similar fashion, BinaryBasicBlock has an access to BinaryFunction
via getFunction(). Eliminate unneeded arguments.

(cherry picked from FBD31921680)
2021-10-26 00:06:34 -07:00
Rafael Auler 0559dab546 [BOLT] Improve cmake configs for opensource
Summary:
Change cmake config in BOLT to only support Linux. On other
platforms, we print a warning that we won't build BOLT. Change
configs to determine whether we will build BOLT runtime libs. This
only happens on x86 hosts. If true, we will build the runtime and
enable bolt-runtime tests. New tests that depend on the bolt_rt lib
need to be marked REQUIRES:bolt-runtime. I updated the relevant
tests. Fix cmake so it does not crash when building llvm with a target
that BOLT does not support.

(cherry picked from FBD31935760)
2021-10-26 12:26:23 -07:00
Rafael Auler a34c753fe7 Rebase: [NFC] Refactor sources to be buildable in shared mode
Summary:
Move source files into separate components and make the component
dependencies on each other explicit, so the LLVM build system knows how
to build BOLT with BUILD_SHARED_LIBS=ON.

Please use the -c merge.renamelimit=230 git option when rebasing your
work on top of this change.

To achieve this, we create a new library to hold core IR files (most
classes beginning with Binary in their names), a new library to hold
Utils, some command line options shared across both RewriteInstance
and core IR files, a new library called Rewrite to hold most classes
concerned with running top-level functions coordinating the binary
rewriting process, and a new library called Profile to hold classes
dealing with profile reading and writing.

To remove the dependency from BinaryContext into X86-specific classes,
we do some refactoring on the BinaryContext constructor to receive a
reference to the specific backend directly from RewriteInstance. Then,
the dependency on X86 or AArch64-specific classes is transferred to the
Rewrite library. We can't have the Core library depend on targets
because targets depend on Core (which would create a cycle).

Files implementing the entry point of a tool are transferred to the
tools/ folder. All header files are transferred to the include/
folder. The src/ folder was renamed to lib/.

(cherry picked from FBD32746834)
2021-10-08 11:47:10 -07:00
Marius Wachtler 46bc197d72 [PR] bolt_rt: getBinaryPath() increase max file path
Summary:
Increase the hard limit from 256 to 4096.
This fixes the 'Assertion failed: failed to open binary path' error I'm seeing.

(cherry picked from FBD31911946)
2021-10-25 20:42:29 +02:00
Maksim Panchenko 1ccc3d500e [BOLT] Add Dockerfile
Summary:
Dockerfile based on Ubuntu:20.04.

Fixes facebookincubator/BOLT#214.

(cherry picked from FBD31883210)
2021-10-23 15:44:08 -07:00
Vladislav Khmelevsky 95ee12977b [PR] Introduce remove-symtab option
Summary:
This patch introduces the remove-symtab option, which skips emitting
the symtab section in the final binary.
This patch also adds ".zdebug_*" (compressed debug sections) to the list
of debug section names.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31738239)
2021-10-16 17:02:45 +03:00
Vladislav Khmelevsky 10088a1e15 [PR] Fix warning
Summary:
Fix control reaches end of non-void function warning

(cherry picked from FBD31738391)
2021-10-17 14:25:57 +03:00
Maksim Panchenko 32782574d2 [BOLT][DWARF] Keep original line info for unmodified units
Summary:
Some compilation units will contain only code that is left unmodified by
BOLT, e.g. there is no profile data available for any function from such
units as they are rarely or never executed.

To save processing time and memory, we disable building line info tables
for such units and write unmodified tables to the output file.

(cherry picked from FBD31599759)
2021-10-11 12:05:34 -07:00
Vladislav Khmelevsky cb8d701b7b [PR] Disable instrumentation and hugify build for aarch64
Summary:
This patch temporarily disables the instrumentation and hugify builds on
non-x86 platforms so that the llvm-bolt tool can be built on aarch64.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31738306)
2021-10-16 17:35:29 +03:00
Vladislav Khmelevsky dc4b32e1b1 [PR] Skip NONE static relocations
Summary:
To suppress the warning about unsupported relocations

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31738420)
2021-10-17 16:36:24 +03:00
Vladislav Khmelevsky dcdd37fdc2 [PR] Instrumentation: Sync file on dump
Summary:
Sync the file with storage device on data dump to stabilize
instrumentation testing

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31738021)
2021-10-15 20:46:09 +03:00
Vladislav Khmelevsky 2d431eefbf [PR] Fix constant islands handling
Summary:
After the "Allocate memory for constant islands on-demand" patch, a
few problems were found in constant island handling:
1. When creating a constant island dependency we need to check that we
have already allocated IslandInfo for the BF.
2. In ADRRelaxationPass we need to put the constant island check under the new
hasIslandsInfo condition.
3. In the binary emitter we need to replace the hasConstantIsland check with a
hasIslandsInfo check, since the BF might not have a constant island of its
own but might still access another BF's constant island.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31737935)
2021-10-16 14:44:29 +03:00
Alexander Yermolovich fdd9184db5 [BOLT][DWARF] Refactor of Loc and LocLists writers
Summary:
Refactored the Loc and LocList writers to write out entries during the finalization phase,
and hid some of the details in a class.
This simplifies the implementation and will also be needed for
DWARF5, where we need to know how many locLists entries there are.

(cherry picked from FBD31563795)
2021-10-11 17:51:05 -07:00
Elvina Yakubova 53ec21e3a1 [PR][BOLT][TEST] Fix tests
Summary:
Add lit.local.cfg to X86 and AArch64 folders.
Fix host_arch in lit config for AArch64.
Fix AArch64 and X86 tests.

Elvina Yakubova,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31702068)
2021-10-11 11:15:08 +03:00
Vladislav Khmelevsky a2214e8f0d [PR] Fix LongJmp pass
Summary:
This patch handles 2 problems with the LongJmp pass:
1. The pass should be executed before FinalizeFunctions, since the pass
may add new entry points for the function, and
BinaryFunction::addEntryPoint has an assert "CurrentState == State::CFG".
2. Replaced the shortJmp implementation with position-independent code.
Currently we can handle PIC binaries with offsets of at most +-4GB; the
longJmp uses absolute addresses and can be used only in non-PIE
binaries.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31416925)
2021-10-04 19:17:01 +03:00
Maksim Panchenko 96bb090653 [BOLT][DWARF] Use MCAsmLayout to update stmt_list values
Summary:
Use MCAsmLayout to update stmt_list in updateLineTableOffsets() instead
of manually calculating the layout.

(cherry picked from FBD31623071)
2021-10-13 13:19:06 -07:00
Maksim Panchenko 93444ce8e8 [BOLT] Fix build after auto rebase
(cherry picked from FBD31550675)
2021-10-11 12:46:22 -07:00
Maksim Panchenko 9bb3908b61 [BOLT] Allocate memory for constant islands on-demand
Summary:
Allocate memory for storing constant island info only when needed.

(cherry picked from FBD31510149)
2021-10-08 11:31:45 -07:00
Amir Ayupov 01a81dca41 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- shared_object.test
- shrinkwrapping.test
- static_exe.test
- tailcall.test
- vararg.test

(cherry picked from FBD31523478)
2021-10-08 18:23:32 -07:00
Amir Ayupov 44e08ead30 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- sctc_bug{,2,3,4}.test

(cherry picked from FBD31517120)
2021-10-08 14:49:23 -07:00
Amir Ayupov f44e1df9d0 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- re-optimize.test
- relaxed_tailcall.test
- remove_unused.test
- retpoline_synthetic.test

(cherry picked from FBD31516680)
2021-10-08 14:33:33 -07:00
Amir Ayupov 872013e077 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- cfi_instrs_reordered.s
- no_entry_reordering.test
- no_relocs.test
- pie.test

(cherry picked from FBD31514823)
2021-10-08 13:39:24 -07:00
Amir Ayupov d41b4e6e2d [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- keep_aranges.test
- layout_heuristic.test
- line_number.test
- block_reordering.test
- branch_data.test
- reader.test

(cherry picked from FBD31486371)
2021-10-07 13:38:58 -07:00
Amir Ayupov c74e5bfee3 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- jmp_optimization.test
- jmpjmp.test
- jump_table_footprint_reduction.test
- jump_table_reference.test

(cherry picked from FBD31483122)
2021-10-06 16:20:00 -07:00
Amir Ayupov 92e306de0c [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- indirect_goto.test
- indirect_goto_pie.test
- inlined_function_mixed.test

(cherry picked from FBD31446571)
2021-10-06 12:23:05 -07:00
Vladislav Khmelevsky 5f953277a9 [PR] Handle relocations in constant islands
Summary:
In non-PIC binaries the compiler may store absolute addresses in a constant
island, which we should handle properly. This patch adds relocation
handling in constant islands.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31416848)
2021-10-04 19:05:18 +03:00
Amir Ayupov 8ab49cb4aa [BOLT] link_fdata: accept symbols with slash in the name
Summary:
Change sed separator to allow replacing symbols with slash in the name.
This is required for symbol names produced by BOLT which include
"/1" suffix.

(cherry picked from FBD31324540)
2021-09-30 16:11:09 -07:00
Amir Ayupov b86c91eae0 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- invalid_profile.test
- internal_call.test
- internal_call_instrument.test

(cherry picked from FBD31452386)
2021-10-06 14:25:29 -07:00
Vladislav Khmelevsky e424d16f0e [PR] AArch64: Add TSTBR14 and CONDB19 relocations support
Summary:
This patch adds support for the R_AARCH64_TSTBR14 and R_AARCH64_CONDBR19
relocations in order to handle conditional branches, cbz/cbnz, and tbz/tbnz
instructions correctly

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31416734)
2021-10-03 13:41:41 +03:00
Vladislav Khmelevsky 848f07792c [PR] Update skipRelocationProcess
Summary:
The ELF::R_AARCH64_TLSDESC_LD64_LO12 and
ELF::R_AARCH64_TLSDESC_ADR_PAGE21 relocations might also be relaxed to
mov instructions. Handle these cases.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31353063)
2021-10-01 22:06:15 +03:00
Amir Ayupov c637fcf24e [BOLT][NFC] Use const pointers in PrintProgramStats
Summary: Small refactoring to use const BinaryFunction pointers in PrintProgramStats.

(cherry picked from FBD31387253)
2021-10-04 22:43:01 -07:00
Rafael Auler a8cbc8093f [BOLT] Do not process DWARF relocs
Summary:
Use the new API introduced in https://reviews.llvm.org/D106624
to request that LLVM not process relocations for debug sections, since
BOLT processes final binaries that are already relocated.

(cherry picked from FBD31449206)
2021-10-06 13:03:56 -07:00
Maksim Panchenko 8ef3b27834 [BOLT][DWARF] Properly emit of end-of-sequence entries for line tables
Summary:
When the compiler emits a line table program, it emits EOS using the label
at the end of the containing code section. Since each compilation unit
has its own set of code sections it works as expected (* see the excerpt
from the standard below). However, in BOLT the code from many CUs is
combined into a common section, such as hot text or cold text.
As a result, the symbol at the end of the section may point way past the
code sequence for a given unit.

Since we can emit functions in any order, we conservatively emit
end-of-sequence at the end of every emitted function.

Fixes a problem while intermixing source code with disassembly in
binutils' objdump.

(*) DWARF v4 6.2.5.3:
"Every line number program sequence must end with a DW_LNE_end_sequence
instruction which creates a row whose address is that of the byte after
the last target machine instruction of the sequence."

(cherry picked from FBD31347870)
2021-09-30 17:47:50 -07:00
Maksim Panchenko 98bc9876fb [BOLT][DWARF] Change line info emission for unmodified functions
Summary:
Generate line tables for original/unmodified functions directly from
input line tables, bypassing conversion into intermediate structures,
such as BinaryLineDivisions.

Emit end-of-sequence markers only when necessary, i.e. when the line
sequence is not adjacent to the next one, or at the end of the line
sequence for the compilation unit.

If the sequence starts with ambiguous line info (multiple lines per
address), make sure we emit all such lines.

Reduce memory consumption when updating debug info by eliminating
intermediate data structures allocation.

(cherry picked from FBD30829448)
2021-09-08 10:22:19 -07:00
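
The rule for end-of-sequence markers in the commit above can be sketched in a few lines of Python. This is an illustration only, not BOLT's implementation: the function name and the (start, end) tuple representation of a line sequence are assumptions made for this example.

```python
def end_sequence_addresses(sequences):
    """Return addresses where an end-of-sequence marker is needed.

    `sequences` is a list of (start, end) half-open address ranges for one
    compilation unit (a hypothetical representation). A marker is emitted
    only when a sequence is not adjacent to the next one, or when it is the
    last sequence of the unit.
    """
    ordered = sorted(sequences)
    markers = []
    for i, (_, end) in enumerate(ordered):
        is_last = i == len(ordered) - 1
        if is_last or ordered[i + 1][0] != end:
            markers.append(end)
    return markers
```

With two adjacent sequences followed by a gap, only the end of the second sequence and the end of the final sequence get a marker.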
Amir Ayupov e903671bbf [BOLT][TEST] Imported small tests, removed duplicate input
Summary:
Imported small internal tests.
- call_zero.s
- cfi_expr_rewrite.s
- cfi_insts_count.s
- exceptions_pic.test
- exceptions_run.test

Removed duplicate input file (switch_statement.cpp)

(cherry picked from FBD31355466)
2021-10-01 15:35:43 -07:00
Maksim Panchenko 7b61cb7812 [BOLT][DWARF] Deprecate usage of DWARFAbbreviationDeclaration::findAttribute()
Summary: Deprecate the usage of extension to LLVM API.

(cherry picked from FBD31360154)
2021-10-01 21:01:05 -07:00
Maksim Panchenko d4a0e8526a [BOLT][DWARF] Move line info emission into BOLT
Summary:
BOLT needs to generate line info tables using absolute addresses as well
as using the standard MC way of labels attached to instructions. Move
line table generation code under BOLT.

Ideally, we should be able to extend existing interfaces in LLVM, but
without other users of the interface it will be hard to justify the
change.

(cherry picked from FBD30723466)
2021-09-01 21:40:54 -07:00
Maksim Panchenko ba1f503f1b [BOLT][NFC] Remove redundant code
Summary:
For historical reasons, we are populating FailedAddresses twice in
RewriteInstance. Remove the second (happening later) call to avoid the
confusion.

(cherry picked from FBD31278956)
2021-09-29 11:40:16 -07:00
Maksim Panchenko e3b901aaee [BOLT][DWARF] Fix abbrev offsets for type units
Summary:
When rewriting .debug_abbrev section, update abbrev offsets for type
units in addition to compile units.

Reuse abbreviation entries if they were shared by multiple compile/type
units.

(cherry picked from FBD31262326)
2021-09-28 23:30:06 -07:00
Amir Ayupov 47455e98b3 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- R_X86_64_64.pic.lld.cpp
- avx512_trap.test
- bad_exe.test
- bolt_info.test

(cherry picked from FBD31251439)
2021-09-28 15:47:51 -07:00
Rafael Auler 62550dd22c Rebase: [PR] Fix build instructions
Summary:
As titled.

(cherry picked from FBD32740596)
2021-09-25 21:20:47 +03:00
Amir Ayupov 4157682fd9 [BOLT][TEST] Import internal_call_instrument.s
Summary: Imported standalone assembly test

(cherry picked from FBD31161181)
2021-09-23 14:28:13 -07:00
Amir Ayupov 6b4eb0b94a [BOLT][TEST] Split runtime tests into test/runtime folder
Summary:
Create bolt/test/runtime folder and move tests that execute the binary.
Move lit.local.cfg with host_arch check to the corresponding folder.
Addresses issue facebookincubator/BOLT#132.

AArch64/tls.c shows a different behavior with clang hence marked as XFAIL

TODO: add a check for non-exec tests for a corresponding LLVM_TARGETS_TO_BUILD.

(cherry picked from FBD31132234)
2021-09-22 17:58:33 -07:00
Maksim Panchenko 122254bc35 [BOLT][DWARF][NFC] Get rid of updateRangeBase() helper function
Summary:
Move attribute patching code out of updateRangesBase into
convertToRanges() functions.

(cherry picked from FBD31154742)
2021-09-23 14:08:15 -07:00
Maksim Panchenko 64db3e7b7c [BOLT][DWARF][NFC] Use only skeleton/main CUs to update .debug_aranges
Summary:
Previously, we were registering all CUs with aranges writer. Since DWO
CUs have offsets set to 0, and we were registering them after the
skeleton unit at offset 0 was already registered, it was mostly
harmless as DWO CUs were effectively ignored.

(cherry picked from FBD31162621)
2021-09-23 19:08:54 -07:00
Maksim Panchenko 4d5cd1bf82 [BOLT][DWARF] Write new .debug_abbrev sections
Summary:
Instead of patching the original .debug_abbrev section contents,
generate new section data based on parsed compilation unit
abbreviations.

This eliminates the dependency on the LLVM extension that records
abbreviation attribute offsets while parsing .debug_abbrev contents.

The output with this patch should stay the same (NFC).

(cherry picked from FBD31133611)
2021-09-17 14:48:14 -07:00
Vladislav Khmelevsky e1da1539e3 [PR] Add AARCH64_MOVW_UABS_G* relocations support
Summary:
This patch fixes issue facebookincubator/BOLT#177

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31130162)
2021-09-23 00:52:36 +03:00
Amir Ayupov d4fdc98140 [BOLT][TEST] Remove dependence on host_cc and host_cxx
Summary: Add dependency on clang and clangxx instead.

(cherry picked from FBD31128140)
2021-09-22 15:53:38 -07:00
Maksim Panchenko 43fffff671 [BOLT][DWARF][NFC] Refactor code
Summary: Minor refactoring to improve code readability.

(cherry picked from FBD31122375)
2021-09-22 13:10:19 -07:00
Vladislav Khmelevsky 00c0659b13 [PR] AArch64: Skip some of the relocations processing
Summary:
There are some cases in which relocations must not be processed by BOLT.
This patch handles three such cases:
* The linker might eliminate the instruction and replace it with a NOP
* The linker might perform TLS relocation relaxations, replacing the
GOT access with a direct TP + offset access.
* Due to erratum 843419 the linker might create a veneer, replacing the
load/store instruction with a branch.

In all of these cases the linker leaves old relocations that no longer match
the instruction emitted to the binary, so we must avoid processing these
relocations.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31002384)
2021-09-08 13:37:19 +03:00
Vladislav Khmelevsky 542c03c3a3 [PR] Fix aarch64 TLS relocations handling
Summary:
A few problems were found when dealing with TLS relocations for
aarch64.

* RewriteInstance.cpp
** While analyzing a TLS relocation we don't have to modify
SymbolAddress (which is the offset from the TLS section), so we
just skip verification
** The non-GOT-related TLS relocations on aarch64 might be skipped too
** Forced relocation must be applied for GOT relocations on
AArch64. The symbol address for a GOT relocation might not point
to the GOT section (for example ADRP GOT may point to the wrong section,
since the GOT table is not page-aligned), so we won't try to get the section by
the symbol address.

* Relocation.cpp - Remove R_AARCH64_TLSLE_ADD_TPREL_HI12 and
R_AARCH64_TLSLE_ADD_TPREL_LO12_NC from the isGOT check, since they are not
GOT-related relocations

* BinaryFunction.h
** Stop adding R_AARCH64_TLSLE_ADD_TPREL_HI12 and
R_AARCH64_TLSLE_ADD_TPREL_LO12_NC to the relocation list. They are
offsets in the TLS section that BOLT does not change, so nothing needs
to be done with these relocations; the values won't change
in the new binary files
** Refactor the code, separating aarch64 and x86 relocations

* AArch64MCPlusBuilder.cpp
** Add forgotten LO12 relocations to switch case to getTargetExprFor

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31003349)
2021-09-02 21:04:33 +03:00
Maksim Panchenko 48fbeb1a46 [BOLT] Fix warnings from LLVM DWARF reading library
Summary:
LLVM started printing warnings when DWARFDebugInfoEntry::extractFast()
is invoked trying to read a DIE past the current unit limits. This
results in verbose warnings from BOLT which are harmless but confusing
to the user. Check the boundaries before calling the API above.

(cherry picked from FBD31097271)
2021-09-21 15:39:35 -07:00
Rafael Auler 1ca3a8b824 [NFC] Fix warnings when building with clang
Summary:
Fix switch-cases that don't handle all MCCFIInstruction
enumeration types. Fix range-loop iterator forced copy.

(cherry picked from FBD31068505)
2021-09-20 15:16:01 -07:00
Rafael Auler 47ce9b39e4 [BOLT] [NFC] Cleanup old code in mapCodeSections
Summary:
In "Add initial function injection support", Laith added this
code because injected functions would use the original text section as
the section to emit their code to. Now, what happens is that functions
are mapped to either their own section in non-reloc mode, or mapped to
a particular section in the pass reassign sections. So this section does
not need to have an output address anymore and this code is obsolete.

(cherry picked from FBD30980450)
2021-09-15 18:03:50 -07:00
Rafael Auler 7b779f819f [BOLT] Fix binary corruption in non-reloc mode
Summary:
We have a problem where we will emit sections that we are not supposed
to emit (with no output offset assigned). This will make us write at
file offset 0 and corrupt the first sections in the binary (usually
.interp section will be corrupted and bash will refuse to run the
binary).

This only happens in non-reloc mode when using JTS_BASIC and when we
do not emit a function that has a jump table (if it gets too large).

Using -update-debug-sections will trigger the pass
check-large-functions, which will mark large funcs as non-simple
and will hide this bug.

(cherry picked from FBD30882012)
2021-09-10 16:19:50 -07:00
Vasily Leonenko 9aa134dc2d [PR] Instrumentation: use TryLock for SimpleHashTable getter
Summary:
This commit introduces TryLock usage for the SimpleHashTable getter to
avoid deadlock and to reduce syscall usage, which causes significant
runtime overhead.
The old behavior is kept under the -conservative-instrumentation option passed
to the instrumentation library.
This commit also includes a corresponding test case: instrumentation of
an executable which performs indirect calls from common code and a signal
handler.

Note: if TryLock fails to acquire the lock, the indirect
call will not be accounted for in the resulting profile.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30821949)
2021-08-08 04:50:06 +08:00
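
The try-lock behavior described in the commit above can be illustrated with a small Python sketch. It is not the bolt_rt code: the class name, the dictionary-based table, and the `dropped` counter are hypothetical and exist only to show the pattern of skipping an update when the lock is already held.

```python
import threading


class IndirectCallCounters:
    """Toy counter table guarded by a lock (hypothetical, not bolt_rt code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = {}
        self.dropped = 0

    def record(self, target):
        # Try to take the lock without blocking. If it is already held
        # (for example by code interrupted by a signal handler), skip the
        # update instead of waiting, which avoids a deadlock.
        if not self._lock.acquire(blocking=False):
            self.dropped += 1
            return False
        try:
            self.counts[target] = self.counts.get(target, 0) + 1
            return True
        finally:
            self._lock.release()
```

The trade-off matches the note in the commit message: a call that arrives while the lock is held is simply missing from the profile.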
Vasily Leonenko e2480fcc98 [PR] LIT: add checking if maxIndividualTestTime is availabe on the platform
Summary:
This commit adds a check for whether maxIndividualTestTime is available on
the platform. If it is available, it sets the per-test timeout to 60 sec and
declares the lit-max-individual-test-time feature for further checking
by particular test cases.
Based on https://reviews.llvm.org/D64251 implementation.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30821986)
2021-08-27 21:56:24 +03:00
Vladislav Khmelevsky 856299594c [PR] ReorderAlgorithm.cpp: Fix iterator types
Summary:
Clang 12 fails to build this code because the types of the iterator
element and the std::vector are unrelated.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30821177)
2021-09-06 20:30:22 +03:00
Alexander Yermolovich 23fc454f68 [BOLT] Refactor to use new APIs for getting offset of attribute
Summary: Changed to use the new APIs for getting the offset of an attribute from .debug_info. They were split into multiple ones so that the Offset can be obtained separately.

(cherry picked from FBD30616705)
2021-08-27 13:48:31 -07:00
Joey Thaman 3e8af67a95 [BOLT] Optimize the three way branch
Summary:
Three way branches commonly appear
in HHVM. They have one test and then two jumps. The
jumps' destinations are not currently optimized.
This pass attempts to optimize which branch comes first.

(cherry picked from FBD30460441)
2021-08-17 10:15:21 -07:00
Vladislav Khmelevsky c040431fe6 [PR] AArch64: Fix ADR instruction handling
Summary:
There are 2 problems found when handling the ADR instruction:
1. When extracting the value from the ADR instruction we need to do
it differently than we do for the ADRP instruction.
2. When creating the target expression the VariantKind should be different for
the ADR instruction.

We also introduce support for the R_AARCH64_ADR_PREL_LO21,
R_AARCH64_TLSDESC_ADR_PREL21 and R_AARCH64_ADR_PREL_PG_HI21_NC
relocations.

This patch also introduces AdrPass, which replaces ADR instructions that
point to non-local targets with an ADRP + ADD instruction sequence. The ADR
instruction has a small offset range, so after BOLT rearranges the code
there is no guarantee that an ADR instruction will still be within
+-1MB of its target. The instruction replacement needs
relocations to be available, so we no longer remove "IsFromCode"
relocations from the BF after disassembly. We also need the original
offset of the ADR instruction to be available, so we add an offset annotation
for these instructions.

The last thing this patch adds is an ARM testing directory, which will be
used only on ARM testing servers. The common tests (non-assembler tests
which are platform-independent) might be moved from the X86 directory to
the parent one in the future, so such tests could be tested on both X86
and ARM machines.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30497379)
2021-08-20 03:07:01 +03:00
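
The +-1MB limit mentioned in the commit above follows from ADR's signed 21-bit byte offset. The Python sketch below shows the range check that motivates the replacement; the function names and the returned strings are assumptions for illustration and do not reflect how AdrPass is written.

```python
ADR_IMM_BITS = 21  # ADR encodes a signed 21-bit byte offset


def fits_signed(value, bits):
    """True if `value` is representable as a signed `bits`-wide integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def choose_address_sequence(pc, target):
    """Pick 'adr' when the target is within +-1MB of pc, else 'adrp+add'."""
    if fits_signed(target - pc, ADR_IMM_BITS):
        return "adr"
    return "adrp+add"
```

A target exactly 1MB ahead no longer fits, because the largest positive signed 21-bit value is 2^20 - 1.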
Vladislav Khmelevsky a1036e42da [PR] Print relocations warning if failed to process
Summary:
Currently most of the warnings are printed only in debug mode. Since
relocations are very important for the binary to work correctly, I suggest
printing the number of relocations that failed to process, to draw extra
attention when problems with them are encountered

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30500629)
2021-08-22 02:44:30 +03:00
Joey Thaman ef6186c822 [BOLT] Added Constant and Copy Propagation to tail duplicated blocks
Summary:
Added a function in TailDuplication
that will do Constant and Copy Propagation for blocks that
we duplicated as a part of tail duplication.  Added supporting
functions to MCPlusBuilder to find src registers and replace
registers

(cherry picked from FBD30231907)
2021-08-10 10:02:32 -07:00
Vladislav Khmelevsky 2a5790b670 [PR] Fdata: Escape whitespaces in symbol names
Summary:
This patch is part of the preparation for golang support. Golang symbols
might have spaces in the name (for example "type..eq.[10]interface {}").
Since fdata uses spaces as a field separator, such names break the fdata
format, so we need to escape whitespaces and backslashes in symbol names
using the backslash character.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD29999491)
2021-06-29 19:54:08 +03:00
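
The escaping scheme described in the commit above can be sketched as follows. This is a minimal Python illustration, not the BOLT implementation: both function names are hypothetical, and the three-field line used in the usage note is a made-up layout rather than the real fdata record format.

```python
def escape_symbol_name(name):
    """Prefix whitespace characters and backslashes with a backslash."""
    out = []
    for ch in name:
        if ch.isspace() or ch == "\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def split_fields(line):
    """Split a line on unescaped spaces and undo the escaping."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character as-is
            i += 2
        elif ch == " ":
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields
```

Escaping the Go symbol from the commit message and placing it between two other fields lets `split_fields` recover the original name as a single field.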
Amir Ayupov b64de07569 [BOLT][NFC][PR] Removed unused singletonSet
Summary:
Remove unused code introduced a while ago (2016), with its use removed
since then.

PR facebookincubator/BOLT#198

Author: Amir Aupov <aaupov@fb.com>

(cherry picked from FBD30376537)
2021-08-12 14:46:50 -07:00
Vladislav Khmelevsky 8459c14c68 [PR] Fix AARCH64 ADR* relocations
Summary:
The ADRP instruction has 21 bits to store the page offset, and the 12 lowest
bits are zero, which gives us a total of 33 bits (32 bits for the address + 1
sign bit, to address +-4GB).

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30283044)
2021-08-11 22:21:37 +03:00
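
The bit arithmetic in the commit above can be checked with a short Python sketch. The constant and function names are assumptions for this example; only the 21-bit immediate and the 12 zero low bits come from the commit message.

```python
PAGE_SHIFT = 12      # ADRP works on 4KB pages, so the low 12 bits are zero
ADRP_IMM_BITS = 21   # signed page-offset immediate


def adrp_reach_bits():
    """Total signed bit width of the byte offset an ADRP can express."""
    return ADRP_IMM_BITS + PAGE_SHIFT


def adrp_in_range(pc, target):
    """True if target's page is reachable from pc's page with one ADRP."""
    page_delta = (target >> PAGE_SHIFT) - (pc >> PAGE_SHIFT)
    limit = 1 << (ADRP_IMM_BITS - 1)
    return -limit <= page_delta < limit
```

A signed 33-bit byte offset covers 4GB in each direction, which is the +-4GB figure quoted in the summary.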
Rafael Auler faee814fb9 Fix NFC tests
Summary:
Our NFC tests are failing on debug-fission-single.s. Fix the test
to be compliant with our checking script.

(cherry picked from FBD30352415)
2021-08-16 11:33:20 -07:00
Rafael Auler d217e2f338 Rebase: [BOLT] DWP output support
Summary:
Added support for writing out DWP file. Works with regular dwo as input or DWP as input.

(cherry picked from FBD31361619)
2021-06-29 15:28:52 -07:00
Vasily Leonenko 900914d3c6 [PR] Tests: add instrumentation tests for PIE exec & shared libs
Summary:
This commit adds dummy tests for checking instrumentation
support for PIE executables and shared libraries.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092729)
2021-06-19 23:01:28 +08:00
Vladislav Khmelevsky af58da4ef3 [PR] Instrumentation: Avoid generating GOT table in instrumentation library
Summary:
To avoid RELATIVE relocations, avoid using the GOT table
by using hidden visibility for all symbols in the library.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092712)
2021-07-22 00:04:28 +03:00
Vladislav Khmelevsky 553f28e921 [PR] Instrumentation: Fix start and fini trampoline pointers
Summary:
The trampolines are no longer pointers to the functions. For
proper name resolution by BOLT, use extern "C" for all external symbols
in instr.cpp

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092698)
2021-07-31 00:29:23 +03:00
Vasily Leonenko 519cbbaa9a [PR] Instrumentation: Introduce instrumentation-binpath argument
Summary:
This commit introduces the -instrumentation-binpath argument, used
to point to the instrumented binary at runtime in case the /proc/self/map_files
path is not accessible due to access restrictions.

Vasily Leonenko
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092681)
2021-07-30 18:07:53 +03:00
Vasily Leonenko 285ac26d16 [PR] README: remove note about experimental status of instrumentation
Summary:
Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092666)
2021-06-25 16:27:47 +08:00
Vladislav Khmelevsky 361f3b5576 [PR] Instrumentation: Fix runtime handlers for PIE files
Summary:
This commit fixes runtime instrumentation handlers for PIE
binaries case.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092522)
2021-06-23 18:24:09 +00:00
Vasily Leonenko 9b39a823ea [PR] Instrumentation: Initial support for static executables
Summary:
This commit introduces instrumentation support for static
binaries. Note that the current implementation does not support profile
output on finalization of the instrumented binary, so it requires using the
-instrumentation-sleep-time=N (N>0) option. Note: There is an
unhandled case with static PIE executables, which might have a dynamic
header.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092471)
2021-06-21 01:59:38 +08:00
Elvina Yakubova 2ffd6e2b43 [PR] Instrumentation: Add support for opening libs based on links /proc/self/map_files
Summary:
This commit adds support for opening libs based on the links in
/proc/self/map_files.  To do this, we take the current virtual address
and search the directory for the lib whose address range contains it.
After that, we get the full path to the binary using the readlink
function. Reading directly through the links in /proc/self/map_files is
not possible because of lack of permissions.
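
The range lookup described above can be sketched as follows. This is a
minimal illustration, assuming the usual /proc/self/map_files naming
where each entry is called `<start>-<end>` in hexadecimal. The helper
names `parseMapFilesEntry` and `entryContains` are hypothetical and are
not taken from the runtime library, which would also need getdents and
readlink around this step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: parse an entry name such as "400000-401000" into a
// half-open [start, end) address range. Returns nullopt on malformed input.
std::optional<std::pair<uint64_t, uint64_t>>
parseMapFilesEntry(const std::string &Name) {
  const std::size_t Dash = Name.find('-');
  if (Dash == std::string::npos || Dash == 0 || Dash + 1 == Name.size())
    return std::nullopt;
  const std::string StartStr = Name.substr(0, Dash);
  const std::string EndStr = Name.substr(Dash + 1);
  char *Tail = nullptr;
  const uint64_t Start = std::strtoull(StartStr.c_str(), &Tail, 16);
  if (*Tail != '\0')
    return std::nullopt;
  const uint64_t End = std::strtoull(EndStr.c_str(), &Tail, 16);
  if (*Tail != '\0' || End <= Start)
    return std::nullopt;
  return std::make_pair(Start, End);
}

// Hypothetical helper: true if Addr lies inside the range named by the entry.
bool entryContains(const std::string &Name, uint64_t Addr) {
  const auto Range = parseMapFilesEntry(Name);
  if (!Range)
    return false;
  assert(Range->first < Range->second && "parser rejects empty ranges");
  return Addr >= Range->first && Addr < Range->second;
}
```

Once a matching entry name is found, its link target would be resolved
with readlink to obtain the full path of the binary.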

Elvina Yakubova,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092422)
2021-01-19 02:08:55 +08:00
Elvina Yakubova 6665c628ea [PR] Instrumentation: Add readlink and getdents support
Summary:
This commit adds support for getting directory entries and
reading the value of a symbolic link in the instrumentation runtime library.

Elvina Yakubova,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092362)
2021-01-18 22:08:10 +08:00
Vasily Leonenko ad79d51778 [PR] Instrumentation: Generate and use _start and _fini trampolines
Summary:
This commit implements a new method for hooking the _start & _fini
functions, which allows the use of relative jumps for future PIE & .so
library support. Instead of using the absolute addresses of the _start &
_fini functions known at the linking stage, we use dynamically created
trampoline functions and the corresponding symbols in the
instrumentation runtime library.

As we would like to use instrumentation for dynamically loaded binaries
(PIE & .so), we need to compile the instrumentation library with the
"-fPIC" flag to support relative address resolution for functions and
data.

For shared libraries we need to handle the instrumentation library
initialization case by using the DT_INIT section entry point.

This commit also adds detection of whether the binary is an executable
or a shared library based on the existence of the PT_INTERP header. In
the case of a shared library, we save the address of the real library
init function for later use when creating the instrumentation library
init trampoline function, and also update DT_INIT to point to the
instrumentation library init function.
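
The PT_INTERP check can be sketched as below. This is only an
illustration over a plain array of program header types; the function
name `hasInterpSegment` is hypothetical, and the real code would read
the types from the ELF program header table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// p_type value of PT_INTERP as defined by the ELF specification.
constexpr uint32_t kPtInterp = 3;

// Hypothetical helper: returns true if any program header is PT_INTERP.
// Following the commit's heuristic, a binary with such a header is treated
// as an executable and one without it as a shared library.
bool hasInterpSegment(const uint32_t *Types, std::size_t Count) {
  assert((Types != nullptr || Count == 0) && "null table must be empty");
  for (std::size_t I = 0; I < Count; ++I)
    if (Types[I] == kPtInterp)
      return true;
  return false;
}
```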

Functions called from init/fini functions should be called with forced
stack alignment to avoid issues with instructions which rely on it,
e.g. optimized string operations.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092316)
2021-06-19 04:08:35 +08:00
Amir Ayupov 60b10a8ead [BOLT][NFC] Unify isTailCall interface across X86 and AArch64
Summary:
Move the common code into MCPlusBuilder.h.
Use group 1 `kTailCall` MCAnnotation instead of dynamically allocated
annotation.
This diff reduces the processing time overhead to 1.5% vs using
TAILJMP opcode.

(cherry picked from FBD30055585)
2021-07-29 17:28:51 -07:00
Maksim Panchenko 89a2e16037 [BOLT] Support PLT sections with variable entry sizes
Summary:
The linker can generate 8- or 16-byte entries in .plt.got and .plt.sec
sections. On X86, the main differentiator is the presence of endbr64
instruction at the beginning of the entry. Detect the instruction and
adjust the size accordingly.
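
A minimal sketch of the detection, assuming that the entry starting
with endbr64 is the 16-byte form and the one without it is the 8-byte
form (the commit does not spell out the mapping). The function names
are hypothetical; the byte sequence F3 0F 1E FA is the encoding of
endbr64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the buffer begins with endbr64 (F3 0F 1E FA).
bool startsWithEndbr64(const uint8_t *Data, std::size_t Size) {
  assert((Data != nullptr || Size == 0) && "null buffer must be empty");
  static const uint8_t Endbr64[] = {0xF3, 0x0F, 0x1E, 0xFA};
  if (Size < sizeof(Endbr64))
    return false;
  for (std::size_t I = 0; I < sizeof(Endbr64); ++I)
    if (Data[I] != Endbr64[I])
      return false;
  return true;
}

// Assumption: entries that start with endbr64 are 16 bytes, others 8 bytes.
std::size_t pltEntrySize(const uint8_t *Data, std::size_t Size) {
  return startsWithEndbr64(Data, Size) ? 16 : 8;
}
```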

(cherry picked from FBD29847639)
2021-07-14 01:35:34 -07:00
Amir Ayupov c33f08e7df [BOLT] Update build instructions in README
Summary: Remove llvm.patch from build instructions.

(cherry picked from FBD29973395)
2021-07-28 14:45:10 -07:00
Joey Thaman a7e2a8f946 [BOLT] Tail Duplication active pass
Summary:
Amended the Tail Duplication
analysis pass to do the tail duplication in question

(cherry picked from FBD29833794)
2021-07-16 11:45:44 -07:00
Vasily Leonenko 68be8caf3f RewriteInstance: account .stab and .stabstr as debug sections
Summary:
.stab and .stabstr are special sections containing debugging
information and strings associated with the debugging information.
This commit adds them to the list of debugging sections, so
these sections can be removed from the output binary.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD29746153)
2021-06-25 15:42:56 +08:00
James Luo 0df7bf7b8b [BOLT][CSSPGO] Handle indirect call promotion in Pseudo Probe Integration
Summary:
Match new direct call generated during ICP to correct pseudo probe

New call is matched to the probes of original call instruction.

(cherry picked from FBD29591662)
2021-07-16 16:05:18 -07:00
James Luo 3e55dea4dd [BOLT][CSSPGO] Encode pseudo probe section to binary
Summary:
Update .pseudo_probe section in input binary

DFS inline tree and emit pseudo probes with updated addresses

(cherry picked from FBD29522142)
2021-07-15 14:58:32 -07:00
Joey Thaman 2f46660559 [BOLT] Tail duplication analysis pass
Summary:
Created a binary pass that records how many
times tail duplication would be used and how many cache
misses it would theoretically stop

(cherry picked from FBD29619858)
2021-07-01 07:11:26 -07:00
Zino Benaissa 60b15062e1 [BOLT] Dump dynamic execution per instruction opcode
Summary:
We extended DynoStats to dump the histogram per instruction opcode. By
default the dump is turned off. Use '-print-dyno-opcode-stats' to enable
the dump.

BOLT also dumps for each instruction opcode the maximum execution count and
corresponding function name and basic block offsets where the instruction
occurs. Below is a sample of the dump:

                   Opcode,    Execution Count,      Max Exec Count, Function Name:Offset
                  SHR8rCL,                232,                 232, _ZNK5folly14AsyncSSLSocket4goodEv:53
                VPADDDYrr,              13956,                 388, chacha20_encrypt_bytes.part.0/3:736
               PMOVSXBWrr,                  4,                   2, ares_expand_name/1:264
                VMOVAPSmr,               1082,                  43, chacha20_encrypt_bytes.part.0/3:2864
                VPSHUFBrr,               9540,                1667, chacha20_encrypt_bytes.part.0/3:4416
            VPUNPCKLDQYrr,               1102,                 188, jsimd_ycc_rgb_convert_avx2/1:125
          VPBROADCASTQYrm,                 39,                  39, chacha20_encrypt_bytes.part.0/3:400
               PMOVSXWDrr,                  8,                   2, ares_expand_name/1:264
                   VPORrr,                817,                 129, jsimd_idct_islow_avx2/1:41
                  PSLLDri,            8690752,               65644, blockmix_salsa8_xor/1:1424

(cherry picked from FBD28859624)
2021-05-24 21:33:43 -07:00
Maksim Panchenko c9f5f47b51 [BOLT] Add support for .plt.sec and refactor PLT-reading code
Summary:
A binary can contain multiple PLT sections with different names and
attributes (such as an entry size). Extend the support to .plt.sec and
refactor the code to make future extensions simpler.

(cherry picked from FBD29502107)
2021-06-30 14:41:41 -07:00
Joey Thaman 4c12afc1f4 [BOLT][NFC] Resolved all clang-12 warnings for bolt
Summary:
clang-12 now compiles bolt without warnings.
Some warnings were fixed where possible, while others were suppressed by
using (void)variable for unused variable warnings or by moving code
inside assert statements or LLVM_DEBUG blocks.

(cherry picked from FBD29469054)
2021-06-29 12:11:56 -07:00
Maksim Panchenko 1de0746790 [BOLT] Read all dynamic relocations and refactor code
Summary:
Add code to read more dynamic relocations (DT_JMPREL) and enforce strict
checks that corresponding sections sizes match .dynamic entry
description.

(cherry picked from FBD29502109)
2021-06-30 14:38:50 -07:00
Alexander Yermolovich f7499c6711 [BOLT][DWARF] Fix writing out dwo with DWP as input
Summary:
The code for writing out dwo files wasn't handling the case where DWP is an input,
in which all the sections are part of the same binary.

A note on the current implementation: .debug-str.dwo will have strings for all the dwo objects.
This is because llvm-dwp de-duplicates strings and combines them into one section. It then rewrites .debug-str-offsets.dwo to point to the new .debug-str.dwo section.

(cherry picked from FBD29244835)
2021-06-18 15:57:34 -07:00
Maksim Panchenko 3e5ce1f282 [BOLT][TESTS] Remove dynamic relocations from YAML tests
Summary:
Our YAML objects contain references to dynamic relocations via .dynamic,
but there are no corresponding relocation sections. Change .dynamic
contents to specify no dynamic relocations.

(cherry picked from FBD29502108)
2021-06-30 14:33:59 -07:00
Amir Ayupov a07d24cc4b [BOLT][NFC] Un-inline checking AArch64 linker veneers out of disassemble loop
Summary:
Move the AArch64 `matchLinkerVeneer` check out of a for-loop
in `BinaryFunction::disassemble`

(cherry picked from FBD29411348)
2021-06-25 17:52:51 -07:00
Amir Ayupov c7c0803b59 [BOLT][NFC] Un-inline indirect branch handling out of disassemble loop
Summary:
Move the `processIndirectBranch` switch statement out of a for-loop
in `BinaryFunction::disassemble`

(cherry picked from FBD29411346)
2021-06-25 17:49:43 -07:00
Amir Ayupov 8f751bc058 [BOLT][NFC] Un-inline adding external references out of disassemble loop
Summary:
Move the code that handles true external references (non-unreachable)
out of a for-loop in `BinaryFunction::disassemble`.

(cherry picked from FBD29411345)
2021-06-25 17:32:25 -07:00
Amir Ayupov 8f7a400629 [BOLT][NFC] Delete MoveRelocations entirely
Summary: MoveRelocations are unused. Remove interfaces and emission part.

(cherry picked from FBD29468409)
2021-06-25 17:06:21 -07:00
Maksim Panchenko 38c5887992 [BOLT][NFC] Always process runtime relocations
Summary:
Dynamic relocations applied at runtime should be processed even in
non-relocation mode.

(cherry picked from FBD29311906)
2021-06-22 13:46:06 -07:00
Amir Ayupov ef1b1e7184 [BOLT][NFC] Refactor handlePCRelOperand
Summary: Move error logging to handlePCRelOperand, reduce code duplication

(cherry picked from FBD29309702)
2021-06-17 17:41:28 -07:00
Amir Ayupov b964e852d5 [BOLT][NFC] Readability improvements in X86,Aarch64 MCPlusBuilder
Summary: Minor refactorings in target specific MCPlusBuilders to improve readability

(cherry picked from FBD29309701)
2021-06-17 18:22:32 -07:00
James Luo dea6c247d9 [BOLT][CSSPGO] Relate decoded pseudo probe basic blocks
Summary:
Assign decoded pseudo probe to correlated output block

Pseudo probes can then be encoded to a proper address

(cherry picked from FBD29211688)
2021-06-25 11:42:58 -07:00
Amir Ayupov 521a61b056 [BOLT][NFC] Use MCPlusBuilder::isPseudo
Summary: Consistently use this interface across BOLT codebase

(cherry picked from FBD29171718)
2021-06-16 12:10:20 -07:00
Amir Ayupov da276d73c7 [BOLT] Handle R_X86_64_64 in flushPendingRelocations
Summary:
Handle R_X86_64_64 the same way as R_X86_64_32;
`getSizeForType` takes care of the size:

```x86_64 ABI relocation types
Name        Value Field  Calculation
R_X86_64_64 1     word64 S + A
R_X86_64_32 10    word32 S + A
```
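
As a sketch of what handling both types the same way can look like,
the snippet below computes S + A and writes it into a field whose width
depends on the relocation type. The names `relocFieldSize` and
`applyAbsoluteReloc` are hypothetical and are not BOLT's API; only the
type values and the S + A calculation come from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Relocation type values from the x86_64 ABI table quoted above.
constexpr uint32_t kR_X86_64_64 = 1;
constexpr uint32_t kR_X86_64_32 = 10;

// Hypothetical helper: field width in bytes, or 0 for an unhandled type.
std::size_t relocFieldSize(uint32_t Type) {
  switch (Type) {
  case kR_X86_64_64:
    return 8; // word64
  case kR_X86_64_32:
    return 4; // word32
  default:
    return 0;
  }
}

// Writes S + A into Field in little-endian order and returns the number of
// bytes written. This sketch truncates silently and does not perform the
// overflow check a real linker would do for the 32-bit case.
std::size_t applyAbsoluteReloc(uint32_t Type, uint64_t S, int64_t A,
                               uint8_t *Field) {
  assert(Field != nullptr && "destination field is required");
  const std::size_t Size = relocFieldSize(Type);
  const uint64_t Value = S + static_cast<uint64_t>(A);
  for (std::size_t I = 0; I < Size; ++I)
    Field[I] = static_cast<uint8_t>((Value >> (8 * I)) & 0xFF);
  return Size;
}
```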

(cherry picked from FBD29370417)
2021-06-24 12:18:16 -07:00
Maksim Panchenko f46af9e9bc [BOLT][TESTS] Fix ICF test case
Summary:
Host compiler may generate duplicate functions and as a result BOLT can
fold more than 1 function.

(cherry picked from FBD29347302)
2021-06-23 16:13:30 -07:00
Joey Thaman be0da0fac2 Throw an error in instrument for dynamic libs
Summary:
In InstrumentationRuntimeLibrary, throw an error
if the program uses dynamic libraries

(cherry picked from FBD29265147)
2021-06-21 07:45:52 -07:00
Maksim Panchenko bbbd159ccb [BOLT] Fix undefined symbol warnings/errors
Summary:
When we fold a function in relocation mode, make sure to clear its state
to avoid emitting relocations against undefined symbols.

(cherry picked from FBD29245320)
2021-06-18 14:35:39 -07:00
Sameeran joshi ba915af1cd [PR][BOLT] Print revision in perf2bolt and bolt-diff modes
Summary:
Fix issue facebookincubator/BOLT#160
PR facebookincubator/BOLT#172

(cherry picked from FBD29139522)
2021-06-08 23:28:37 +05:30
Rafael Auler e485a9830b Rebase: [BOLT][DebugFission] Fix reading support for DWP
Summary:
Dived deeper into the DWARF APIs and llvm-symbolizer; this is a more streamlined way of doing it, and the address base gets set properly.
Writing out dwo files with dwp input will be a separate patch.

(cherry picked from FBD31361529)
2021-06-16 09:52:03 -07:00
Vladislav Khmelevsky a8b9319536 [PR] Patch allocatable relocations for AArch64
Summary: PR facebookincubator/BOLT#166

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28910060)
2021-06-02 00:03:56 +03:00
Vladislav Khmelevsky 2cf9008a60 [PR] Instrumentation: Disable signals on mutex lock
Summary:
When an indirect call is instrumented, it locks SimpleHashTable's mutex on the get() call.
If we receive a signal while the mutex is locked and the signal handler also calls
an indirect function, we end up with a deadlock.
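
The idea can be sketched with a small RAII guard that blocks signals
for as long as the mutex is held. This is an illustration only: it uses
libc's sigprocmask and std::mutex, and the class name
`SignalBlockedLock` is hypothetical. How the instrumentation runtime
itself masks signals is not shown in this commit message.

```cpp
#include <cassert>
#include <mutex>
#include <signal.h>

// Hypothetical guard: block all blockable signals, then take the mutex.
// On destruction, release the mutex first and then restore the old mask,
// so a handler can never run while the lock is held by this thread.
// Note: POSIX leaves sigprocmask unspecified in multithreaded processes;
// pthread_sigmask is the portable per-thread call.
class SignalBlockedLock {
public:
  explicit SignalBlockedLock(std::mutex &Mutex) : M(Mutex) {
    sigset_t All;
    sigfillset(&All);
    const int Rc = sigprocmask(SIG_BLOCK, &All, &Saved);
    assert(Rc == 0 && "sigprocmask failed");
    (void)Rc;
    M.lock();
  }
  ~SignalBlockedLock() {
    M.unlock();
    sigprocmask(SIG_SETMASK, &Saved, nullptr);
  }
  SignalBlockedLock(const SignalBlockedLock &) = delete;
  SignalBlockedLock &operator=(const SignalBlockedLock &) = delete;

private:
  std::mutex &M;
  sigset_t Saved;
};
```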

PR facebookincubator/BOLT#167

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28909921)
2021-06-04 19:51:06 +03:00
Maksim Panchenko 1efadeedf2 [BOLT] Fix rodata load simplification pass
Summary:
If the target address has a runtime relocation against it, do not
perform the load simplification.

(cherry picked from FBD29091939)
2021-06-13 15:37:31 -07:00
Amir Ayupov f7f0a571d7 [BOLT][NFC] Suppress addList override warning
Summary:
Suppresses the warning
```
src/DebugData.h:338:20: warning: 'addList' overrides a member function but is not marked 'override' [-Wsuggest-override]
```

(cherry picked from FBD28858201)
2021-06-02 19:12:13 -07:00
James Luo 8a919593c7 [BOLT][CSSPGO] Pseudo probe decoding
Summary:
Make bolt decode pseudo probe section in binary

For more detail of pseudo probe, check https://reviews.llvm.org/D86490.

(cherry picked from FBD28856316)
2021-06-11 13:06:12 -07:00
Alexander Yermolovich 226d1c3b0b [BOLT] Change how DF DWO logging is handled
Summary: Change the assert to a warning when DWO debug information can't be retrieved, usually due to an invalid path.

(cherry picked from FBD29005217)
2021-06-09 12:55:09 -07:00
Amir Ayupov 2da5b12a3d [BOLT] Hugify: check for THP support via sysfs
Summary:
Remove dependence on kernel version check, query sysfs directly
instead.
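
A minimal sketch of such a query, assuming the file being read is
/sys/kernel/mm/transparent_hugepage/enabled, whose contents list the
modes with the active one in brackets (for example
"always [madvise] never"). The commit message does not name the file,
and the helper names below are hypothetical. The sketch only parses the
contents, so reading the file is left to the caller.

```cpp
#include <cassert>
#include <string>

// Returns the bracketed mode, e.g. "madvise" for "always [madvise] never",
// or an empty string if no bracketed token is present.
std::string selectedThpMode(const std::string &Contents) {
  const std::string::size_type Open = Contents.find('[');
  if (Open == std::string::npos)
    return "";
  const std::string::size_type Close = Contents.find(']', Open + 1);
  if (Close == std::string::npos)
    return "";
  assert(Close > Open);
  return Contents.substr(Open + 1, Close - Open - 1);
}

// Assumption: THP counts as available when the mode is "always" or "madvise".
bool thpAvailable(const std::string &Contents) {
  const std::string Mode = selectedThpMode(Contents);
  return Mode == "always" || Mode == "madvise";
}
```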

(cherry picked from FBD28858208)
2021-06-02 19:11:52 -07:00
Maksim Panchenko 7bccf8d25d [BOLT][NFC] Fix debug info printouts for inlined functions
Summary:
While printing debug info for instructions, we should use line tables
from the corresponding DWARF CU which could be different from the
containing function CU in case of inlined instructions.

(cherry picked from FBD28908324)
2021-06-04 12:31:31 -07:00
Amir Ayupov 65d227c035 [BOLT][TEST] Fix test case to conform to analyzePICJumpTable pattern matching
Summary:
Make sure that jump table is properly recognized in
`split_func_jump_table_fragment.s`.

(cherry picked from FBD28839976)
2021-06-02 10:50:47 -07:00
James Luo 1c06193d0f [BOLT] Resolve JumpTable namespace issue in pseudo probe decoder migration
Summary: This diff fixes the JumpTable namespace conflicts during the migration of pseudo probe decoder.

(cherry picked from FBD28859927)
2021-06-02 22:46:57 -07:00
Maksim Panchenko a26370389a [BOLT][NFC] Disable ProcessAllSections in RuntimeDyld
Summary:
FBD55943 changed the way ProcessAllSections works in RuntimeDyld. After
the change, all sections, including symbol table, section table, etc.
are loaded into memory whenever ProcessAllSections is enabled.

In BOLT we rely on RuntimeDyld for processing sections with relocations.
These include most allocatable sections and additionally .debug_line.
The latter is skipped by RuntimeDyld without ProcessAllSections flag.
If we enable ProcessAllSections, we will have to deal with allocating
memory for more sections than we need (see above) and later to filter
them out.

The alternative is to mark all sections that we actually plan to use as
"required for execution" (using RuntimeDyld terminology). For
.debug_line section on ELF it means adding SHF_ALLOC flag. On MachO,
RuntimeDyld currently treats all sections as required.

(cherry picked from FBD28729398)
2021-05-26 16:23:34 -07:00
Vladislav Khmelevsky 5a6c379f5b [PR] Instrumentation: Emit paddings to preserve data alignment
Summary:
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

facebookincubator/BOLT#156

(cherry picked from FBD28521843)
2021-05-14 14:09:05 +03:00
Vladislav Khmelevsky 79807d99fe [PR] Introduce loop inversion pass
Summary:
This patch introduces LoopInversionPass. Its main purpose is to ensure
that the loop layout is optimal depending on the profile information. So
if profile information shows that the loop is used, the unconditional
jump instruction must be executed only once and vice-versa. Please take
a look at the pass header file and test for more details.

Also change the link_fdata script a bit so that the FDATA prefix can be
changed, like FileCheck does.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

PR facebookincubator/BOLT#153

(cherry picked from FBD28391811)
2021-05-11 20:59:13 +03:00
Amir Ayupov 12e9fec697 Rebase: [BOLT] DebugFission Support
Summary:
Implemented support for Debug Fission.
For the most part it doesn't impact the monolithic execution path.
One area that was changed is the DW_AT_low_pc/DW_AT_high_pc conversion. Before, it was converted to DW_AT_ranges/DW_AT_low_pc; now DW_AT_low_pc is kept in the same place.
Another, more visible impact is in the Skeleton CU: DW_AT_low_pc is replaced with DW_AT_ranges_base if it's not originally present and BOLT converted ranges inside the dwo units.

The output of this is multiple .dwo files with updated debug information.

(cherry picked from FBD29569788)
2021-04-01 11:43:00 -07:00
Amir Ayupov 99d7f90635 [BOLT][NFC][TEST] Added llvm-dwarfdump and llvm-mc to BOLT_TEST_DEPS
(cherry picked from FBD28427352)
2021-05-13 15:36:43 -07:00
Maksim Panchenko ba6fdb8113 [BOLT] Preserve original jump table relocations
Summary:
Remove relocations against internal function labels, e.g. jump table
relocations, only when overwriting them.

While reading an input file with relocations, we create internal
relocations against code references (we skip PIC relocations).
Later, when we discover jump tables, we remove corresponding relocations
with the assumption that original relocations will either be ignored or
replaced by new relocations. However, it is possible to miss some
references to the jump table, in which case the original entries will
not be ignored. While such situation is abnormal, it is still a
better/safer approach to preserve relocations if we are not replacing
them with new ones.

(cherry picked from FBD28406628)
2021-05-12 23:35:10 -07:00
Maksim Panchenko 81c59d9a54 [BOLT][NFC] Change interface for searching relocations
(cherry picked from FBD28406629)
2021-05-12 23:29:04 -07:00
Amir Ayupov 500edf26c9 [BOLT][NFC] Address warning about ProgramPoint implicit copy constructor
Summary:
Explicit assignment operator can be replaced with an implicit one.
Remove it to allow an implicit copy constructor:
```
bolt/src/Passes/DataflowAnalysis.h:74:8: warning: definition of
implicit copy constructor for 'ProgramPoint' is deprecated because it
has a user-declared copy assignment operator [-Wdeprecated-copy]
  void operator=(const ProgramPoint &PP) {
       ^
bolt/src/Passes/DataflowAnalysis.h:62:14: note: in implicit copy
constructor for 'llvm::bolt::ProgramPoint' first required here
      return ProgramPoint(&*Last);
```

(cherry picked from FBD28335138)
2021-05-10 14:16:25 -07:00
Maksim Panchenko fe37f1870e [BOLT][NFC] Follow LLVM variable initialization style
(cherry picked from FBD28417604)
2021-05-13 10:50:47 -07:00
Vladislav Khmelevsky b728bfc70a [PR] Add missing includes
Summary:
Adds missing headers removed by IWYU.
NB: this caused build breakage on ubuntu-latest

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28368185)
2021-05-11 15:55:57 +03:00
Vladislav Khmelevsky de298c08fd [PR] Fix tests build with -no-pie option
Summary:
Since gcc/ld may produce and expect PIE files, we need to pass the -no-pie option to avoid linking errors for tests.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28360045)
2021-05-11 03:25:49 +03:00
Alexey Moksyakov ce84e9607a [PR] Fix bb reordering optimization
Summary:
The reorder-blocks optimization pass doesn't take into account that
the available offset for legacy Jcc instructions (for example,
JRCXZ, with an 8-bit operand) has to be less than 255 bytes.
This is a rare case, so an extra check was added to exclude functions
with such unsupported instructions from optimization passes.
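
As an illustration of the constraint, reading the 8-bit operand as the
usual signed rel8 displacement gives a reachable window of -128 to +127
bytes measured from the end of the branch instruction. The helper names
below are hypothetical and do not reproduce the check added by this
commit.

```cpp
#include <cassert>
#include <cstdint>

// True if the displacement can be encoded in a signed 8-bit operand.
bool fitsInRel8(int64_t Displacement) {
  return Displacement >= INT8_MIN && Displacement <= INT8_MAX;
}

// Encodes the displacement as the single operand byte of a rel8 branch.
uint8_t encodeRel8(int64_t Displacement) {
  assert(fitsInRel8(Displacement) && "target out of reach for a rel8 branch");
  return static_cast<uint8_t>(static_cast<int8_t>(Displacement));
}
```

A layout pass that moves the target block farther away than this window
cannot keep such a branch, which is why these functions need special
treatment.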

Alexey Moksyakov
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28264117)
2021-04-23 11:34:40 +03:00
Amir Ayupov 9a884543f1 [BOLT][NFC] Avoid unnecessary copies with push_back
Summary: Small refactoring inspired by clang-tidy modernize-use-emplace

(cherry picked from FBD28307493)
2021-05-07 18:43:25 -07:00
Amir Ayupov 94653797f3 Rebase: [BOLT][NFC] Avoid binutils in tests
Summary:
Replace binutils tools with llvm tools

(cherry picked from FBD29575630)
2021-05-04 16:45:28 -07:00
Amir Ayupov eb99a6665c Rebase: [BOLT][NFC] Remove unneeded includes with include-what-you-use
Summary:
Ran iwyu multiple times, manually picked header remove lines.
Reached fixed point wrt removal: iwyu doesn't automatically remove
any more headers or forward declarations.

(cherry picked from FBD29569221)
2021-04-30 13:54:02 -07:00
Maksim Panchenko 5239182075 [perf2bolt] Further relax segment matching
Summary:
Previously, we used p_align value of the code segment to predict the
mapping of the segment at runtime. However, at times the reported
value is not aligned and at other times the actual aligned value will
be different because of the different page size used.
All we know is that the page size used at runtime should not exceed
p_align value. Adjust our segment address matching accordingly.

(cherry picked from FBD28133066)
2021-04-30 15:02:29 -07:00
Maksim Panchenko bd86c06c1b [BOLT][NFC] Remove CFIReaderWriter::fdes()
(cherry picked from FBD27918126)
2021-04-21 12:33:08 -07:00
Maksim Panchenko f8fa3e97d5 [BOLT] Remove -dump-eh-frame option
Summary: The option duplicates functionality of "llvm-dwarfdump -eh-frame".

(cherry picked from FBD27917505)
2021-04-21 12:13:22 -07:00
Maksim Panchenko 3355936e14 [BOLT][NFC] Remove RewriteInstance::EHFrame
(cherry picked from FBD27915725)
2021-04-21 11:24:15 -07:00
Amir Ayupov f84f451a54 [BOLT][NFC] Use const reference for MCInstrDesc
Summary:
Addressing comments from the review for "Expand auto types".
Use const reference in MCPlusBuilder for MCInstrDesc where the copy
is not necessary.

(cherry picked from FBD27844344)
2021-04-17 21:48:46 -07:00
Amir Ayupov c7306cc219 Rebase: [BOLT][NFC] Expand auto types
Summary:
Expanded auto types across BOLT semi-automatically with the aid
of clangd LSP

(cherry picked from FBD33289309)
2021-04-08 00:19:26 -07:00
Rafael Auler dc2673a039 [BOLT] Fix value invalidation bug in runtimelib
Summary:
We can't use a fragment of the old LibPath as an input
to create a new one.

(cherry picked from FBD27642728)
2021-04-07 21:40:23 -07:00
Rafael Auler 35732d954b [BOLT] Remove cantFail in getAddressRanges calls
Summary:
We may have a CU with empty ranges, so accept errors coming
from DWARFDie::getAddressRanges(). This happens when using tools that
selectively strip debuginfo from the binary.

(cherry picked from FBD27602731)
2021-04-06 12:57:09 -07:00
Amir Ayupov f1bfb18ceb [BOLT] Refactor SectionPatchers map to a Patcher in BinarySection
Summary:
Refactor SectionPatches to avoid the use of extra map and a cast
from StringRef to std::string.

cherry-picked from FBD26756560

(cherry picked from FBD27490641)
2021-03-18 13:06:18 -07:00
Amir Ayupov 081e39aa15 Rebase: [cherry-pick] [BOLT] Add option to skip writing an output file
Summary:
The user may wish to run BOLT for printing statistics only
(i.e. to check that the profile is valid). Add an option to run BOLT
without writing any output file, similar to a dry run. This option
is triggered by supplying -o with "/dev/null".

(cherry picked from FBD29568632)
2021-03-29 16:04:57 -07:00
Maksim Panchenko e7169be93f [BOLT] Do not assert on jump table heuristic failure
Summary:
During the initial indirect jump analysis, we used to assert that the
discovered jump table type matched the pattern of the corresponding
instruction sequence. E.g., for PIC jump table memory we expected the
PIC jump table instruction sequence. The assertions were too
conservative, as in the case of a mismatch we can mark the indirect jump
as having an unknown control flow. That should be sufficient to either
skip the function processing or rely on relocation information for
possible recovery of the control flow.

(cherry picked from FBD27255816)
2021-03-23 13:41:41 -07:00
Rafael Auler b3c34d568a [BOLT] Fix instrumentation bug in duplicated JTs
Summary:
Fix a bug with instrumentation when trying to instrument
functions that share a jump table with multiple indirect
jumps. Usually, each indirect jump that uses a JT will have its own
copy of it. When this does not happen, we need to duplicate the jump
table safely, so we can split the edges correctly (each copy of the
jump table may have different split edges). For this to happen, we
need to correctly match the sequence of instructions that perform the
indirect jump to identify the base address of the jump table and patch
it to point to the new cloned JT. A case was reported to us in
which the compiler generated suboptimal code for an indirect jump
that our matcher failed to identify.

Fixes facebookincubator/BOLT#126

(cherry picked from FBD27065579)
2021-03-15 16:34:25 -07:00
Maksim Panchenko b11c826889 [BOLT] Fix false references to zero-sized objects
Summary:
Whenever BOLT encounters a data reference in code, it tries to convert
it into <Object+Offset> form. The primary reason behind this approach is
to support read-only data-reordering optimization. However, with the
current level of the linker and compiler support we don't have enough
information to always correctly restore the original <Object+Offset>.
E.g. with zero-sized symbols we have to speculate that the actual size
of the underlying object extends to the next symbol. Most of the time,
there will be an object pointed by a zero-sized symbol and even
if we are guessing incorrectly, there will be no harm in creating
references of such form.

The problem happens when there's no object corresponding to the original
symbol and the next object is an (unmarked) jump table:

  A:                   # <- zero-sized object
  .LJUMP_TABLE:
    .long <entry1>
    .long <entry2>
    ....
  .LB:
    .long 21
  .LC:
    .long 42

The jump table will be moved and all references past it (up to the next
named object) will be incorrectly updated.

We should not speculate about the size of A in a case like that and
treat all discovered data objects (and thus references) independently.

(cherry picked from FBD27005660)
2021-03-15 12:06:56 -07:00
Vladislav Khmelevsky 76d346ca14 [BOLT][PR] Instrumentation: Introduce -no-counters-clear and -wait-forks options
Summary:
This PR introduces 2 new instrumentation options:
1. instrumentation-no-counters-clear: Discussed at https://github.com/facebookincubator/BOLT/issues/121
2. instrumentation-wait-forks: Since the instrumentation counters are mapped as MAP_SHARED, it would be nice to be able to wait until all forks of the parent process have died by tracking the process group.
The last patch is just an emitBinary code refactor.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/125
GitHub Author: Vladislav Khmelevskyi <Vladislav.Khmelevskyi@huawei.com>

(cherry picked from FBD26919011)
2021-03-09 16:18:11 -08:00
Maksim Panchenko 225a8d7f2c [BOLT] Ignore TBSS section at layout time
Summary:
TBSS section is a "virtual" section that does not take memory or file
space. Ignore it completely while adjusting section sizes.

(cherry picked from FBD26824484)
2021-03-04 16:31:12 -08:00
Vladislav Khmelevsky ec9751eef5 [BOLT][PR] readDynamicRelocations: Skip NONE relocations
Summary:
NONE relocations should not be processed while reading dynamic relocations.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/118
GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com>

(cherry picked from FBD26489881)
2021-02-17 15:36:58 -08:00
Alexander Yermolovich 06959eedcf Fix up test for Update DW_AT_stmt_list for .debug_types
Summary: As titled.

(cherry picked from FBD28112186)
2021-03-17 17:08:26 -07:00
Rafael Auler da752c9c5c Fix license for a few remaining files
Summary: As titled.

(cherry picked from FBD28112137)
2021-03-17 15:04:19 -07:00
Alexander Yermolovich 0ec91a25df Update DW_AT_stmt_list for .debug_types
Summary:
There is no real link between CU and TU, so we rely on the fact
that the addresses are the same and update all of them.

(cherry picked from FBD28112114)
2021-02-17 15:30:10 -08:00
Rafael Auler 16521f1f79 [BOLT] Update license headers
Summary: Update license and fix headers for some files.

(cherry picked from FBD28112041)
2021-03-15 18:04:18 -07:00
Amir Ayupov 1c5d3a056c Rebase: Merge BOLT codebase in monorepo
Summary:
This commit is the first step in rebasing all of BOLT
history in the LLVM monorepo. It also solves trivial build issues
by updating BOLT codebase to use current LLVM. There is still work
left in rebasing some BOLT features and in making sure everything
is working as intended.

History has been rewritten to put BOLT in the /bolt folder, as
opposed to /tools/llvm-bolt.

(cherry picked from FBD33289252)
2020-12-01 16:29:39 -08:00
Alexander Shaposhnikov 0a8aaf56bb [BOLT] Add support for reading profile on Mach-O
Summary: Add support for reading profile on Mach-O.

(cherry picked from FBD25777049)
2021-01-29 16:37:07 -08:00
Alexander Shaposhnikov a0dd5b05dc [BOLT] Add support for dumping profile on MacOS
Summary: Add support for dumping profile on MacOS.

(cherry picked from FBD25751363)
2021-01-28 12:44:14 -08:00
Alexander Shaposhnikov 3b876cc3e7 [BOLT] Add support for dumping counters on MacOS
Summary: Add support for dumping counters on MacOS

(cherry picked from FBD25750516)
2021-01-28 12:32:03 -08:00
Alexander Shaposhnikov 6a84124e1d [BOLT] Add support for __literal16 section on MachO
Summary:
1. Add support for __literal16 section in the instrumentation runtime library for MacOS.
2. Fix emitting __counters section.

(cherry picked from FBD25746342)
2021-01-28 12:04:46 -08:00
Sergey Pupyrev fea6b4e469 an updated version of ExtTSP
Summary:
a few minor updates in block reordering:
- some refactoring to improve readability;
- optimized chain splitting strategy to improve quality of layout and performance of the algorithm.

(cherry picked from FBD25126220)
2021-01-27 18:29:16 -08:00
Alexander Shaposhnikov d6e60c5bec [BOLT] Enable intToStr for MacOS
Summary: Enable intToStr et al. in the runtime library for MacOS.

(cherry picked from FBD25745358)
2021-01-20 16:40:17 -08:00
Alexander Shaposhnikov faaefff618 [BOLT] Fix operator new signature
Summary:
Use size_t for the first parameter of operator new.
https://en.cppreference.com/w/cpp/memory/new/operator_new

(cherry picked from FBD25750921)
2021-01-20 12:56:41 -08:00
Amir Ayupov a86cd533b3 [BOLT] Fix missing newlines in debug prints
(cherry picked from FBD25966797)
2021-01-19 18:43:16 -08:00
Rafael Auler 0de92b8346 [PERF2BOLT] Relax segment matching requirements
Summary:
When looking at perf.data's available binaries and their
respective mmap'ed segments, match them with the input binary by
looking at both aligned and non-aligned addresses. If we suppose
the alignment is the mmap'ed page size, we may miss some cases and
perf2bolt will refuse to proceed because it failed to match the
input binary with a process recorded in perf.data.

(cherry picked from FBD25732673)
2021-01-11 06:24:46 -08:00
Rafael Auler e3898d5969 [BOLT] Add threshold options for lite mode
Summary:
Add options for trading processing speed for binary performance.

  -lite-threshold-pct=<uint>
    Threshold (in percent) for selecting functions to process in lite
    mode. Higher threshold means fewer functions to process.
    E.g. a threshold of 90 means only the top 10 percent of functions
    with a profile will be processed.

  -lite-threshold-count=<uint>
    Similar to '-lite-threshold-pct' but specify threshold using
    absolute function call count. I.e. limit processing to functions
    executed at least the specified number of times.

  -no-scan
    Do not scan cold functions for external references (may result in
    slower binary).
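
The two thresholds can be read as filters over per-function execution counts. The sketch below is in Python rather than BOLT's C++; the helper names and the exact ranking rule are assumptions made for illustration, not BOLT's implementation.

```python
def select_by_pct(counts, threshold_pct):
    """Keep roughly the top (100 - threshold_pct) percent of profiled functions."""
    # Only functions with a profile (non-zero count) take part in the ranking.
    ranked = sorted((n for n, c in counts.items() if c > 0),
                    key=lambda n: counts[n], reverse=True)
    keep = len(ranked) * (100 - threshold_pct) // 100
    return set(ranked[:keep])


def select_by_count(counts, threshold_count):
    """Keep functions executed at least threshold_count times."""
    return {n for n, c in counts.items() if c >= threshold_count}


# Hypothetical profile: function name -> execution count (f1..f10, counts 1..10).
profile = {f"f{i}": i for i in range(1, 11)}
```

With this hypothetical profile, a percentage threshold of 90 keeps one function out of ten, while a count threshold of 8 keeps the three functions executed at least 8 times.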

(cherry picked from FBD24739092)
2020-12-30 12:23:58 -08:00
Rafael Auler e0261a22ce [TEST] Remove dependency on debug output
Summary:
Test mistakenly used -debug output, which makes it fail on
no-asserts build.

(cherry picked from FBD25399449)
2020-12-09 12:25:58 -08:00
Rafael Auler d2f68039bc [BOLT] Fix shrinkwrapping bug when changing frame alignment
Summary:
This fixes a bug with shrink wrapping when trying to move
push-pops in a function where we are not allowed to modify the
stack layout for alignment reasons. In this bug, we failed to
propagate alignment requirement upwards in the call graph from
function A to B when: (1) there is a cycle in the call graph and
(2) the distance from A to B is greater than 1 in the call graph
and (3) there is a node in the path from A to B, not including
A or B, that does not access parameters in the stack.

(cherry picked from FBD25315977)
2020-12-03 20:09:32 -08:00
Alexander Shaposhnikov e067f2adf4 Inject instrumentation's global dtor on MachO
Summary:
This diff is a preparation for dumping the profile generated by BOLT's instrumentation on MachO.

1/  Function "bolt_instr_fini" is placed into the predefined section "__fini"

2/ In the instrumentation pass we create a symbol "bolt_instr_fini" and
replace the last global destructor with it.

This is a temporary solution, in the future we need to register bolt_instr_fini in addition to the existing destructors without dropping the last one.

(cherry picked from FBD25071864)
2020-11-19 18:18:28 -08:00
Alexander Shaposhnikov 1b258b8908 Refactor syscall wrappers for OSX
Summary: Refactor syscall wrappers for OSX.

(cherry picked from FBD25084642)
2020-11-19 14:56:45 -08:00
Amir Ayupov f9d00d418b [BOLT] Handle insertion of updated CFI at the first basic block
Summary:
Fix corner case of insertion of updated CFI with unset `PrevBB`.
Handle it in the same way as inserting past hot-cold split point.

(cherry picked from FBD24943911)
2020-11-17 18:40:19 -08:00
Alexander Shaposhnikov 1cf23e5ee8 Link the instrumentation runtime on OSX
Summary: Link the instrumentation runtime on OSX.

(cherry picked from FBD24390019)
2020-11-17 13:57:29 -08:00
Maksim Panchenko 7eaf63a118 [BOLT] Fix data race while running split functions pass
Summary:
In BinaryContext::calculateEmittedSize(), after the temporary code
emission, we have to perform a cleanup and mark all symbols used
during the emission as undefined and unregistered (so that we can emit
them again later). The cleanup is happening even for symbols that were
referenced and not defined by emitted code.

If all emitted symbols are local, there is no risk that one thread will
define a symbol while some other thread will undefine it in its cleanup
code. Such behavior is expected as local symbols can only be referenced
within the containing function and each function is processed in one
thread. However, secondary entry points have associated global symbols
and if we emit them, then it is possible for a thread to undefine
a symbol while the other thread had defined it and was in the process of
emitting the fragment with it. In such case, a data race may happen and
the thread that contains the definition of the symbol may define it
twice causing a redefinition error.

To avoid the data race, we skip the emission of secondary entry global
symbols when emitting code used only for the size estimation.

(cherry picked from FBD24986007)
2020-11-16 14:34:51 -08:00
Sergey Pupyrev 1e9b733008 a new version of hfsort+
Summary:
A faster and better version of function reordering:
- fixed a bug when some computed probabilities were negative;
- changed an O(n^2) loop to a priority queue to find a candidate of chains to merge
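
The general idea of replacing a quadratic scan with a priority queue can be sketched as follows. This is Python, not the hfsort+ code: the real algorithm scores merges with its own gain function, whereas this sketch assumes fixed edge weights, and `merge_chains` is a hypothetical name.

```python
import heapq


def merge_chains(num_nodes, edges):
    """Greedily merge chains, always taking the heaviest remaining edge.

    edges: iterable of (weight, a, b). Returns the merged chains, sorted.
    """
    chain_of = list(range(num_nodes))            # node -> id of its chain
    chains = {i: [i] for i in range(num_nodes)}  # chain id -> ordered nodes

    # heapq is a min-heap, so weights are negated to pop the largest first.
    heap = [(-w, a, b) for w, a, b in edges]
    heapq.heapify(heap)

    while heap:
        _, a, b = heapq.heappop(heap)
        ca, cb = chain_of[a], chain_of[b]
        if ca == cb:
            continue  # stale entry: both ends already share a chain
        # Append chain cb to chain ca and retarget its nodes.
        for node in chains[cb]:
            chain_of[node] = ca
        chains[ca].extend(chains.pop(cb))
    return sorted(chains.values())
```

Each candidate is popped in logarithmic time instead of rescanning all pairs; entries made obsolete by earlier merges are skipped when they surface.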

(cherry picked from FBD24571208)
2020-11-14 13:18:58 -08:00
Amir Ayupov 6401af89c7 [BOLT] Support jump tables in split fragments with entries pointing back to parent functions
Summary:
Support jump tables belonging to split fragments with entries
pointing back to parent functions.
While skipping such families of functions, make sure to use the
topmost fragment to ignore its fragments.

(cherry picked from FBD24907438)
2020-11-12 11:54:51 -08:00
Amir Ayupov e8234b3b98 [BOLT] Add invalid offset for a JT entry pointing to a fragment
Summary:
In a jump table identification, register an invalid offset for jump table
entries pointing to function fragments.
These invalid offsets have no effect other than padding the jump
table size, calculated as `max(OffsetEntries, Entries)`.
Correct jump table size is required in strict mode (enabled by default
in aggregation mode by `perf2bolt`) in accounting of all PC-relative
relocations in data.
Functions containing these jump tables with invalid offsets are
marked to be ignored immediately afterwards in
`populateJumpTables`.

(cherry picked from FBD24897464)
2020-11-12 11:54:44 -08:00
Amir Ayupov 157129b751 [BOLT] Debug logging in analyzeJumpTable
Summary:
Added debug logging in/around `analyzeJumpTable`:
- Dump jump table entries as they are being processed:
```
BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2)
  * Checking 0x428ff40 -> OK: real entry
  * Checking 0x428ff44 -> OK: real entry
  * Checking 0x428ff48 -> OK: real entry
  * Checking 0x428ff4c -> OK: real entry
  * Checking 0x428ff50 -> OK: real entry
  * Checking 0x428ff54 -> OK: address in split fragment
  * Checking 0x428ff58 -> OK: address in split fragment
  * Checking 0x428ff5c -> OK: address in split fragment
  * Checking 0x428ff60 -> OK: address in split fragment
  * Checking 0x428ff64 -> OK: real entry
  * Checking 0x428ff68 -> OK: real entry
  * Checking 0x428ff6c -> OK: real entry
  * Checking 0x428ff70 -> OK: real entry
BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2)
  * Checking 0x428ff74 -> OK: real entry
  ...
```
- Dump skipped functions:
```
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2)
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2)
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2)
```
- Dump values of unclaimed PC-relative relocations in data.

(cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
Amir Ayupov c0cb550536 Minimize X86/shrinkwrapping-critedge test case
Summary: Minimized test case while preserving the CFG subgraph with an issue

(cherry picked from FBD24871063)
2020-11-10 21:22:57 -08:00
Amir Ayupov e54d389799 [BOLT] Disable DynoStats printing after SCTC
Summary:
Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets
unset by SCTC pass. Make DynoStats collection conditional on this
new flag.
SCTC leaves CFG in a state where branch counters of BBs with tail
calls/conditional tail calls are not available (except via annotations,
which get stripped by `lower-annotations`). Without branch
counters, DynoStats are invalid.

(cherry picked from FBD24558050)
2020-11-10 10:51:23 -08:00
Amir Ayupov c36b71686c Improve cold fragment name matching
Summary:
Fix cold fragment name matching regex by replacing existing
regexes `.*\.cold\..*` and  `.*\.cold`
and combining them into `.*\.cold(\.\d)?`,
applied to restored name (with BOLT-added suffixes stripped)

This allows matching names like "execute_stack_op.cold/1", which
previously weren't recognized.
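
A minimal sketch of the combined pattern, written in Python rather than BOLT's C++. The `strip_bolt_suffix` helper is hypothetical and assumes the BOLT-added suffix starts at the first `/`.

```python
import re

# Single pattern from the commit message; fullmatch anchors it at both ends.
COLD_FRAGMENT = re.compile(r".*\.cold(\.\d)?")


def strip_bolt_suffix(name):
    """Hypothetical helper: drop a BOLT-added suffix such as '/1' or '/1(*2)'."""
    return name.split("/", 1)[0]


def is_cold_fragment(name):
    """Match the pattern against the restored (suffix-stripped) name."""
    return COLD_FRAGMENT.fullmatch(strip_bolt_suffix(name)) is not None
```

Under these assumptions, `execute_stack_op.cold/1` is restored to `execute_stack_op.cold` before matching, so the pattern accepts it.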

(cherry picked from FBD24804880)
2020-11-09 12:38:51 -08:00
Amir Ayupov f86a78a4e7 Lost in rebase: call registerFragment with a reference to TargetBF
Summary: Fixes broken build due to a lost dereferencing

(cherry picked from FBD24799948)
2020-11-06 12:22:22 -08:00
Amir Ayupov 2b09d672ce Conservatively handle jump tables in split functions
Summary:
- Allow jump table entries to point to locations inside the function and its fragments.
The reasoning is that jump table identification stops at the first entry that belongs to a function other than the one originally referencing the jump table. This assumption is invalid for jump tables whose entries point to both the parent function and its cold fragments, and it leads to an "unclaimed PC-relative relocations" assertion.

- Add fragment identification heuristic based on function name regex and contiguous jump table entries.
Currently, parent-to-fragment relationship is set up based on interprocedural references – direct references from the parent function. These references don't include references through jump table.
Additionally, some fragments are only reachable through jump table. In that case, in order to fully consume jump table, add parent-to-fragment relationship during `analyzeJumpTable` using the following heuristics:
  1. Fragment is identified as such based on name (contains `.cold.` part), but
  2. Parent function is not set – no direct interprocedural references to that fragment, and
  3. Fragment has the name of the form <parent>.cold(.\d+)

* For split functions with jump table entries spanning parent and fragments, mark parent and all fragments as ignored.
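
The name-based part of the heuristic can be sketched as below. This is Python, not BOLT's code; `parent_name` and `adopt_fragment` are hypothetical names, and treating the numeric suffix as optional is an assumption.

```python
import re

# <parent>.cold, optionally followed by .<digits> (suffix optionality is assumed).
FRAGMENT_NAME = re.compile(r"(?P<parent>.+)\.cold(\.\d+)?")


def parent_name(fragment):
    """Return the parent function name for a cold fragment, else None."""
    m = FRAGMENT_NAME.fullmatch(fragment)
    return m.group("parent") if m else None


def adopt_fragment(fragment, known_functions, parents):
    """Record a parent only if none is set yet and the derived name is known."""
    if fragment not in parents:
        parent = parent_name(fragment)
        if parent in known_functions:
            parents[fragment] = parent
    return parents.get(fragment)
```

A fragment that already has a parent from a direct interprocedural reference keeps it; the name-derived parent is only a fallback.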

(cherry picked from FBD24456904)
2020-11-06 11:19:03 -08:00
Amir Ayupov dc48354f71 processInterproceduralReferences: record references to cold fragments as entry points
Summary:
For interprocedural references to fragments, record them as
fragment entry points. Not registering these entry points leads to
UCE removing the blocks and "Undefined temporary symbol"
assertion.

(cherry picked from FBD24511281)
2020-11-06 10:57:47 -08:00
Amir Ayupov 5452287710 Extract BinaryContext::registerFragment
Summary: registerFragment to be reused in adding fragments reachable only through jump tables.

(cherry picked from FBD24656651)
2020-11-06 10:27:33 -08:00
Vladislav Khmelevsky 58460460d9 [BOLT][PR] Handle TLS relocations on AArch64
Summary:
Some TLS relocations, such as R_AARCH64_TLSDESC_ADR_PAGE21, must be
handled by BOLT and should not be skipped by the removed condition. Others,
such as R_AARCH64_TLS_TPREL64, could indeed be skipped here, but AFAIU this
condition was added as part of BOLT's self-optimization. To prevent future
problems, my suggestion is not to add another condition like
"isTLS(RType) && isTLSRelocatable(RType)", but to remove it altogether, since
the absence of this condition should not break any other TLS relocation.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/103
GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com>

(cherry picked from FBD24745928)
2020-11-04 16:45:58 -08:00
Maksim Panchenko 4f4239ceba [BOLT] Fix C++ exceptions for shared objects
Summary:
Fix several issues to make C++ exceptions work in shared objects:
  * Set MCObjectFileInfo PIC type based on the input binary type.
  * Support indirect (DW_EH_PE_indirect) encoding while writing
    exception Type Table.
  * Use different LPStart value and landing pad encoding for .so's.
  * Disable splitting of exception-handling code for .so's because of
    the new encoding.

(cherry picked from FBD24698765)
2020-11-04 11:44:02 -08:00
Rafael Auler c1bb4dcb2b [BOLT] Remove threaded EliminateUnreachableBlock version
Summary:
EliminateUnreachableBlocks has a data race because it depends
on BinaryContext::computeCodeSize. computeCodeSize supports independent
Emitters, enabling a lock-free execution. Unfortunately, that is almost
as expensive as the lock. Removing the boilerplate code for
parallelization of this pass turned out to be the best alternative: no
races and slightly better execution time for HHVM.

(cherry picked from FBD24716250)
2020-11-03 11:28:59 -08:00
Rafael Auler 37921b489a [BOLT] Please sanitizers
Summary:
In BinaryContext, we had StringRef holding a reference to
an r-value std::string. This triggers clang's address sanitizer
warnings. In MCPlusBuilder we had a left shift overflowing a type,
which is undefined behavior. Similarly, in CallGraph, we had a hash
function shifting a negative value, which is also UB. The last two
trigger the UB sanitizer.

(cherry picked from FBD24661045)
2020-10-30 15:11:52 -07:00
Rafael Auler 3e78082c1b [DOCS] Add instrumentation instructions to README
Summary: Add basic instructions on how to instrument a binary.

(cherry picked from FBD24660183)
2020-10-30 14:45:30 -07:00
Rafael Auler eb12d719ac [BOLT] Fix no-asserts build
Summary: Only use dump() method under DEBUG() macro.

(cherry picked from FBD24666481)
2020-10-30 19:59:07 -07:00
Maksim Panchenko 6b185cccf4 [BOLT] Always keep dynamic symbols defined
Summary:
Some symbols in .dynsym will be erroneously marked as belonging to a
non-allocatable section that BOLT can remove. In that case, keep the
original invalid index for such symbols instead of setting the UNDEF
index.

(cherry picked from FBD24488677)
2020-10-22 16:35:29 -07:00
Amir Ayupov 5f2f96c4c9 Add pass number to dot dump filename
Summary:
Change .dot dumps filename format from
  <function>-<passname>.dot
to
  <function>-<passidx>_<passname>.dot
This change helps navigate dumps by making the pass order explicit.
Example:
  execute_stack_op.cold.6-1(*2)-00_build-cfg.dot
  execute_stack_op.cold.6-1(*2)-01_validate-internal-calls.dot
  execute_stack_op.cold.6-1(*2)-02_strip-rep-ret.dot
  ...
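
The new naming scheme can be sketched in Python as follows. `dot_dump_name` is a hypothetical helper; the two-digit zero padding and the replacement of `/` with `-` are inferred from the example names above, not taken from BOLT's code.

```python
def dot_dump_name(function, pass_idx, pass_name):
    """Build '<function>-<passidx>_<passname>.dot' for a CFG dump."""
    # '/' cannot appear in a file name; the example listing shows '-' in its
    # place (assumed mapping).
    safe = function.replace("/", "-")
    return f"{safe}-{pass_idx:02d}_{pass_name}.dot"
```

Because the index is zero-padded, a plain lexicographic directory listing shows the dumps in pass order.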

(cherry picked from FBD24452903)
2020-10-21 17:08:32 -07:00
Maksim Panchenko d91add0bfe [BOLT] Fix PatchEntries pass
Summary:
While refactoring the pass, I removed the important transactional
property of the patching process. Restore it.

(cherry picked from FBD24440214)
2020-10-21 12:31:09 -07:00
Maksim Panchenko d6d88399fc [BOLT] Enable lite mode by default with relocations
Summary:
When optimizing input with relocations, make it faster and less
memory-hungry with lite mode.

(cherry picked from FBD24374241)
2020-10-17 15:09:06 -07:00
Rafael Auler e4396c41da [BOLT] Ignore __hot_start, __hot_end from input
Summary:
When -hot-text is on, do not read __hot_start and __hot_end
from input (inserted by a linker script with the intent of ordering
functions). Depending on the address where such a symbol lands, this
can confuse BOLT into creating a function with that name, and we will
then assert on a symbol redefinition when trying to emit our own
__hot_start/__hot_end.

(cherry picked from FBD24366636)
2020-10-17 00:50:27 -07:00
Alexander Shaposhnikov 6133d2598b Inject a hook into the entry point on MachO
Summary:
This diff is a preparation for loading the runtime on MachO.
The proposed schema is the following:

1/  Function "bolt_instr_setup" is placed into the predefined section "setup" (in the final setting this function will be coming from the instrumentation runtime but we still will be placing it into this section).

2/ In the instrumentation pass we create a symbol "bolt_instr_setup" and inject the corresponding call into the beginning of the function representing the entry point of the binary.

(cherry picked from FBD24329530)
2020-10-15 01:39:35 -07:00
Maksim Panchenko f15532c2aa [BOLT][DWARF] Streamline processing of DWARF unit DIEs
Summary:
Do not store processed DWARF DIEs, but instead process them while
reading one at a time.

Reduces memory consumption when updating debug info by 10%-25%.

(cherry picked from FBD24327029)
2020-10-16 00:11:24 -07:00
Alexander Shaposhnikov bbd9d610fe Add first bits to cross-compile the runtime for OSX
Summary: Add first bits to cross-compile the runtime for OSX.

(cherry picked from FBD24330977)
2020-10-15 03:51:56 -07:00
Rafael Auler 0b6df06e04 [BOLT] In shrinkwrap, do not split prefix/instr
Summary:
When placing restore instructions in the shrink wrapping pass,
we typically put them right before the last instruction of a block at
the dominance frontier. If this instruction happened to have a prefix,
because the MC lib separates prefix into separate MCInsts, we would
accidentally put a load between a prefix and another instruction. Fix
this.

(cherry picked from FBD24295324)
2020-10-14 12:40:33 -07:00
Maksim Panchenko 53bd88c7fe [BOLT] Refactor reading of debug line info
Summary:
Match BinaryFunction to a DWARFUnit based on the unit's address ranges
skipping the parsing of DIEs.

(cherry picked from FBD24269325)
2020-10-12 21:04:42 -07:00
Maksim Panchenko 9f15b9f3c2 [BOLT] Fix debug line info in lite relocation mode
Summary: Emit line info for functions that were not emitted in relocation mode.

(cherry picked from FBD24267650)
2020-10-12 20:16:59 -07:00