Summary:
Use the new API introduced in https://reviews.llvm.org/D106624
to request LLVM do not process relocations for debug sections, since
BOLT processes final binaries that are already relocated.
(cherry picked from FBD31449206)
Summary:
When the compiler emits line table program, it emits EOS using the label
at the end of the containing code section. Since each compilation unit
has its own set of code sections it works as expected (* see the excerpt
from the standard below). However, in BOLT the code from many CUs is
combined into a common section, such as hot text or cold text.
As a result, the symbol at the end of the section may point way past the
code sequence for a given unit.
Since we can emit functions in any order, we conservatively emit
end-of-sequence at the end of every emitted function.
Fixes a problem while intermixing source code with disassembly in
binutils' objdump.
(*) DWARF v4 6.2.5.3:
"Every line number program sequence must end with a DW_LNE_end_sequence
instruction which creates a row whose address is that of the byte after
the last target machine instruction of the sequence."
(cherry picked from FBD31347870)
Summary:
Generate line tables for original/unmodified functions directly from
input line tables, bypassing conversion into intermediate structures,
such as BinaryLineDivisions.
Emit end-of-sequence markers only when necessary, i.e. when the line
sequence is not adjacent to the next one, or at the end of the line
sequence for the compilation unit.
If the sequence starts with ambiguous line info (multiple lines per
address), make sure we emit all such lines.
Reduce memory consumption when updating debug info by eliminating
intermediate data structures allocation.
(cherry picked from FBD30829448)
Summary:
BOLT needs to generate line info tables using absolute addresses as well
as using the standard MC way of labels attached to instructions. Move
line table generation code under BOLT.
Ideally, we should be able to extend existing interfaces in LLVM, but
without other users of the interface it will be hard to justify the
change.
(cherry picked from FBD30723466)
Summary:
For historical reasons, we are populating FailedAddresses twice in
RewriteInstance. Remove the second (happening later) call to avoid the
confusion.
(cherry picked from FBD31278956)
Summary:
When rewriting .debug_abbrev section, update abbrev offsets for type
units in addition to compile units.
Reuse abbreviation entries if they were shared by multiple compile/type
units.
(cherry picked from FBD31262326)
Summary:
Create bolt/test/runtime folder and move tests that execute the binary.
Move lit.local.cfg with host_arch check to the corresponding folder.
Addresses issue facebookincubator/BOLT#132.
AArch64/tls.c shows a different behavior with clang hence marked as XFAIL
TODO: add a check for non-exec tests for a corresponding LLVM_TARGETS_TO_BUILD.
(cherry picked from FBD31132234)
Summary:
Previously, we were registering all CUs with aranges writer. Since DWO
CUs have offsets set to 0, and we were registering them after the
skeleton unit at offset 0 was already registered, it was mostly
harmless as DWO CUs were effectively ignored.
(cherry picked from FBD31162621)
Summary:
Instead of patching the original .debug_abbrev section contents,
generate new section data based on parsed compilation unit
abbreviations.
This eliminates the dependency on the LLVM extension that records
abbreviation attribute offsets while parsing .debug_abbrev contents.
The output with this patch should stay the same (NFC).
(cherry picked from FBD31133611)
Summary:
There are some cases, when relocations must not be processed by bolt.
This patch handles three of such cases:
* The linker might eliminate the instruction and replace it with NOP
* The linker might perform TLS relocations relaxations, replacing the
got to direct TP + offset access.
* Due to errata 843419 the linker might create a veneer, replacing the
load/store instruction with branching.
In both cases linker leaves old relocations, that are no longer matches
the instruction emmited to binary, so we must avoid processing of these
relocations.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD31002384)
Summary:
There are few problems found when dealing with TLS relocations for
aarch64.
* RewriteInstance.cpp
** While analyzing TLS relocation we don't have to modify
SymbolAddress (which is the offset from the TLS section), so we need to
just skip verifiction
** The non-got related TLS relocations on aarch64 might be skipped too
** The forse relocation must be applied for GOT relocations on
Aarch64. The symbol adress for GOT relocation might no be pointing
on GOT section (for example ADRP GOT may point to the wrong section,
since GOT table is not page-aligned), so we won't try to get section by
the symbol address.
* Relocation.cpp - Remove R_AARCH64_TLSLE_ADD_TPREL_HI12 and
R_AARCH64_TLSLE_ADD_TPREL_LO12_NC from isGOT check, since they are not
got-related relocations
* BinaryFunction.h
** Remove R_AARCH64_TLSLE_ADD_TPREL_HI12 and
R_AARCH64_TLSLE_ADD_TPREL_LO12_NC from adding to relocation list, since
this is actually an offset in TLS section and BOLT does not change it we
don't need to do something with this relocations, the value won't change
in new binary files
** Refactor the code, separating aarch64 and x86 relocations
* AArch64MCPlusBuilder.cpp
** Add forgotten LO12 relocations to switch case to getTargetExprFor
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD31003349)
Summary:
LLVM started printing warnings when DWARFDebugInfoEntry::extractFast()
is invoked trying to read a DIE past the current unit limits. This
results in verbose warnings from BOLT which are harmless but confusing
to the user. Check the boundaries before calling the API above.
(cherry picked from FBD31097271)
Summary:
In "Add initial function injection support", Laith added this
code because injected functions would use the original text section as
the section to emit their code to. Now, what happens is that functions
are mapped to either their own section in non-reloc mode, or mapped to
a particular section in the pass reassign sections. So this section does
not need to have an output address anymore and this code is obsolete.
(cherry picked from FBD30980450)
Summary:
We have a problem where we will emit sections that we are not supposed
to emit (with no output offset assigned). This will make us write at
file offset 0 and corrupt the first sections in the binary (usually
.interp section will be corrupted and bash will refuse to run the
binary).
This only happens in non-reloc mode when using JTS_BASIC and when we
do not emit a function that has a jump table (if it gets too large).
Using -update-debug-sections will trigger the pass
check-large-functions, which will mark large funcs as non-simple
and will hide this bug.
(cherry picked from FBD30882012)
Summary:
This commit introduces TryLock usage for SimpleHashTable getter to
avoid deadlock and relax syscalls usage which causes significant
overhead in runtime.
The old behavior left under -conservative-instrumentation option passed
to instrumentation library.
Also, this commit includes a corresponding test case: instrumentation of
executable which performs indirect calls from common code and signal
handler.
Note: in case if TryLock was failed to acquire the lock - this indirect
call will not be accounted in the resulting profile.
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30821949)
Summary:
This commit adds checking if maxIndividualTestTime is availabe on
the platform. If available - it sets per test timeout to 60sec and
declares lit-max-individual-test-time feature for further checking
by particular test cases.
Based on https://reviews.llvm.org/D64251 implementation.
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30821986)
Summary:
The clang 12 doesn't want to build this place due to unrelated
types of iterator element and std vector.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30821177)
Summary: Changing to use the new APIs for getting offset of attribute from .debug_info. They were split in to multiple ones so that Offset can be gotten seperatly.
(cherry picked from FBD30616705)
Summary:
Three way branches commonly appear
in HHVM. They have one test and then two jumps. The
jump's destinations are not currently optimized.
This pass attempts to optimize which is the first branch.
(cherry picked from FBD30460441)
Summary:
There are 2 problems found when handling ADR instruction:
1. When extracting value from the ADR instruction we need to do
it another way, then we do it for ADRP instruction.
2. When creating target expression the VariantKind should be other for
ADR instruction.
And we introduces R_AARCH64_ADR_PREL_LO21,
R_AARCH64_TLSDESC_ADR_PREL21 and R_AARCH64_ADR_PREL_PG_HI21_NC
relocations support.
Also this patch introduces AdrPass, which will replace non-local
pointing ADR instructions with ADRP + ADD instructions sequence due to
small offset range of ADR instruction, so after BOLT magic there are no
guarantees that ADR instruction will still be in the range of
just +- 1MB from its target. The instruction replacement needs
relocations to be avalailable, so we won't remove "IsFromCode"
relocations after disassembly from BF anymore. Also we need original
offset of ADR instruction to be available so we add offset annotation
for these instructions.
The last thing this patch adds is ARM testing directory, which will be
used only on ARM testing servers. The common tests (non-assembler tests
which are platform-independent) might be moved from the X86 directory to
the parent one in the future, so such tests could be tested on both X86
and ARM machines.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30497379)
Summary:
Currently most of the warnings are printed only in debug mode. Since
relocations are very important for binary correct work I suggest to
print number of failed to process relocations to pay extra attention in
case some problems with them were met
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30500629)
Summary:
Added a function in TailDuplication
that will do Constant and Copy Propagation for blocks that
we duplicated as a part of tail duplication. Added supporting
functions to MCPlusBuilder to find src registers and replace
registers
(cherry picked from FBD30231907)
Summary:
This patch is part of preparation for golang support. The golang symbols
might have spaces in the name (for example "type..eq.[10]interface {}").
Since fdata uses spaces as a field separator such names brakes the fdata
format, so we need to escape whitespaces and backslashes in symbol names
using the backslash character.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD29999491)
Summary:
Remove unused code introduced a while ago (2016), with its use removed
since then.
PR facebookincubator/BOLT#198
Author: Amir Aupov <aaupov@fb.com>
(cherry picked from FBD30376537)
Summary:
The ADRP instructions has 21 bits to store page offsets + 12 lowest bits
are zero, that give us a total of 33 bits (32 bits for address + 1 sign
bit, to address +- 4GB).
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30283044)
Summary:
This commit adds dummy tests for checking instrumentation
support for PIE executables and shared libraries.
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092729)
Summary:
To avoid RELATIVE relocations avoid using of GOT table
by using hidden visibility for all symbols in library.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092712)
Summary:
The trampolines are no loger pointers to the functions. For
propper name resolving by bolt use extern "C" for all external symbols
in instr.cpp
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092698)
Summary:
This commit introduces -instrumentation-binpath argument used
to point instuqmented binary in runtime in case if /proc/self/map_files
path is not accessible due to access restriction issues.
Vasily Leonenko
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092681)
Summary:
This commit introduces static binaries instrumentation
support. Note that current implementation does not support profile
output on the instrumented binary finalization. So it requires to use
-instrumentation-sleep-time=N (N>0) option usage. Note: There is
unhandled case with static PIE executable which might have dynamic
header.
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092471)
Summary:
This commit adds support for opening libs based on links
/proc/self/map_files. For this we're getting current virtual address
and searching the lib in the directory with such address range. After
that, we're getting full path to the binary by using readlink
function. Direct read from link in /proc/self/map_files entries is not
possible because of lack of permissions.
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092422)
Summary:
This commit adds support for getting directory entries and
reading value of a symbolic link in instrumentation runtime library
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092362)
Summary:
This commit implements new method for _start & _fini functions hooking
which allows to use relative jumps for future PIE & .so library support.
Instead of using absolute address of _start & _fini functions known on
linking stage - we'll use dynamically created trampoline functions and
use corresponding symbols in instrumentation runtime library.
As we would like to use instrumentation for dynamically loaded binaries
(with PIE & .so), thus we need to compile instrumentation library with
"-fPIC" flag to support relative address resolution for functions and
data.
For shared libraries we need to handle initialization of instrumentation
library case by using DT_INIT section entry point.
Also this commit adds detection if the binary is executable or shared
library based on existence of PT_INTERP header. In case of shared
library we save information about real library init function address
for further usage for instrumentation library init trampoline function
creation and also update DT_INIT to point instrumentation library init
function.
Functions called from init/fini functions should be called with forced
stack alignment to avoid issues with instructions which relies on it.
E.g. optimized string operations.
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092316)
Summary:
Move the common code into MCPlusBuilder.h.
Use group 1 `kTailCall` MCAnnotation instead of dynamically allocated
annotation.
This diff reduces the processing time overhead to 1.5% vs using
TAILJMP opcode.
(cherry picked from FBD30055585)
Summary:
The linker can generate 8- or 16-byte entries in .plt.got and .plt.sec
sections. On X86, the main differentiator is the presence of endbr64
instruction at the beginning of the entry. Detect the instruction and
adjust the size accordingly.
(cherry picked from FBD29847639)