Commit Graph

216 Commits

Author SHA1 Message Date
Maksim Panchenko 8ef3b27834 [BOLT][DWARF] Properly emit of end-of-sequence entries for line tables
Summary:
When the compiler emits line table program, it emits EOS using the label
at the end of the containing code section. Since each compilation unit
has its own set of code sections it works as expected (* see the excerpt
from the standard below). However, in BOLT the code from many CUs is
combined into a common section, such as hot text or cold text.
As a result, the symbol at the end of the section may point way past the
code sequence for a given unit.

Since we can emit functions in any order, we conservatively emit
end-of-sequence at the end of every emitted function.

Fixes a problem while intermixing source code with disassembly in
binutils' objdump.

(*) DWARF v4 6.2.5.3:
"Every line number program sequence must end with a DW_LNE_end_sequence
instruction which creates a row whose address is that of the byte after
the last target machine instruction of the sequence."

(cherry picked from FBD31347870)
2021-09-30 17:47:50 -07:00
Amir Ayupov e903671bbf [BOLT][TEST] Imported small tests, removed duplicate input
Summary:
Imported small internal tests.
- call_zero.s
- cfi_expr_rewrite.s
- cfi_insts_count.s
- exceptions_pic.test
- exceptions_run.test

Removed duplicate input file (switch_statement.cpp)

(cherry picked from FBD31355466)
2021-10-01 15:35:43 -07:00
Amir Ayupov 47455e98b3 [BOLT][TEST] Imported small tests
Summary:
Imported small internal tests:
- R_X86_64_64.pic.lld.cpp
- avx512_trap.test
- bad_exe.test
- bolt_info.test

(cherry picked from FBD31251439)
2021-09-28 15:47:51 -07:00
Amir Ayupov 4157682fd9 [BOLT][TEST] Import internal_call_instrument.s
Summary: Imported standalone assembly test

(cherry picked from FBD31161181)
2021-09-23 14:28:13 -07:00
Amir Ayupov 6b4eb0b94a [BOLT][TEST] Split runtime tests into test/runtime folder
Summary:
Create bolt/test/runtime folder and move tests that execute the binary.
Move lit.local.cfg with host_arch check to the corresponding folder.
Addresses issue facebookincubator/BOLT#132.

AArch64/tls.c shows a different behavior with clang hence marked as XFAIL

TODO: add a check for non-exec tests for a corresponding LLVM_TARGETS_TO_BUILD.

(cherry picked from FBD31132234)
2021-09-22 17:58:33 -07:00
Vladislav Khmelevsky e1da1539e3 [PR] Add AARCH64_MOVW_UABS_G* relocations support
Summary:
This patch fixes issue facebookincubator/BOLT#177

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31130162)
2021-09-23 00:52:36 +03:00
Amir Ayupov d4fdc98140 [BOLT][TEST] Remove dependence on host_cc and host_cxx
Summary: Add dependency on clang and clangxx instead.

(cherry picked from FBD31128140)
2021-09-22 15:53:38 -07:00
Vladislav Khmelevsky 542c03c3a3 [PR] Fix aarch64 TLS relocations handling
Summary:
There are few problems found when dealing with TLS relocations for
aarch64.

* RewriteInstance.cpp
** While analyzing TLS relocation we don't have to modify
SymbolAddress (which is the offset from the TLS section), so we need to
just skip verifiction
** The non-got related TLS relocations on aarch64 might be skipped too
** The forse relocation must be applied for GOT relocations on
Aarch64. The symbol adress for GOT relocation might no be pointing
on GOT section (for example ADRP GOT may point to the wrong section,
since GOT table is not page-aligned), so we won't try to get section by
the symbol address.

* Relocation.cpp - Remove R_AARCH64_TLSLE_ADD_TPREL_HI12 and
R_AARCH64_TLSLE_ADD_TPREL_LO12_NC from isGOT check, since they are not
got-related relocations

* BinaryFunction.h
** Remove R_AARCH64_TLSLE_ADD_TPREL_HI12 and
R_AARCH64_TLSLE_ADD_TPREL_LO12_NC from adding to relocation list, since
this is actually an offset in TLS section and BOLT does not change it we
don't need to do something with this relocations, the value won't change
in new binary files
** Refactor the code, separating aarch64 and x86 relocations

* AArch64MCPlusBuilder.cpp
** Add forgotten LO12 relocations to switch case to getTargetExprFor

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD31003349)
2021-09-02 21:04:33 +03:00
Maksim Panchenko 48fbeb1a46 [BOLT] Fix warnings from LLVM DWARF reading library
Summary:
LLVM started printing warnings when DWARFDebugInfoEntry::extractFast()
is invoked trying to read a DIE past the current unit limits. This
results in verbose warnings from BOLT which are harmless but confusing
to the user. Check the boundaries before calling the API above.

(cherry picked from FBD31097271)
2021-09-21 15:39:35 -07:00
Rafael Auler 7b779f819f [BOLT] Fix binary corruption in non-reloc mode
Summary:
We have a problem where we will emit sections that we are not supposed
to emit (with no output offset assigned). This will make us write at
file offset 0 and corrupt the first sections in the binary (usually
.interp section will be corrupted and bash will refuse to run the
binary).

This only happens in non-reloc mode when using JTS_BASIC and when we
do not emit a function that has a jump table (if it gets too large).

Using -update-debug-sections will trigger the pass
check-large-functions, which will mark large funcs as non-simple
and will hide this bug.

(cherry picked from FBD30882012)
2021-09-10 16:19:50 -07:00
Vasily Leonenko 9aa134dc2d [PR] Instrumentation: use TryLock for SimpleHashTable getter
Summary:
This commit introduces TryLock usage for SimpleHashTable getter to
avoid deadlock and relax syscalls usage which causes significant
overhead in runtime.
The old behavior left under -conservative-instrumentation option passed
to instrumentation library.
Also, this commit includes a corresponding test case: instrumentation of
executable which performs indirect calls from common code and signal
handler.

Note: in case if TryLock was failed to acquire the lock - this indirect
call will not be accounted in the resulting profile.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30821949)
2021-08-08 04:50:06 +08:00
Vasily Leonenko e2480fcc98 [PR] LIT: add checking if maxIndividualTestTime is availabe on the platform
Summary:
This commit adds checking if maxIndividualTestTime is availabe on
the platform. If available - it sets per test timeout to 60sec and
declares lit-max-individual-test-time feature for further checking
by particular test cases.
Based on https://reviews.llvm.org/D64251 implementation.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30821986)
2021-08-27 21:56:24 +03:00
Joey Thaman 3e8af67a95 [BOLT] Optimize the three way branch
Summary:
Three way branches commonly appear
in HHVM. They have one test and then two jumps.  The
jump's destinations are not currently optimized.
This pass attempts to optimize which is the first branch.

(cherry picked from FBD30460441)
2021-08-17 10:15:21 -07:00
Vladislav Khmelevsky c040431fe6 [PR] AArch64: Fix ADR instruction handling
Summary:
There are 2 problems found when handling ADR instruction:
1. When extracting value from the ADR instruction we need to do
it another way, then we do it for ADRP instruction.
2. When creating target expression the VariantKind should be other for
ADR instruction.

And we introduces R_AARCH64_ADR_PREL_LO21,
R_AARCH64_TLSDESC_ADR_PREL21 and R_AARCH64_ADR_PREL_PG_HI21_NC
relocations support.

Also this patch introduces AdrPass, which will replace non-local
pointing ADR instructions with ADRP + ADD instructions sequence due to
small offset range of ADR instruction, so after BOLT magic there are no
guarantees that ADR instruction will still be in the range of
just +- 1MB from its target. The instruction replacement needs
relocations to be avalailable, so we won't remove "IsFromCode"
relocations after disassembly from BF anymore. Also we need original
offset of ADR instruction to be available so we add offset annotation
for these instructions.

The last thing this patch adds is ARM testing directory, which will be
used only on ARM testing servers. The common tests (non-assembler tests
which are platform-independent) might be moved from the X86 directory to
the parent one in the future, so such tests could be tested on both X86
and ARM machines.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30497379)
2021-08-20 03:07:01 +03:00
Joey Thaman ef6186c822 [BOLT] Added Constant and Copy Propagation to tail duplicated blocks
Summary:
Added a function in TailDuplication
that will do Constant and Copy Propagation for blocks that
we duplicated as a part of tail duplication.  Added supporting
functions to MCPlusBuilder to find src registers and replace
registers

(cherry picked from FBD30231907)
2021-08-10 10:02:32 -07:00
Vladislav Khmelevsky 2a5790b670 [PR] Fdata: Escape whitespaces in symbol names
Summary:
This patch is part of preparation for golang support. The golang symbols
might have spaces in the name (for example "type..eq.[10]interface {}").
Since fdata uses spaces as a field separator such names brakes the fdata
format, so we need to escape whitespaces and backslashes in symbol names
using the backslash character.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD29999491)
2021-06-29 19:54:08 +03:00
Rafael Auler faee814fb9 Fix NFC tests
Summary:
Our NFC tests are failing on debug-fission-single.s. Fix the test
to be compliant with our checking script.

(cherry picked from FBD30352415)
2021-08-16 11:33:20 -07:00
Rafael Auler d217e2f338 Rebase: [BOLT] DWP output support
Summary:
Added support for writing out DWP file. Works with regular dwo as input or DWP as input.

(cherry picked from FBD31361619)
2021-06-29 15:28:52 -07:00
Vasily Leonenko 900914d3c6 [PR] Tests: add instrumentation tests for PIE exec & shared libs
Summary:
This commit adds dummy tests for checking instrumentation
support for PIE executables and shared libraries.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092729)
2021-06-19 23:01:28 +08:00
Maksim Panchenko 89a2e16037 [BOLT] Support PLT sections with variable entry sizes
Summary:
The linker can generate 8- or 16-byte entries in .plt.got and .plt.sec
sections. On X86, the main differentiator is the presence of endbr64
instruction at the beginning of the entry. Detect the instruction and
adjust the size accordingly.

(cherry picked from FBD29847639)
2021-07-14 01:35:34 -07:00
Joey Thaman a7e2a8f946 [BOLT] Tail Duplication active pass
Summary:
Amended the Tail Duplication
analysis pass to do the tail duplication in question

(cherry picked from FBD29833794)
2021-07-16 11:45:44 -07:00
Joey Thaman 2f46660559 [BOLT] Tail duplication analysis pass
Summary:
Created a binary pass that records how many
times tail duplication would be used and how many cache
misses it would theoretically stop

(cherry picked from FBD29619858)
2021-07-01 07:11:26 -07:00
Maksim Panchenko c9f5f47b51 [BOLT] Add support for .plt.sec and refactor PLT-reading code
Summary:
A binary can contain multiple PLT sections with different name and
attributes (such as an entry size). Extend the support to .plt.sec and
refactor the code to make future extensions simpler.

(cherry picked from FBD29502107)
2021-06-30 14:41:41 -07:00
Maksim Panchenko 3e5ce1f282 [BOLT][TESTS] Remove dynamic relocations from YAML tests
Summary:
Our YAML objects contain references to dynamic relocations via .dynamic,
but there are no corresponding relocation sections. Change .dynamic
contents to specify no dynamic relocations.

(cherry picked from FBD29502108)
2021-06-30 14:33:59 -07:00
Maksim Panchenko f46af9e9bc [BOLT][TESTS] Fix ICF test case
Summary:
Host compiler may generate duplicate functions and as a result BOLT can
fold more than 1 function.

(cherry picked from FBD29347302)
2021-06-23 16:13:30 -07:00
Maksim Panchenko bbbd159ccb [BOLT] Fix undefined symbol warnings/errors
Summary:
When we fold a function in relocation mode, make sure to clear its state
to avoid emitting relocations against undefined symbols.

(cherry picked from FBD29245320)
2021-06-18 14:35:39 -07:00
Maksim Panchenko 7bccf8d25d [BOLT][NFC] Fix debug info printouts for inlined functions
Summary:
While printing debug info for instructions, we should use line tables
from the corresponding DWARF CU which could be different from the
containing function CU in case of inlined instructions.

(cherry picked from FBD28908324)
2021-06-04 12:31:31 -07:00
Amir Ayupov 65d227c035 [BOLT][TEST] Fix test case to conform to analyzePICJumpTable pattern matching
Summary:
Make sure that jump table is properly recognized in
`split_func_jump_table_fragment.s`.

(cherry picked from FBD28839976)
2021-06-02 10:50:47 -07:00
Vladislav Khmelevsky 79807d99fe [PR] Introduce loop inversion pass
Summary:
This patch introduces LoopInversionPass. Its main purpose is to ensure
that the loop layout is optimal depending on the profile information. So
if profile information shows that the loop is used, the unconditional
jump instruction must be executed only once and vice-versa. Please take
a look to the pass header file and test for more details.

Also change link_fdata script a bit, to be able to change FDATA prefix,
like FileCheck does.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

PR facebookincubator/BOLT#153

(cherry picked from FBD28391811)
2021-05-11 20:59:13 +03:00
Amir Ayupov 12e9fec697 Rebase: [BOLT] DebugFission Support
Summary:
Implemented support for Debug Fission.
For the most part it doesn't impact Monolithic execution path.
One area that was changed is the DW_AT_low_pc/DW_AT_high_pc conversion. Before it was to DW_AT_ranges/DW_AT_low_pc, now DW_AT_low_pc is kept in same place.
Another more visible impact is in Skeleton CU the DW_AT_low_pc is replaced with DW_AT_ranges_base if it's not originally present and bolt converted ranges conversion inside the dwo units.

Output of this are multiple .dwo files with updated debug information.

(cherry picked from FBD29569788)
2021-04-01 11:43:00 -07:00
Amir Ayupov 99d7f90635 [BOLT][NFC][TEST] Added llvm-dwarfdump and llvm-mc to BOLT_TEST_DEPS
(cherry picked from FBD28427352)
2021-05-13 15:36:43 -07:00
Vladislav Khmelevsky de298c08fd [PR] Fix tests build with -no-pie option
Summary:
Since gcc/ld could produce and expect PIE files we need to pass -no-pie option to avoid linking errors for tests.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28360045)
2021-05-11 03:25:49 +03:00
Alexey Moksyakov ce84e9607a [PR] Fix bb reordering optimization
Summary:
Reorder-blocks optimization pass doesn't take into account that
available offset for legacy Jcc instructions (for example,
JRCXZ - operand 8 bits) has to be less than 255 bytes.
It's rare case and to exclude such functions with unsupported
instructions from optimization passes added extra checking

Alexey Moksyakov
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28264117)
2021-04-23 11:34:40 +03:00
Amir Ayupov 94653797f3 Rebase: [BOLT][NFC] Avoid binutils in tests
Summary:
Replace binutils tools with llvm tools

(cherry picked from FBD29575630)
2021-05-04 16:45:28 -07:00
Amir Ayupov 081e39aa15 Rebase: [cherry-pick] [BOLT] Add option to skip writing an output file
Summary:
The user may wish to run BOLT for printing statistics only
(i.e. to check that the profile is valid). Add an option to run BOLT
without writing any output file, similar to a dry run. This option
is triggered by supplying -o with "/dev/null".

(cherry picked from FBD29568632)
2021-03-29 16:04:57 -07:00
Maksim Panchenko e7169be93f [BOLT] Do not assert on jump table heuristic failure
Summary:
During the initial indirect jump analysis, we used to assert that the
discovered jump table type matched the pattern of the corresponding
instruction sequence. E.g., for PIC jump table memory we expected the
PIC jump table instruction sequence. The assertions were too
conservative, as in the case of a mismatch we can mark the indirect jump
as having an unknown control flow. That should be sufficient to either
skip the function processing or rely on relocation information for
possible recovery of the control flow.

(cherry picked from FBD27255816)
2021-03-23 13:41:41 -07:00
Rafael Auler b3c34d568a [BOLT] Fix instrumentation bug in duplicated JTs
Summary:
Fix a bug with instrumentation when trying to instrument
functions that share a jump table with multiple indirect
jumps. Usually, each indirect jump that uses a JT will have its own
copy of it. When this does not happen, we need to duplicate the jump
table safely, so we can split the edges correctly (each copy of the
jump table may have different split edges). For this to happen, we
need to correctly match the sequence of instructions that perform the
indirect jump to identify the base address of the jump table and patch
it to point to the new cloned JT. It was reported to us a case in
which the compiler generated suboptimal code to do an indirect jump
which our matcher failed to identify.

Fixes facebookincubator/BOLT#126

(cherry picked from FBD27065579)
2021-03-15 16:34:25 -07:00
Maksim Panchenko b11c826889 [BOLT] Fix false references to zero-sized objects
Summary:
Whenever BOLT encounters a data reference in code, it tries to convert
it into <Object+Offset> form. The primary reason behind this approach is
to support read-only data-reordering optimization. However, with the
current level of the linker and compiler support we don't have enough
information to always correctly restore the original <Object+Offset>.
E.g. with zero-sized symbols we have to speculate that the actual size
of the underlying object extends to the next symbol. Most of the time,
there will be an object pointed by a zero-sized symbol and even
if we are guessing incorrectly, there will be no harm in creating
references of such form.

The problem happens when there's no object corresponding to the original
symbol and the next object is an (unmarked) jump table:

  A:                   # <- zero-sized object
  .LJUMP_TABLE:
    .long <entry1>
    .long <entry2>
    ....
  .LB:
    .long 21
  .LC:
    .long 42

The jump table will be moved and all references past it (up to the next
named object) will be incorrectly updated.

We should not speculate about the size of A in a case like that and
treat all discovered data objects (and thus references) independently.

(cherry picked from FBD27005660)
2021-03-15 12:06:56 -07:00
Alexander Yermolovich 06959eedcf Fix up test for Update DW_AT_stmt_list for .debug_types
Summary: As titled.

(cherry picked from FBD28112186)
2021-03-17 17:08:26 -07:00
Alexander Yermolovich 0ec91a25df Update DW_AT_stmt_list for .debug_types
Summary:
There is no real link between CU and TU, so relying on fact
that address are the same, and we are updating all of them.

(cherry picked from FBD28112114)
2021-02-17 15:30:10 -08:00
Amir Ayupov 1c5d3a056c Rebase: Merge BOLT codebase in monorepo
Summary:
This commit is the first step in rebasing all of BOLT
history in the LLVM monorepo. It also solves trivial build issues
by updating BOLT codebase to use current LLVM. There is still work
left in rebasing some BOLT features and in making sure everything
is working as intended.

History has been rewritten to put BOLT in the /bolt folder, as
opposed to /tools/llvm-bolt.

(cherry picked from FBD33289252)
2020-12-01 16:29:39 -08:00
Rafael Auler e0261a22ce [TEST] Remove dependency on debug output
Summary:
Test mistakenly used -debug output, which makes it fail on
no-asserts build.

(cherry picked from FBD25399449)
2020-12-09 12:25:58 -08:00
Rafael Auler d2f68039bc [BOLT] Fix shrinkwrapping bug when changing frame alignment
Summary:
This fixes a bug with shrink wrapping when trying to move
push-pops in a function where we are not allowed to modify the
stack layout for alignment reasons. In this bug, we failed to
propagate alignment requirement upwards in the call graph from
function A to B when: (1) there is a cycle in the call graph and
(2) the distance from A to B is greater than 1 in the call graph
and (3) there is a node in the path from A to B, not including
A or B, that does not access parameters in the stack.

(cherry picked from FBD25315977)
2020-12-03 20:09:32 -08:00
Amir Ayupov f9d00d418b [BOLT] Handle insertion of updated CFI at the first basic block
Summary:
Fix corner case of insertion of updated CFI with unset `PrevBB`.
Handle it in the same way as inserting past hot-cold split point.

(cherry picked from FBD24943911)
2020-11-17 18:40:19 -08:00
Amir Ayupov 6401af89c7 [BOLT] Support jump tables in split fragments with entries pointing back to parent functions
Summary:
Support jump tables belonging to split fragments with entries
pointing back to parent functions.
While skipping such families of functions, make sure to use the
topmost fragment to ignore its fragments.

(cherry picked from FBD24907438)
2020-11-12 11:54:51 -08:00
Amir Ayupov c0cb550536 Minimize X86/shrinkwrapping-critedge test case
Summary: Minimized test case while preserving the CFG subgraph with an issue

(cherry picked from FBD24871063)
2020-11-10 21:22:57 -08:00
Amir Ayupov e54d389799 [BOLT] Disable DynoStats printing after SCTC
Summary:
Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets
unset by SCTC pass. Make DynoStats collection conditional on this
new flag.
SCTC leaves CFG in a state where branch counters of BBs with tail
calls/conditional tail calls are not available (except via annotations,
which get stripped by `lower-annotations`). Without branch
counters, DynoStats are invalid.

(cherry picked from FBD24558050)
2020-11-10 10:51:23 -08:00
Amir Ayupov 2b09d672ce Conservatively handle jump tables in split functions
Summary:
- Allow jump table entries to point to locations inside the function and its fragments.
Reasoning behind this is that jump table identification has the logic of stopping at entry which belongs to a function different from the one originally referencing jump table. This assumption is invalid for jump tables with entries pointing to both parent function and cold fragments, leading to "unclaimed PC-relative relocations" assertion.

- Add fragment identification heuristic based on function name regex and contiguous jump table entries.
Currently, parent-to-fragment relationship is set up based on interprocedural references – direct references from the parent function. These references don't include references through jump table.
Additionally, some fragments are only reachable through jump table. In that case, in order to fully consume jump table, add parent-to-fragment relationship during `analyzeJumpTable` using the following heuristics:
  1. Fragment is identified as such based on name (contains `.cold.` part), but
  2. Parent function is not set – no direct interprocedural references to that fragment, and
  3. Fragment has the name of the form <parent>.cold(.\d+)

* For split functions with jump table entries spanning parent and fragments, mark parent and all fragments as ignored.

(cherry picked from FBD24456904)
2020-11-06 11:19:03 -08:00
Amir Ayupov dc48354f71 processInterproceduralReferences: record references to cold fragments as entry points
Summary:
For interprocedural references to fragments, record them as
fragment entry points. Not registering these entry points leads to
UCE removing the blocks and "Undefined temporary symbol"
assertion.

(cherry picked from FBD24511281)
2020-11-06 10:57:47 -08:00
Rafael Auler e4396c41da [BOLT] Ignore __hot_start, __hot_end from input
Summary:
When -hot-text is on, do not read __hot_start and __hot_end
from input (inserted by a linker script with the intent of ordering
functions). This can confuse BOLT into creating a function with this
name depending on which address the symbol lands and we will assert
when trying to emit our own __hot_start/__hot_end with symbol
redefinition.

(cherry picked from FBD24366636)
2020-10-17 00:50:27 -07:00
Rafael Auler 0b6df06e04 [BOLT] In shrinkwrap, do not split prefix/instr
Summary:
When placing restore instructions in the shrink wrapping pass,
we typically put them right before the last instruction of a block at
the dominance frontier. If this instruction happened to have a prefix,
because the MC lib separates prefix into separate MCInsts, we would
accidentally put a load between a prefix and another instruction. Fix
this.

(cherry picked from FBD24295324)
2020-10-14 12:40:33 -07:00
Rafael Auler d7fb998637 [BOLT] Fix sign issue when validating X86 relocations
Summary:
In analyzeRelocations, we extract the result of the relocation
from binary code to recreate the target of it in a few special cases.
For R_X86_64_32S relocations, however, we were neglecting the
possibility of the encoded value in the instruction to be negative.

(cherry picked from FBD24096347)
2020-10-05 12:41:03 -07:00
Amir Ayupov 8c4ba8f165 Bugfix for splitting critical edges in shrink wrapping
Summary:
Fix issue with splitting critical edges originating at
the same BB in ShrinkWrapping::splitFrontierCritEdges.

Splitting of critical edges originating at the same FromBB
wasn't handled correctly as the Frontier at index corresponding
to FromBB was overwritten with basic blocks created for
multiple DestinationBBs.

(cherry picked from FBD23232398)
2020-08-20 19:00:29 -07:00
Rafael Auler c6799a689d [BOLT] Fix stack alignment for runtime lib
Summary:
Right now, the SAVE_ALL sequence executed upon entry of both
of our runtime libs (hugify and instrumentation) will cause the stack to
not be aligned at a 16B boundary because it saves 15 8-byte regs. Change
the code sequence to adjust for that. The compiler may generate code
that assumes the stack is aligned by using movaps instructions, which
will crash.

(cherry picked from FBD22744307)
2020-07-27 16:52:51 -07:00
Rafael Auler ed02946281 [BOLT] Fix hot_end symbol update with user function order
Summary:
If no profile data is provided, but only a user-provided order
file for functions, fix the placement of the __hot_end symbol.

(cherry picked from FBD22713265)
2020-07-24 10:28:36 -07:00
Rafael Auler 170f73ac9e [BOLT] Fix fix-branches in presence of JRCXZ and friends
Summary:
Do not fail/assert when trying to reorder blocks that terminate
with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of
these instructions, so just treat them accordingly in fixBranches().

(cherry picked from FBD22487107)
2020-07-15 23:02:58 -07:00
Rafael Auler 26ad0bd951 [TESTS] Re-add issue20/issue26 tests
Summary:
Re-add tests removed because they used to depend on yaml2obj.
Rewrite them with an assembler (llvm-mc) and use the system linker to
produce a valid ELF as input to BOLT.

(cherry picked from FBD22323449)
2020-06-30 18:36:49 -07:00
Rafael Auler 41cb6b68ed Update X86/pre-aggregated-perf.test
Summary:
Add REQUIRED statement.

(cherry picked from FBD22290759)
2020-06-24 18:24:07 -07:00
Maksim Panchenko 5296b6d12a [BOLT] Change symbol handling for secondary function entries
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.

In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.

We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.

This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.

Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.

(cherry picked from FBD21150193)
2020-04-19 22:29:54 -07:00
Rafael Auler 6dbd15bc01 [BOLT-X86] Fix instrumentation issue with indirect calls
Summary:
Indirect calls that use RSP to compute the target address would
break in instrumentation mode because we were adding instructions that
changed the stack pointer. Fix this.

(cherry picked from FBD20883791)
2020-04-06 17:38:11 -07:00
Rafael Auler 340da8f294 [BOLT] Fix shrink wrapping to check pops
Summary:
Shrink wrapping has a mode where it will directly move push
pop pairs, instead of replacing them with stores/loads. This is an
ambitious mode that is triggered sometimes, but whenever matching with
a push, it would operate with the assumption that the restoring
instruction was a pop, not a load, otherwise it would assert. Fix this
assertion to bail nicely back to non-pushpop mode (use regular store and
load instructions).

(cherry picked from FBD20085905)
2020-02-18 16:00:40 -08:00
Xin-Xin Wang d87f95065a [BOLT] Add missing CMake test dependencies
Summary:
I noticed when setting up a new repository for bolt that bolt tests
would fail unexpectedly when running `ninja check-bolt` and
`ninja check-llvm`. This turns out to be because dependencies for bolt
binaries were not specified in the CMake configuration so they were not
built before running the tests. This diff adds the dependencies to the
CMake configuration for check-bolt and check-llvm so that bolt binaries
are built before running tests.

(cherry picked from FBD17919505)
2019-10-14 16:03:54 -07:00
Rafael Auler ddfcf4f266 [BOLT] Add parser for pre-aggregated perf data
Summary:
The regular perf2bolt aggregation job is to read perf output directly.
However, if the data is coming from a database instead of perf, one
could write a query to produce a pre-aggregated file. This function
deals with this case.

The pre-aggregated file contains aggregated LBR data, but without binary
knowledge. BOLT will parse it and, using information from the
disassembled binary, augment it with fall-through edge frequency
information. After this step is finished, this data can be either
written to disk to be consumed by BOLT later, or can be used by BOLT
immediately if kept in memory.

File format syntax:
{B|F|f} [<start_id>:]<start_offset> [<end_id>:]<end_offset> <count>
[<mispred_count>]

B - indicates an aggregated branch
F - an aggregated fall-through (trace)
f - an aggregated fall-through with external origin - used to disambiguate
between a return hitting a basic block head and a regular internal
jump to the block

<start_id> - build id of the object containing the start address. We can
skip it for the main binary and use "X" for an unknown object. This will
save some space and facilitate human parsing.

<start_offset> - hex offset from the object base load address (0 for the
main executable unless it's PIE) to the start address.

<end_id>, <end_offset> - same for the end address.

<count> - total aggregated count of the branch or a fall-through.

<mispred_count> - the number of times the branch was mispredicted.
Omitted for fall-throughs.

Example
F 41be50 41be50 3
F 41be90 41be90 4
f 41be90 41be90 7
B 4b1942 39b57f0 3 0
B 4b196f 4b19e0 2 0

(cherry picked from FBD8887182)
2018-07-17 18:31:46 -07:00
Rafael Auler 12380b8b06 Fix assembly after adding entry points
Summary:
When a given function B, located after function A, references
one of A's basic blocks, it registers a new global symbol at the
reference address and update A's Labels vector via
BinaryFunction::addEntryPoint(). However, we don't update A's branch
targets at this point. So we end up with an inconsistent CFG, where the
basic block names are global symbols, but the internal branch operands
are still referencing the old local name of the corresponding blocks
that got promoted to an entry point. This patch fix this by detecting
this situation in addEntryPoint and iterating over all instructions,
looking for references to the old symbol and replacing them to use the
new global symbol (since this is now an entry point).

Fixes facebookincubator/BOLT#26

(cherry picked from FBD8728407)
2018-07-03 11:57:46 -07:00
Rafael Auler 544d1577c1 Avoid removing BBs referenced by JTs
Summary:
While removing unreachable blocks, we may decide to remove a
block that is listed as a target in a jump table entry. If we do that,
this label will be then undefined and LLVM assembler will crash.
Mitigate this for now by not removing such blocks, as we don't support
removing unnecessary jump tables yet.

Fixes facebookincubator/BOLT#20

(cherry picked from FBD8730269)
2018-07-03 17:02:33 -07:00
Rafael Auler 8f717dd25e [BOLT] Add initial bolt-only test infra
Summary:
Create folders and setup to make LIT run BOLT-only tests. Add
a test example. This will add a new make/ninja rule "check-bolt" that
the user can invoke to run LIT on this folder.

(cherry picked from FBD8595786)
2018-06-22 13:50:07 -07:00