Commit Graph

901 Commits

Author SHA1 Message Date
Vladislav Khmelevsky b728bfc70a [PR] Add missing includes
Summary:
Adds missing headers removed by IWYU.
NB: this caused build breakage on ubuntu-latest

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28368185)
2021-05-11 15:55:57 +03:00
Vladislav Khmelevsky de298c08fd [PR] Fix tests build with -no-pie option
Summary:
Since gcc/ld could produce and expect PIE files we need to pass -no-pie option to avoid linking errors for tests.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28360045)
2021-05-11 03:25:49 +03:00
Alexey Moksyakov ce84e9607a [PR] Fix bb reordering optimization
Summary:
Reorder-blocks optimization pass doesn't take into account that
available offset for legacy Jcc instructions (for example,
JRCXZ - operand 8 bits) has to be less than 255 bytes.
It's rare case and to exclude such functions with unsupported
instructions from optimization passes added extra checking

Alexey Moksyakov
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28264117)
2021-04-23 11:34:40 +03:00
Amir Ayupov 9a884543f1 [BOLT][NFC] Avoid unnecessary copies with push_back
Summary: Small refactoring inspired by clang-tidy modernize-use-emplace

(cherry picked from FBD28307493)
2021-05-07 18:43:25 -07:00
Amir Ayupov 94653797f3 Rebase: [BOLT][NFC] Avoid binutils in tests
Summary:
Replace binutils tools with llvm tools

(cherry picked from FBD29575630)
2021-05-04 16:45:28 -07:00
Amir Ayupov eb99a6665c Rebase: [BOLT][NFC] Remove unneeded includes with include-what-you-use
Summary:
Ran iwyu multiple times, manually picked header remove lines.
Reached fixed point wrt removal: iwyu doesn't automatically remove
any more headers or forward declarations.

(cherry picked from FBD29569221)
2021-04-30 13:54:02 -07:00
Maksim Panchenko 5239182075 [perf2bolt] Further relax segment matching
Summary:
Previously, we used p_align value of the code segment to predict the
mapping of the segment at runtime. However, at times the reported
value is not aligned and at other times the actual aligned value will
be different because of the different page size used.
All we know is that the page size used at runtime should not exceed
p_align value. Adjust our segment address matching accordingly.

(cherry picked from FBD28133066)
2021-04-30 15:02:29 -07:00
Maksim Panchenko bd86c06c1b [BOLT][NFC] Remove CFIReaderWriter::fdes()
(cherry picked from FBD27918126)
2021-04-21 12:33:08 -07:00
Maksim Panchenko f8fa3e97d5 [BOLT] Remove -dump-eh-frame option
Summary: The option duplicates functionality of "llvm-dwarfdump -eh-frame".

(cherry picked from FBD27917505)
2021-04-21 12:13:22 -07:00
Maksim Panchenko 3355936e14 [BOLT][NFC] Remove RewriteInstance::EHFrame
(cherry picked from FBD27915725)
2021-04-21 11:24:15 -07:00
Amir Ayupov f84f451a54 [BOLT][NFC] Use const reference for MCInstrDesc
Summary:
Addressing comments from the review for "Expand auto types".
Use const reference in MCPlusBuilder for MCInstrDesc where the copy
is not necessary.

(cherry picked from FBD27844344)
2021-04-17 21:48:46 -07:00
Amir Ayupov c7306cc219 Rebase: [BOLT][NFC] Expand auto types
Summary:
Expanded auto types across BOLT semi-automatically with the aid
of clangd LSP

(cherry picked from FBD33289309)
2021-04-08 00:19:26 -07:00
Rafael Auler dc2673a039 [BOLT] Fix value invalidation bug in runtimelib
Summary:
We can't use a fragment of the old LibPath as an input
to create a new one.

(cherry picked from FBD27642728)
2021-04-07 21:40:23 -07:00
Rafael Auler 35732d954b [BOLT] Remove cantFail in getAddressRanges calls
Summary:
We may have a CU with empty ranges, so accept errors coming
from DWARFDie::getAddressRanges(). This happens when using tools that
selectively strip debuginfo from the binary.

(cherry picked from FBD27602731)
2021-04-06 12:57:09 -07:00
Amir Ayupov f1bfb18ceb [BOLT] Refactor SectionPatchers map to a Patcher in BinarySection
Summary:
Refactor SectionPatches to avoid the use of extra map and a cast
from StringRef to std::string.

cherry-picked from FBD26756560

(cherry picked from FBD27490641)
2021-03-18 13:06:18 -07:00
Amir Ayupov 081e39aa15 Rebase: [cherry-pick] [BOLT] Add option to skip writing an output file
Summary:
The user may wish to run BOLT for printing statistics only
(i.e. to check that the profile is valid). Add an option to run BOLT
without writing any output file, similar to a dry run. This option
is triggered by supplying -o with "/dev/null".

(cherry picked from FBD29568632)
2021-03-29 16:04:57 -07:00
Maksim Panchenko e7169be93f [BOLT] Do not assert on jump table heuristic failure
Summary:
During the initial indirect jump analysis, we used to assert that the
discovered jump table type matched the pattern of the corresponding
instruction sequence. E.g., for PIC jump table memory we expected the
PIC jump table instruction sequence. The assertions were too
conservative, as in the case of a mismatch we can mark the indirect jump
as having an unknown control flow. That should be sufficient to either
skip the function processing or rely on relocation information for
possible recovery of the control flow.

(cherry picked from FBD27255816)
2021-03-23 13:41:41 -07:00
Rafael Auler b3c34d568a [BOLT] Fix instrumentation bug in duplicated JTs
Summary:
Fix a bug with instrumentation when trying to instrument
functions that share a jump table with multiple indirect
jumps. Usually, each indirect jump that uses a JT will have its own
copy of it. When this does not happen, we need to duplicate the jump
table safely, so we can split the edges correctly (each copy of the
jump table may have different split edges). For this to happen, we
need to correctly match the sequence of instructions that perform the
indirect jump to identify the base address of the jump table and patch
it to point to the new cloned JT. It was reported to us a case in
which the compiler generated suboptimal code to do an indirect jump
which our matcher failed to identify.

Fixes facebookincubator/BOLT#126

(cherry picked from FBD27065579)
2021-03-15 16:34:25 -07:00
Maksim Panchenko b11c826889 [BOLT] Fix false references to zero-sized objects
Summary:
Whenever BOLT encounters a data reference in code, it tries to convert
it into <Object+Offset> form. The primary reason behind this approach is
to support read-only data-reordering optimization. However, with the
current level of the linker and compiler support we don't have enough
information to always correctly restore the original <Object+Offset>.
E.g. with zero-sized symbols we have to speculate that the actual size
of the underlying object extends to the next symbol. Most of the time,
there will be an object pointed by a zero-sized symbol and even
if we are guessing incorrectly, there will be no harm in creating
references of such form.

The problem happens when there's no object corresponding to the original
symbol and the next object is an (unmarked) jump table:

  A:                   # <- zero-sized object
  .LJUMP_TABLE:
    .long <entry1>
    .long <entry2>
    ....
  .LB:
    .long 21
  .LC:
    .long 42

The jump table will be moved and all references past it (up to the next
named object) will be incorrectly updated.

We should not speculate about the size of A in a case like that and
treat all discovered data objects (and thus references) independently.

(cherry picked from FBD27005660)
2021-03-15 12:06:56 -07:00
Vladislav Khmelevsky 76d346ca14 [BOLT][PR] Instrumentation: Introduce -no-counters-clear and -wait-forks options
Summary:
This PR introduces 2 new instrumentation options:
1. instrumentation-no-counters-clear: Discussed at https://github.com/facebookincubator/BOLT/issues/121
2. instrumentation-wait-forks: Since the instrumentation counters are mapped as MAP_SHARED it will be nice to add ability to wait until all forks of the parent process will die using tracking of process group.
The last patch is just emitBinary code refactor.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/125
GitHub Author: Vladislav Khmelevskyi <Vladislav.Khmelevskyi@huawei.com>

(cherry picked from FBD26919011)
2021-03-09 16:18:11 -08:00
Maksim Panchenko 225a8d7f2c [BOLT] Ignore TBSS section at layout time
Summary:
TBSS section is a "virtual" section that does not take memory or file
space. Ignore it completely while adjusting section sizes.

(cherry picked from FBD26824484)
2021-03-04 16:31:12 -08:00
Vladislav Khmelevsky ec9751eef5 [BOLT][PR] readDynamicRelocations: Skip NONE relocations
Summary:
NONE relocations should not be processed during dynamic relocations read process
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/118
GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com>

(cherry picked from FBD26489881)
2021-02-17 15:36:58 -08:00
Alexander Yermolovich 06959eedcf Fix up test for Update DW_AT_stmt_list for .debug_types
Summary: As titled.

(cherry picked from FBD28112186)
2021-03-17 17:08:26 -07:00
Rafael Auler da752c9c5c Fix license for a few remaining files
Summary: As titled.

(cherry picked from FBD28112137)
2021-03-17 15:04:19 -07:00
Alexander Yermolovich 0ec91a25df Update DW_AT_stmt_list for .debug_types
Summary:
There is no real link between CU and TU, so relying on fact
that address are the same, and we are updating all of them.

(cherry picked from FBD28112114)
2021-02-17 15:30:10 -08:00
Rafael Auler 16521f1f79 [BOLT] Update license headers
Summary: Update license and fix headers for some files.

(cherry picked from FBD28112041)
2021-03-15 18:04:18 -07:00
Amir Ayupov 1c5d3a056c Rebase: Merge BOLT codebase in monorepo
Summary:
This commit is the first step in rebasing all of BOLT
history in the LLVM monorepo. It also solves trivial build issues
by updating BOLT codebase to use current LLVM. There is still work
left in rebasing some BOLT features and in making sure everything
is working as intended.

History has been rewritten to put BOLT in the /bolt folder, as
opposed to /tools/llvm-bolt.

(cherry picked from FBD33289252)
2020-12-01 16:29:39 -08:00
Alexander Shaposhnikov 0a8aaf56bb [BOLT] Add support for reading profile on Mach-O
Summary: Add support for reading profile on Mach-O.

(cherry picked from FBD25777049)
2021-01-29 16:37:07 -08:00
Alexander Shaposhnikov a0dd5b05dc [BOLT] Add support for dumping profile on MacOS
Summary: Add support for dumping profile on MacOS.

(cherry picked from FBD25751363)
2021-01-28 12:44:14 -08:00
Alexander Shaposhnikov 3b876cc3e7 [BOLT] Add support for dumping counters on MacOS
Summary: Add support for dumping counters on MacOS

(cherry picked from FBD25750516)
2021-01-28 12:32:03 -08:00
Alexander Shaposhnikov 6a84124e1d [BOLT] Add support for __literal16 section on MachO
Summary:
1. Add support for __literal16 section in the instrumentation runtime library for MacOS.
2. Fix emitting __counters section.

(cherry picked from FBD25746342)
2021-01-28 12:04:46 -08:00
Sergey Pupyrev fea6b4e469 an updated version of ExtTSP
Summary:
a few minor updates in block reordering:
- some refactoring to improve readability;
- optimized chain splitting strategy to improve quality of layout and performance of the algorithm.

(cherry picked from FBD25126220)
2021-01-27 18:29:16 -08:00
Alexander Shaposhnikov d6e60c5bec [BOLT] Enable intToStr for MacOS
Summary: Enable intToStr et al. in the runtime library for MacOS.

(cherry picked from FBD25745358)
2021-01-20 16:40:17 -08:00
Alexander Shaposhnikov faaefff618 [BOLT] Fix operator new signature
Summary:
Use size_t for the first parameter of operator new.
https://en.cppreference.com/w/cpp/memory/new/operator_new

(cherry picked from FBD25750921)
2021-01-20 12:56:41 -08:00
Amir Ayupov a86cd533b3 [BOLT] Fix missing newlines in debug prints
(cherry picked from FBD25966797)
2021-01-19 18:43:16 -08:00
Rafael Auler 0de92b8346 [PERF2BOLT] Relax segment matching requirements
Summary:
When looking at perf.data's available binaries and their
respective mmap'ed segments, match them with the input binary by
looking at both aligned and non-aligned addresses. If we suppose
the alignment is the mmap'ed page size, we may miss some cases and
perf2bolt will refuse to proceed because it failed to match the
input binary with a process recorded in perf.data.

(cherry picked from FBD25732673)
2021-01-11 06:24:46 -08:00
Rafael Auler e3898d5969 [BOLT] Add threshold options for lite mode
Summary:
Add options for trading processing speed for binary performance.

  -lite-threshold-pct=<uint>
    Threshold (in percent) for selecting functions to process in lite
    mode. Higher threshold means fewer functions to process.
    E.g threshold of 90 means only top 10 percent of functions with
    profile will be processed.

  -lite-threshold-count=<uint>
    Similar to '-lite-threshold-pct' but specify threshold using
    absolute function call count. I.e. limit processing to functions
    executed at least the specified number of times.

  -no-scan
    Do not scan cold functions for external references (may result in
    slower binary).

(cherry picked from FBD24739092)
2020-12-30 12:23:58 -08:00
Rafael Auler e0261a22ce [TEST] Remove dependency on debug output
Summary:
Test mistakenly used -debug output, which makes it fail on
no-asserts build.

(cherry picked from FBD25399449)
2020-12-09 12:25:58 -08:00
Rafael Auler d2f68039bc [BOLT] Fix shrinkwrapping bug when changing frame alignment
Summary:
This fixes a bug with shrink wrapping when trying to move
push-pops in a function where we are not allowed to modify the
stack layout for alignment reasons. In this bug, we failed to
propagate alignment requirement upwards in the call graph from
function A to B when: (1) there is a cycle in the call graph and
(2) the distance from A to B is greater than 1 in the call graph
and (3) there is a node in the path from A to B, not including
A or B, that does not access parameters in the stack.

(cherry picked from FBD25315977)
2020-12-03 20:09:32 -08:00
Alexander Shaposhnikov e067f2adf4 Inject instrumentation's global dtor on MachO
Summary:
This diff is a preparation for dumping the profile generated by BOLT's instrumenation on MachO.

1/  Function "bolt_instr_fini" is placed into the predefined section "__fini"

2/ In the instrumentation pass we create a symbol "bolt_instr_fini" and
replace the last global destructor with it.

This is a temporary solution, in the future we need to register bolt_instr_fini in addition to the existing destructors without dropping the last one.

(cherry picked from FBD25071864)
2020-11-19 18:18:28 -08:00
Alexander Shaposhnikov 1b258b8908 Refactor syscall wrappers for OSX
Summary: Refactor syscall wrappers for OSX.

(cherry picked from FBD25084642)
2020-11-19 14:56:45 -08:00
Amir Ayupov f9d00d418b [BOLT] Handle insertion of updated CFI at the first basic block
Summary:
Fix corner case of insertion of updated CFI with unset `PrevBB`.
Handle it in the same way as inserting past hot-cold split point.

(cherry picked from FBD24943911)
2020-11-17 18:40:19 -08:00
Alexander Shaposhnikov 1cf23e5ee8 Link the instrumentation runtime on OSX
Summary: Link the instrumentation runtime on OSX.

(cherry picked from FBD24390019)
2020-11-17 13:57:29 -08:00
Maksim Panchenko 7eaf63a118 [BOLT] Fix data race while running split functions pass
Summary:
In BinaryContext::calculateEmittedSize(), after the temporary code
emission, we have to perform a cleanup and mark all symbols used
during the emission as undefined and unregistered (so that we can emit
them again later). The cleanup is happening even for symbols that were
referenced and not defined by emitted code.

If all emitted symbols are local, there is no risk that one thread will
define a symbol while some other thread will undefine it in its cleanup
code. Such behavior is expected as local symbols can only be referenced
within the containing function and each function is processed in one
thread. However, secondary entry points have associated global symbols
and if we emit them, then it is possible for a thread to undefine
a symbol while the other thread had defined it and was in the process of
emitting the fragment with it. In such case, a data race may happen and
the thread that contains the definition of the symbol may define it
twice causing a redefinition error.

To avoid the data race, we skip the emission of secondary entry global
symbols when emitting code used only for the size estimation.

(cherry picked from FBD24986007)
2020-11-16 14:34:51 -08:00
Sergey Pupyrev 1e9b733008 a new version of hfsort+
Summary:
A faster and better version of function reordering:
- fixed a bug when some computed probabilities were negative;
- changed an O(n^2) loop to a priority queue to find a candidate of chains to merge

(cherry picked from FBD24571208)
2020-11-14 13:18:58 -08:00
Amir Ayupov 6401af89c7 [BOLT] Support jump tables in split fragments with entries pointing back to parent functions
Summary:
Support jump tables belonging to split fragments with entries
pointing back to parent functions.
While skipping such families of functions, make sure to use the
topmost fragment to ignore its fragments.

(cherry picked from FBD24907438)
2020-11-12 11:54:51 -08:00
Amir Ayupov e8234b3b98 [BOLT] Add invalid offset for a JT entry pointing to a fragment
Summary:
In a jump table identification, register an invalid offset for jump table
entries pointing to function fragments.
These invalid offsets have no effect other than padding the jump
table size, calculated as `max(OffsetEntries, Entries)`.
Correct jump table size is required in strict mode (enabled by default
in aggregation mode by `perf2bolt`) in accounting of all PC-relative
relocations in data.
Functions containing these jump tables with invalid offsets are
marked to be ignored immediately afterwards in
`populateJumpTables`.

(cherry picked from FBD24897464)
2020-11-12 11:54:44 -08:00
Amir Ayupov 157129b751 [BOLT] Debug logging in analyzeJumpTable
Summary:
Added debug logging in/around `analyzeJumpTable`:
- Dump jump table entries as they are being processed:
```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2)
  * Checking 0x428ff40 -> OK: real entry
  * Checking 0x428ff44 -> OK: real entry
  * Checking 0x428ff48 -> OK: real entry
  * Checking 0x428ff4c -> OK: real entry
  * Checking 0x428ff50 -> OK: real entry
  * Checking 0x428ff54 -> OK: address in split fragment
  * Checking 0x428ff58 -> OK: address in split fragment
  * Checking 0x428ff5c -> OK: address in split fragment
  * Checking 0x428ff60 -> OK: address in split fragment
  * Checking 0x428ff64 -> OK: real entry
  * Checking 0x428ff68 -> OK: real entry
  * Checking 0x428ff6c -> OK: real entry
  * Checking 0x428ff70 -> OK: real entry
BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2)
  * Checking 0x428ff74 -> OK: real entry
  ...
```
- Dump skipped functions:
```
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2)
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2)
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2)
```
- Dump values of unclaimed PC-relative relocations in data.

(cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
Amir Ayupov c0cb550536 Minimize X86/shrinkwrapping-critedge test case
Summary: Minimized test case while preserving the CFG subgraph with an issue

(cherry picked from FBD24871063)
2020-11-10 21:22:57 -08:00
Amir Ayupov e54d389799 [BOLT] Disable DynoStats printing after SCTC
Summary:
Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets
unset by SCTC pass. Make DynoStats collection conditional on this
new flag.
SCTC leaves CFG in a state where branch counters of BBs with tail
calls/conditional tail calls are not available (except via annotations,
which get stripped by `lower-annotations`). Without branch
counters, DynoStats are invalid.

(cherry picked from FBD24558050)
2020-11-10 10:51:23 -08:00