997 Commits

Author SHA1 Message Date
Rafael Auler 170f73ac9e [BOLT] Fix fix-branches in presence of JRCXZ and friends
Summary:
Do not fail/assert when trying to reorder blocks that terminate
with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of
these instructions, so just treat them accordingly in fixBranches().

(cherry picked from FBD22487107)
2020-07-15 23:02:58 -07:00
Angélica Moreira 181327d763 [BOLT] Add the FeatureMiner pass to extract Calder's features.
(cherry picked from FBD19844247)
2020-07-07 23:01:22 -07:00
Tanvir Ahmed Khan f40ffa0dc8 Report stale sample count and percentage
Summary: This diff adds extra reporting of total number of stale branch samples for the binary.

(cherry picked from FBD22304965)
2020-07-06 21:35:44 -07:00
Maksim Panchenko 3e795c8a5f [BOLT] Ignore addresses from non-allocatable sections
Summary:
We've accidentally registered TBSS section address with a BinaryContext
resulting in addresses being attributed to it when
getSectionForAddress() was called.

(cherry picked from FBD22369323)
2020-07-06 14:39:44 -07:00
takh a9fac6a89f Support for CDF distribution of heatmap buckets
Summary: This diff adds the support for generating CDF distributions of heatmap buckets.

(cherry picked from FBD22128291)
2020-06-18 16:47:21 -07:00
Xun Li 84eae1a413 [Bolt] Improve coding style for runtime lib related code
Summary:
While reading through the LLVM coding standard again, I realized there were a few places where I didn't follow the standard when coding. Addressing them:
1. Prefer static functions over functions in unnamed namespaces.
2. #include as little as possible in headers.
3. Have vtable anchors.

(cherry picked from FBD22353046)
2020-07-02 14:28:13 -07:00
Maksim Panchenko e233dec467 [BOLT] Skip R_X86_64_PLT32 relocation verification
Summary:
R_X86_64_PLT32 relocations recorded by the linker may point to the PLT
section instead of being resolved to the symbol reported by the
relocation. Sometimes they could point to the symbol too. Disable
internal verification for this type of relocation.

Include a fix for symbol address calculation when it is based on the
extracted value. The truncation to the relocation size is needed if
the result overflows.
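
The truncation step described above can be illustrated with a minimal
sketch. The helper name truncateToRelocSize and its byte-size parameter
are assumptions made for this example and are not taken from the BOLT
sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low SizeInBytes bytes of Value,
// mimicking what is left of a computed address once it is stored in a
// relocation field narrower than 64 bits.
uint64_t truncateToRelocSize(uint64_t Value, unsigned SizeInBytes) {
  assert(SizeInBytes >= 1 && SizeInBytes <= 8 && "unsupported size");
  if (SizeInBytes == 8)
    return Value; // Shifting a 64-bit value by 64 would be undefined.
  const uint64_t Mask = (uint64_t{1} << (SizeInBytes * 8)) - 1;
  return Value & Mask;
}
```

For a 4-byte relocation, a computed value of 0x100000010 would be
compared as 0x10 after truncation.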

(cherry picked from FBD22317952)
2020-06-30 19:58:43 -07:00
Rafael Auler 26ad0bd951 [TESTS] Re-add issue20/issue26 tests
Summary:
Re-add tests removed because they used to depend on yaml2obj.
Rewrite them with an assembler (llvm-mc) and use the system linker to
produce a valid ELF as input to BOLT.

(cherry picked from FBD22323449)
2020-06-30 18:36:49 -07:00
Rafael Auler 41cb6b68ed Update X86/pre-aggregated-perf.test
Summary:
Add REQUIRED statement.

(cherry picked from FBD22290759)
2020-06-24 18:24:07 -07:00
Maksim Panchenko ffaba22476 [BOLT] Do not emit duplicate org symbols
Summary:
When adding symbols for patched functions, we may end up emitting
multiple symbols per function if the function has multiple names (e.g.
after identical code folding by the linker).

(cherry picked from FBD22294112)
2020-06-24 12:36:15 -07:00
Maksim Panchenko 250ca4082e [BOLT] Add static binary support
Summary:
Accept binaries without dynamic section/segment as a valid input.

Modify the check for invalid debug info "executables" that are the result of
running "objcopy --only-keep-debug". Instead of checking for an empty
dynamic segment, check that ".text" is mapped into a valid segment.

Move SegmentMapInfo inside BinaryContext.

Fixes facebookincubator/BOLT#91

Temporarily removing issue*.test tests that use yaml2obj and operate on
fake binaries.

(cherry picked from FBD22271481)
2020-06-26 16:52:07 -07:00
Maksim Panchenko 94230a2c07 [perf2bolt] Relax rules for aggregation in strict mode
Summary:
While aggregating perf.data events, even in strict mode, there is no
need to process all functions since we are not generating an output
binary. However, it's still important to convert data for as many
functions as possible, even for ones with unknown internal control flow.

(cherry picked from FBD22248390)
2020-06-25 16:29:17 -07:00
Maksim Panchenko 4aaa8892dd [BOLT] Ignore duplicate relocations
Summary:
lld linker may emit static relocations against addresses that also have
dynamic relocations associated with them. When this happens, BOLT fails
to validate the extracted value at the address.

Read dynamic relocations in the binary and ignore static relocations at
addresses that have a duplicate dynamic relocation.
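
The filtering idea can be sketched as follows. The Reloc struct and the
function name are hypothetical and only model the behavior described
above; they are not BOLT's actual data structures.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified relocation record.
struct Reloc {
  uint64_t Address; // Location the relocation applies to.
  uint32_t Type;    // Relocation type (unused by the filter).
};

// Return the static relocations whose address is not also covered by a
// dynamic relocation.
std::vector<Reloc> dropDuplicatedStaticRelocs(const std::vector<Reloc> &Static,
                                              const std::vector<Reloc> &Dynamic) {
  // Collect the addresses of all dynamic relocations first.
  std::unordered_set<uint64_t> DynamicAddresses;
  for (const Reloc &R : Dynamic)
    DynamicAddresses.insert(R.Address);

  // Keep only static relocations at addresses without a dynamic duplicate.
  std::vector<Reloc> Kept;
  for (const Reloc &R : Static)
    if (DynamicAddresses.count(R.Address) == 0)
      Kept.push_back(R);
  return Kept;
}
```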

(cherry picked from FBD22192345)
2020-06-23 12:22:58 -07:00
Maksim Panchenko 13baf47a3c [BOLT] Add '-force-patch' to forcefully patch old entries
Summary:
The option is useful for debugging.

Also, print personality function when dumping a function.

(cherry picked from FBD22169482)
2020-06-22 13:08:28 -07:00
Maksim Panchenko 4946b881a8 [BOLT] Fix getNewValueForSymbol()
Summary:
getNewValueForSymbol() uses orc::RTDyldObjectLinkingLayer::findSymbol()
to resolve symbol values. The latter will always return JITSymbol,
even if there was no symbol defined. The address for the undefined
symbol will be zero, but some symbols could legally be resolved to zero
too.

We need to distinguish between real zero-valued symbols and symbols that
were not emitted and are not visible by orc::RTDyldObjectLinkingLayer.
If zero address is returned by ORC, check for a binary data with the
same name and use its address for the symbol resolution.
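
A minimal sketch of this fallback, assuming two plain name-to-address
maps in place of ORC and of BOLT's binary data registry. The function
name resolveSymbolValue and both map parameters are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>

using SymbolTable = std::map<std::string, uint64_t>;

// Resolve Name the way the summary describes: ask the JIT layer first and,
// if it reports zero (which may mean "undefined"), fall back to binary data
// with the same name.
uint64_t resolveSymbolValue(const std::string &Name,
                            const SymbolTable &JITSymbols,
                            const SymbolTable &BinaryData) {
  // Model the JIT lookup: an undefined symbol yields address zero.
  auto JITIt = JITSymbols.find(Name);
  const uint64_t JITAddress = JITIt == JITSymbols.end() ? 0 : JITIt->second;
  if (JITAddress != 0)
    return JITAddress;

  // Zero is ambiguous, so consult binary data before accepting it.
  auto DataIt = BinaryData.find(Name);
  return DataIt == BinaryData.end() ? 0 : DataIt->second;
}
```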

(cherry picked from FBD22175269)
2020-06-22 16:16:08 -07:00
Maksim Panchenko ae296ea665 [BOLT] Allow to overwrite -use-old-text option
(cherry picked from FBD22169409)
2020-06-22 14:05:19 -07:00
Maksim Panchenko 12b7987d4f [BOLT] Ignore functions that failed validation
Summary:
If a function failed internal calls validation, we can ignore it and
keep processing the binary.

(cherry picked from FBD22169381)
2020-06-22 12:59:03 -07:00
Maksim Panchenko efce443e0d [BOLT] Create entry points for internal refs from external code
Summary:
If we detect an internal function reference from code outside of the
function, then create an entry point at that location.

(cherry picked from FBD22169337)
2020-06-22 13:05:13 -07:00
Maksim Panchenko 0403adde32 [BOLT] Fixes for scanExternalRefs()
Summary:
In my previous commit, I've accidentally reverted the condition while
evaluating a branch target.

Also, do not emit instruction for relocation purposes in
scanExternalRefs() if there was no TargetSymbol set and we have not
produced new relocations.

(cherry picked from FBD22169317)
2020-06-22 12:50:49 -07:00
Maksim Panchenko 8374e8e3fe [BOLT] Properly register symbols at secondary entry points
Summary:
We may end up with a secondary entry symbol set to zero if there was no
symbol in the input file at the entry point address, and if we skipped
the function emission, e.g. if it was ignored. In that case, the symbol
should be properly initialized with a proper address.

(cherry picked from FBD22169167)
2020-06-22 12:37:48 -07:00
Maksim Panchenko 15fffe2824 [BOLT] Fix memory error
Summary: Fix for double-free I've introduced earlier.

(cherry picked from FBD22132595)
2020-06-18 20:59:01 -07:00
Maksim Panchenko db4642d0a6 [BOLT] Support -hot-text in lite mode
Summary: Update special symbol references in functions that are not emitted.

(cherry picked from FBD22120995)
2020-06-18 11:10:41 -07:00
Maksim Panchenko e7c3464226 [BOLT] Disable trapping on AVX-512 by default
Summary:

(cherry picked from FBD22118562)
2020-06-18 09:55:05 -07:00
Maksim Panchenko 0ce0bce9e7 [BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.

In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.

Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.

The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.

If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.

'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.

With '-use-old-text' or '-strict' we require all functions to be
processed. As such, these options are incompatible with the '-lite' option,
and the '-skip' option will only disable optimizations of the listed
functions, not their disassembly and emission.

(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
Xun Li e22378d20a Be more flexible when locating runtime libs
Summary:
In some cases the bolt binary is installed one level deeper inside bin/, such as bin/install/. In that case we need to go up one more level to find the lib directory.

(cherry picked from FBD22070974)
2020-06-16 09:59:27 -07:00
Alexander Shaposhnikov 0823882d47 Link functions on MachO
Summary: Add first bits for linking functions on MachO.

(cherry picked from FBD21991721)
2020-06-12 20:16:27 -07:00
Alexander Shaposhnikov 7950e1e5bb Provide a redundant declaration of KernelBaseAddr
Summary: Adjust the code to make it buildable with clang-10.

(cherry picked from FBD22055933)
2020-06-15 16:06:07 -07:00
takh 48b71ad219 Generate heatmap for linux kernel
Summary:
This diff handles several challenges related to heatmap generation for Linux kernel (vmlinux elf file):
- If the input binary elf file contains the section `__ksymtab`, this diff assumes that this is the linux kernel `vmlinux` file and enables an extra flag `LinuxKernelMode`
- In `LinuxKernelMode`, we only support heat map generation right now, so BOLT checks that the current mode is heat map generation and exits with an error otherwise.
- For some Linux symbol and section combinations, BOLT may not be able to find the section for a symbol (especially symbols that specify the end of some section). In such cases, we show a warning message instead of exiting, which was the previous behavior.
- The Linux kernel ELF file does not contain a dynamic section, so we don't exit when no dynamic section is found for a Linux kernel binary.
- The current `ParseMMap` logic does not work with the Linux kernel. MMap entries for the Linux kernel use the `PERF_RECORD_MMAP` format instead of the typical `PERF_RECORD_MMAP2` format. Since Linux kernel address mapping is absolute (same as specified in the ELF file), we avoid calling `ParseMMap` in Linux kernel mode.
- Linux kernel entries are registered with PID -1, so `BinaryMMapInfo` lookup is not required for Linux kernel entries. Similarly, `adjustLBR` is also not required.
- The default max address in Linux kernel mode is the highest unsigned 64-bit integer instead of the current 4GB.
- Added a new heatmap parameter, `MinAddress`. In Linux kernel mode it is `KernelBaseAddress`; otherwise, it is 0. While registering heatmap sample counts from LBR entries, any address lower than `MinAddress` is ignored.
- `IgnoreInterruptLBR` is disabled in linux kernel mode to ensure that kernel entries are processed
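
The `MinAddress` filtering from the list above can be sketched roughly as follows. `SimpleHeatmap` and its members are hypothetical names for this illustration; the real heatmap class may be organized differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified heatmap: counts samples per bucket and ignores
// addresses outside [MinAddress, MaxAddress].
class SimpleHeatmap {
public:
  SimpleHeatmap(uint64_t BucketSize, uint64_t MinAddress, uint64_t MaxAddress)
      : BucketSize(BucketSize), MinAddress(MinAddress), MaxAddress(MaxAddress) {
    assert(BucketSize > 0 && "bucket size must be positive");
  }

  // Returns true if the sample was recorded, false if it was ignored.
  bool registerAddress(uint64_t Address, uint64_t Count = 1) {
    if (Address < MinAddress || Address > MaxAddress)
      return false;
    Buckets[Address / BucketSize] += Count;
    return true;
  }

  // Number of samples recorded in the bucket that contains Address.
  uint64_t countAt(uint64_t Address) const {
    auto It = Buckets.find(Address / BucketSize);
    return It == Buckets.end() ? 0 : It->second;
  }

private:
  uint64_t BucketSize;
  uint64_t MinAddress;
  uint64_t MaxAddress;
  std::map<uint64_t, uint64_t> Buckets;
};
```

With `MinAddress` set to a kernel base address, user-space samples are dropped while kernel samples are counted; with `MinAddress` set to 0, nothing is filtered at the low end.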

Currently, the Linux kernel heat map also includes the heat map for Linux kernel modules that are not part of the vmlinux ELF file. This is intentional, to identify other potential optimization opportunities. If reviewers think those modules should be omitted, I will disable those modules based on the highest end address of a vmlinux ELF section.

(cherry picked from FBD21992765)
2020-06-10 23:00:39 -07:00
Maksim Panchenko 2d524fd5e2 [BOLT] Update section index for symbols from unemitted functions
Summary:
Under some conditions, e.g. while running in lite mode or when a
function is non-simple, BOLT may decide not to emit function code and
hence there's no need to update the symbol. However, since we change
section table, the corresponding section index may need an update.

Also, update section index for ICF symbols.

(cherry picked from FBD21970017)
2020-06-09 19:12:06 -07:00
Xun Li 9bd7161529 Adding automatic huge page support
Summary:
This patch enables automated hugify for Bolt.
When running Bolt against a binary with -hugify specified, Bolt will inject a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both new kernels with THP support and old kernels. For kernels with THP support we simply make a madvise call, while for old kernels, we first copy the code out, remap the memory with huge pages, and then copy the code back.
With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we could simply combine --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.
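
One piece of arithmetic that any such runtime plausibly needs is rounding the hot code range to 2M boundaries before handing it to madvise. The sketch below shows only that step; the names alignToHugePages and AddressRange are hypothetical, and whether the actual runtime library rounds outward in this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// 2M huge page size, as mentioned in the summary.
constexpr uint64_t HugePageSize = 2ULL * 1024 * 1024;

struct AddressRange {
  uint64_t Start; // Inclusive.
  uint64_t End;   // Exclusive.
};

// Hypothetical helper: expand [HotStart, HotEnd) outward so that both ends
// fall on 2M boundaries. The result is the kind of range one would pass to
// madvise(..., MADV_HUGEPAGE) on a kernel with THP support.
AddressRange alignToHugePages(uint64_t HotStart, uint64_t HotEnd) {
  assert(HotStart <= HotEnd && "invalid hot code range");
  const uint64_t Mask = HugePageSize - 1;
  return {HotStart & ~Mask, (HotEnd + Mask) & ~Mask};
}
```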

Some details around the changes made:
1. Add a command line option to support --hugify. --hugify will automatically turn on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that have a precompiled call to hugify_self, which contradicts the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp to common.h, which will also be used by hugify.cpp. Added a few new system calls definitions.
3. Added a new class that inherits RuntimeLibrary, and implemented the necessary emit and link logic for hugify.
4. Added a simple test for hugify.

(cherry picked from FBD21384529)
2020-05-02 11:14:38 -07:00
Xun Li 00892a5fd0 Refactor runtime library
Summary:
As we are adding more types of runtime libraries, it would be better to move the runtime library out of RewriteInstance so that it could grow separately. This also requires splitting the current implementation of Instrumentation.cpp into two separate pieces, one as a normal pass and one as the runtime library. The Instrumentation pass would pass the generated data over to the runtime library, which will use it to emit the binary and perform linking.

This patch does the following:
1. Turn Instrumentation class into an optimization pass. Register the pass in the pass manager instead of in RewriteInstance.
2. Move all the data generated by Instrumentation that is needed by the runtime library into a separate data structure called InstrumentationSummary. At the creation of the Instrumentation pass, we create an instance of this data structure, which will be moved over to the runtime at the end of the pass.
3. Added a runtime library member to BinaryContext. Set the member at the end of Instrumentation pass.
4. In BinaryEmitter, make BinaryContext also emit the runtime library binary.
5. Created a base class RuntimeLibrary, that defines the interface of a runtime library, along with a few common helper functions.
6. Created InstrumentationRuntimeLibrary, which inherits from RuntimeLibrary and does all the work (mostly copied over) for emission and linking.
7. Added a new directory called RuntimeLibs, and put all the runtime library related files into it.

(cherry picked from FBD21694762)
2020-05-21 14:28:47 -07:00
Alexander Shaposhnikov cd067ae1e8 Emit functions on MachO
Summary: Start emitting functions (for MachO input binaries).

(cherry picked from FBD21721586)
2020-05-26 04:21:04 -07:00
Xun Li 2b65b3aa6b Use shuffle instead of random_shuffle
Summary: random_shuffle is deprecated in C++14.
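
For context, std::random_shuffle was also removed in C++17, and std::shuffle takes an explicit random number engine instead. A minimal sketch of the replacement pattern follows; the function name shuffledCopy and the seed parameter are illustrative and not taken from the BOLT sources.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Return a shuffled copy of Values. Unlike std::random_shuffle, std::shuffle
// requires the caller to supply the random number engine, which also makes
// the result reproducible for a fixed seed on a given standard library.
std::vector<int> shuffledCopy(std::vector<int> Values, unsigned Seed) {
  std::mt19937 Engine(Seed);
  std::shuffle(Values.begin(), Values.end(), Engine);
  return Values;
}
```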

(cherry picked from FBD21698180)
2020-05-21 16:46:27 -07:00
Xun Li 8a680745dd Remove const call to take_front
Summary:
take_front() is a const member of StringRef. Calling it does nothing.
This suggests that this line of code is useless, so this diff deletes it.
But it's good to double check, what was the original intention here?

(cherry picked from FBD21697637)
2020-05-21 16:25:05 -07:00
Maksim Panchenko 8729171182 [BOLT] Refactor profile-handling code
Summary:
This diff handles several issues related to profile reading and
handling:
  * Unifies interface used by 3 profile readers in ProfileReaderBase.
  * Adds automatic detection of the profile file contents.
  * Removes reader-specific fields from BinaryFunction and BinaryData.
    All the information is stored in instruction annotations.
  * Removes implicit memory dependencies in annotations on profile
    reader instance.
  * Adds lite mode support to YAML reader.
  * Moves profile reading code out of BinaryFunction.

(cherry picked from FBD21601411)
2020-05-07 23:00:29 -07:00
Maksim Panchenko cce49b9522 [BOLT] Remove StringRef from IndirectCallProfile
Summary:
IndirectCallProfile was holding to a StringRef from a profile reader
providing an implicit dependency on the reader.

(cherry picked from FBD21587101)
2020-05-14 17:34:20 -07:00
Rafael Auler f91d121eee [BOLT] Add option to tag version
Summary:
Add a dummy option in BOLT to allow us to write any string in
the bolt info section. This is accomplished since we record the complete
argv vector. This string is used to tag the binary with any ID that can
later be associated with a specific BOLT invocation.

(cherry picked from FBD21441902)
2020-05-06 17:31:25 -07:00
Maksim Panchenko 689447bf10 [BOLT] Change .debug_line emission for non-simple functions
Summary:
We use a special routine to emit line info for functions that we do not
overwrite. The resulting DWARF was not quite efficient as we were
advancing addresses using larger than needed opcodes. Since there were
only a few functions that we didn't emit/overwrite, it was not a big
issue.

However, in lite mode the majority of functions are not overwritten and
as a result, the inefficiency in debug line encoding got exposed and
binaries were getting larger than expected .debug_line sections.

Fix it by using more conventional line table opcodes for address
advancing.

(cherry picked from FBD21423074)
2020-05-05 23:56:50 -07:00
Maksim Panchenko 96c4168ddc [BOLT] Ignore kernel interrupts by default
(cherry picked from FBD21431563)
2020-05-06 11:52:16 -07:00
Xun Li 7b61bdf8ea Check runtime lib format within archiver
Summary: We only support linking ELF runtime libraries right now. If the library is an archive, we check that each individual library inside the archive is an ELF library.

(cherry picked from FBD21388672)
2020-05-04 13:57:21 -07:00
Maksim Panchenko 924d0bdb08 [BOLT] Introduce lite processing mode without relocations
Summary:
When optimizing a binary without relocations, we can skip processing
functions without profile (cold functions). By skipping processing of
cold functions, we reduce the processing time and memory. We call
such mode a lite mode, and it is enabled by default.

Some processing is still done for functions without profile even in lite
mode. scanExternalRefs() function is used to detect secondary entry
points to functions that are not marked in the symbol table.

Note that the no-relocation requirement is a temporary limitation
of the lite mode.

(cherry picked from FBD21366567)
2020-05-03 15:49:58 -07:00
Maksim Panchenko 04c5d4fcab [BOLT] Introduce isIgnored() function attribute
Summary:
Whenever a function is not meant for processing, e.g. when the user
requests to optimize only a subset of functions, mark the function as
ignored. Use this attribute instead of opts::shouldProcess().

(cherry picked from FBD21374806)
2020-05-03 13:54:45 -07:00
Maksim Panchenko 4e69764c65 [BOLT] Fix dyno stats after ICF in non-reloc mode
Summary:
The commit that fixed ICF determinism in non-relocation mode disabled
profile merging for functions. Dyno stats output needs to be updated to
reflect the lack of merge.

(cherry picked from FBD21366046)
2020-05-01 17:51:43 -07:00
Maksim Panchenko b62a1774af [BOLT] Cover PIC jump table reference in non-strict mode
Summary:
In non-strict relocation mode it was possible to miss a jump table
reference leading to incorrect code.

(cherry picked from FBD21251467)
2020-04-26 17:51:07 -07:00
Maksim Panchenko ac36e17a73 [BOLT][BFC] Refactor code for adding secondary function entries
Summary:
In non-relocation mode, the code for marking a function non-simple was
decoupled from the code that added new entry points.  Fix that.

(cherry picked from FBD21264247)
2020-04-27 13:40:53 -07:00
Maksim Panchenko 5296b6d12a [BOLT] Change symbol handling for secondary function entries
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.

In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.

We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.

This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.
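
The mapping described in this paragraph can be sketched as follows.
FunctionEntries and its methods are hypothetical names for this example,
and symbols are modeled as plain strings rather than MCSymbol objects.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the part of a function that tracks secondary
// entry points: every basic block keeps its local symbol, and blocks that
// are secondary entries additionally map to a global symbol.
class FunctionEntries {
public:
  // Record that the block labeled LocalSym is also reachable via GlobalSym.
  // An existing mapping for LocalSym is kept unchanged.
  void addSecondaryEntry(const std::string &LocalSym,
                         const std::string &GlobalSym) {
    SecondaryEntries.emplace(LocalSym, GlobalSym);
  }

  // A block is a secondary entry point if its local symbol has a mapping.
  bool isSecondaryEntry(const std::string &LocalSym) const {
    return SecondaryEntries.count(LocalSym) != 0;
  }

  // Global entry symbol for the block, or an empty string if there is none.
  std::string getEntrySymbol(const std::string &LocalSym) const {
    auto It = SecondaryEntries.find(LocalSym);
    return It == SecondaryEntries.end() ? std::string() : It->second;
  }

private:
  std::map<std::string, std::string> SecondaryEntries;
};
```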

Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.

(cherry picked from FBD21150193)
2020-04-19 22:29:54 -07:00
Maksim Panchenko ac1af09e82 [BOLT][NFC] Change wording while reporting functions stats
Summary:

(cherry picked from FBD21242167)
2020-04-24 16:36:22 -07:00
Maksim Panchenko fbca177a83 [BOLT] Speedup PLT processing
Summary:
With larger PLT sizes, linear PLT symbol name lookup becomes a
bottleneck.
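
The summary does not say how the lookup was sped up. One common way to
remove a linear scan is to build a hash index once and query it
afterwards, sketched below. Keying the index by entry address, as well
as the names PLTEntry, buildPLTIndex and lookupPLTName, are assumptions
made for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical PLT entry: (entry address, symbol name).
using PLTEntry = std::pair<uint64_t, std::string>;
using PLTIndex = std::unordered_map<uint64_t, std::string>;

// Build the index once; each later lookup is then expected constant time
// instead of a scan over all entries.
PLTIndex buildPLTIndex(const std::vector<PLTEntry> &Entries) {
  PLTIndex Index;
  Index.reserve(Entries.size());
  for (const PLTEntry &Entry : Entries)
    Index.emplace(Entry.first, Entry.second);
  return Index;
}

// Returns the symbol name at Address, or nullptr if there is no PLT entry.
const std::string *lookupPLTName(const PLTIndex &Index, uint64_t Address) {
  auto It = Index.find(Address);
  return It == Index.end() ? nullptr : &It->second;
}
```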

(cherry picked from FBD21223695)
2020-04-23 21:29:10 -07:00
Maksim Panchenko 0ea98d1f0b [BOLT] Option to fail if invalid profile detected
Summary:
Add an option to fail processing of the input binary if the profile
is not accurate:

  -stale-threshold=<uint>
    - maximum percentage of stale functions to tolerate (default: 100)

Default (100) means never to fail.

A function profile is considered stale if any branch in its profile
has invalid source or destination.

Use `-stale-threshold=0` to fail if any staleness is detected in the
profile.
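
The threshold semantics described above can be sketched as a small
check. The function name exceedsStaleThreshold and its parameters are
hypothetical; only the meaning of the values 0 and 100 comes from the
summary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: returns true when the percentage of stale functions is
// strictly above ThresholdPercent, i.e. when processing should fail.
bool exceedsStaleThreshold(uint64_t NumStale, uint64_t NumTotal,
                           unsigned ThresholdPercent) {
  assert(NumStale <= NumTotal && "more stale functions than functions");
  assert(ThresholdPercent <= 100 && "threshold is a percentage");
  if (NumTotal == 0)
    return false; // No profiled functions, so nothing can be stale.
  // Compare NumStale / NumTotal > ThresholdPercent / 100 without division.
  return NumStale * 100 > static_cast<uint64_t>(ThresholdPercent) * NumTotal;
}
```

With a threshold of 100 the check never fires, and with a threshold of 0
a single stale function is enough to fail.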

(cherry picked from FBD21189036)
2020-04-22 15:09:49 -07:00
Maksim Panchenko 33e0b2aa58 [BOLT] Do not emit old .eh_frame in relocation mode
Summary:
In relocation mode, there is no use for old .eh_frame section. Moreover,
it can interfere with new EH frames via .eh_frame_hdr when the original
.text is reused.

(cherry picked from FBD21120070)
2020-04-19 12:55:43 -07:00