Commit Graph

906 Commits

Author SHA1 Message Date
takh 0033a7612d Linux kernel marker to update special sections
Summary: This diff adds SDT marker like LK marker to update special lk sections

(cherry picked from FBD22932157)
2020-08-04 13:50:00 -07:00
Maksim Panchenko 8f2a962866 [perf2bolt] Fix for SKL bug workaround
Summary:
Some LBR traces contain less than 16/32 entries. When the first trace is
less than the standard length, we fail to enable SKL bug workaround with
BAT mode enabled. The result is a bad profile from perf2bolt. The fix
is to check the length of every trace, not just the first one.

Issue facebookincubator/BOLT#94

(cherry picked from FBD22917971)
2020-08-04 10:59:37 -07:00
Amir Ayupov 8f7cb54ae5 Added execution count threshold option
Summary:
Added execution count threshold option (execution-count-threshold) controlling the optimizations that are sensitive to the accuracy of the profiling data:
- BB reordering
- function splitting
- frame opts
- shrink wrapping
- indirect call promotion

(cherry picked from FBD22682171)
2020-07-27 18:07:18 -07:00
Rafael Auler c6799a689d [BOLT] Fix stack alignment for runtime lib
Summary:
Right now, the SAVE_ALL sequence executed upon entry of both
of our runtime libs (hugify and instrumentation) will cause the stack to
not be aligned at a 16B boundary because it saves 15 8-byte regs. Change
the code sequence to adjust for that. The compiler may generate code
that assumes the stack is aligned by using movaps instructions, which
will crash.

(cherry picked from FBD22744307)
2020-07-27 16:52:51 -07:00
Rafael Auler ed02946281 [BOLT] Fix hot_end symbol update with user function order
Summary:
If no profile data is provided, but only a user-provided order
file for functions, fix the placement of the __hot_end symbol.

(cherry picked from FBD22713265)
2020-07-24 10:28:36 -07:00
Amir Ayupov 6b89a9cb44 Handle intra-function call in instrumentOneTarget
Summary: Added handling of intra-function calls (internal control transfer by call instruction) to instrumentOneTarget

(cherry picked from FBD22606033)
2020-07-17 23:16:56 -07:00
Amir Ayupov f7d4bed9d1 Extracted sequence insertion function into helper function
Summary: Factored out common code from multiple places into a helper function

(cherry picked from FBD22606101)
2020-07-17 23:16:52 -07:00
Maksim Panchenko 937244b4f2 [BOLT] Allow to specify -reorder-functions option multiple times
Summary: Need to be able to override the option.

(cherry picked from FBD22583585)
2020-07-17 10:08:51 -07:00
Rafael Auler 6c8fc28892 Revert "[BOLT] Add the FeatureMiner pass to extract Calder's features."
This reverts commit 2476f46af02ccce04e9ed456462dd098460e4e1f.

Reviewed By: maks

(cherry picked from FBD28111787)
2020-07-16 17:35:55 -07:00
Rafael Auler 170f73ac9e [BOLT] Fix fix-branches in presence of JRCXZ and friends
Summary:
Do not fail/assert when trying to reorder blocks that terminate
with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of
these instructions, so just treat them accordingly in fixBranches().

(cherry picked from FBD22487107)
2020-07-15 23:02:58 -07:00
Angélica Moreira 181327d763 [BOLT] Add the FeatureMiner pass to extract Calder's features.
(cherry picked from FBD19844247)
2020-07-07 23:01:22 -07:00
Tanvir Ahmed Khan f40ffa0dc8 Report stale sample count and percentage
Summary: This diff adds extra reporting of total number of stale branch samples for the binary.

(cherry picked from FBD22304965)
2020-07-06 21:35:44 -07:00
Maksim Panchenko 3e795c8a5f [BOLT] Ignore addresses from non-allocatable sections
Summary:
We've accidentally registered TBSS section address with a BinaryContext
resulting in addresses being attributed to it when
getSectionForAddress() was called.

(cherry picked from FBD22369323)
2020-07-06 14:39:44 -07:00
takh a9fac6a89f Support for CDF distribution of heatmap buckets
Summary: This diff adds the support for generating CDF distributions of heatmap buckets.

(cherry picked from FBD22128291)
2020-06-18 16:47:21 -07:00
Xun Li 84eae1a413 [Bolt] Improve coding style for runtime lib related code
Summary:
Reading through the LLVM coding standard again, realized a few places where I didn't follow the standard when coding. Addressing them:
1. prefer static functions over functions in unnamed namespace.
2. #include as little as possible in headers
3. Have vtable anchors.

(cherry picked from FBD22353046)
2020-07-02 14:28:13 -07:00
Maksim Panchenko e233dec467 [BOLT] Skip R_X86_64_PLT32 relocation verification
Summary:
R_X86_64_PLT32 relocations recorded by the linker may point to the PLT
section instead of being resolved to the symbol reported by the
relocation. Sometimes they could point to the symbol too. Disable
internal verification for this type of relocation.

Include a fix for symbol address calculation when it is based on the
extracted value. The truncation to the relocation size is needed if
the results overflows.

(cherry picked from FBD22317952)
2020-06-30 19:58:43 -07:00
Rafael Auler 26ad0bd951 [TESTS] Re-add issue20/issue26 tests
Summary:
Re-add tests removed because they used to depend on yaml2obj.
Rewrite them with an assembler (llvm-mc) and use the system linker to
produce a valid ELF as input to BOLT.

(cherry picked from FBD22323449)
2020-06-30 18:36:49 -07:00
Rafael Auler 41cb6b68ed Update X86/pre-aggregated-perf.test
Summary:
Add REQUIRED statement.

(cherry picked from FBD22290759)
2020-06-24 18:24:07 -07:00
Maksim Panchenko ffaba22476 [BOLT] Do not emit duplicate org symbols
Summary:
When adding symbols for patched functions, we may end up emitting
multiple symbols per function if the function has multiple names (e.g.
after identical code folding by the linker).

(cherry picked from FBD22294112)
2020-06-24 12:36:15 -07:00
Maksim Panchenko 250ca4082e [BOLT] Add static binary support
Summary:
Accept binaries without dynamic section/segment as a valid input.

Modify the check for invalid debug info "executables" that are result of
running "objcopy --only-keep-debug". Instead of checking for an empty
dynamic segment, check that ".text" is mapped into a valid segment.

Move SegmentMapInfo inside BinaryContext.

Fixes facebookincubator/BOLT#91

Temporarily removing issue*.test tests that use yaml2obj and operate on
fake binaries.

(cherry picked from FBD22271481)
2020-06-26 16:52:07 -07:00
Maksim Panchenko 94230a2c07 [perf2bolt] Relax rules for aggregation in strict mode
Summary:
While aggregating perf.data events, even in strict mode, there is no
need to process all functions since we are not generating an output
binary. However, it's still important to convert data for as many
functions as possible, even for ones with unknown internal control flow.

(cherry picked from FBD22248390)
2020-06-25 16:29:17 -07:00
Maksim Panchenko 4aaa8892dd [BOLT] Ignore duplicate relocations
Summary:
lld linker may emit static relocations against addresses that also have
dynamic relocations associated with them. When this happens, BOLT fails
to validate the extracted value at the address.

Read dynamic relocations in the binary and ignore static relocations at
addresses that have a duplicate dynamic relocation.

(cherry picked from FBD22192345)
2020-06-23 12:22:58 -07:00
Maksim Panchenko 13baf47a3c [BOLT] Add '-force-patch' to forcefully patch old entries
Summary:
The option is useful for debugging.

Also, print personality function when dumping a function.

(cherry picked from FBD22169482)
2020-06-22 13:08:28 -07:00
Maksim Panchenko 4946b881a8 [BOLT] Fix getNewValueForSymbol()
Summary:
getNewValueForSymbol() uses orc::RTDyldObjectLinkingLayer::findSymbol()
to resolve symbol values. The latter will always return JITSymbol,
even if there was no symbol defined. The address for the undefined
symbol will be zero, but some symbols could legally be resolved to zero
too.

We need to distinguish between real zero-valued symbols and symbols that
were not emitted and are not visible by orc::RTDyldObjectLinkingLayer.
If zero address is returned by ORC, check for a binary data with the
same name and use its address for the symbol resolution.

(cherry picked from FBD22175269)
2020-06-22 16:16:08 -07:00
Maksim Panchenko ae296ea665 [BOLT] Allow to overwrite -use-old-text option
(cherry picked from FBD22169409)
2020-06-22 14:05:19 -07:00
Maksim Panchenko 12b7987d4f [BOLT] Ignore functions that failed validation
Summary:
If a function failed internal calls validation, we can ignore it and
keep processing the binary.

(cherry picked from FBD22169381)
2020-06-22 12:59:03 -07:00
Maksim Panchenko efce443e0d [BOLT] Create entry points for internal refs from external code
Summary:
If we detect an internal function reference from code outside of the
function, then create an entry point at that location.

(cherry picked from FBD22169337)
2020-06-22 13:05:13 -07:00
Maksim Panchenko 0403adde32 [BOLT] Fixes for scanExternalRefs()
Summary:
In my previous commit, I've accidentally reverted the condition while
evaluating a branch target.

Also, do not emit instruction for relocation purposes in
scanExternalRefs() if there was no TargetSymbol set and we have not
produced new relocations.

(cherry picked from FBD22169317)
2020-06-22 12:50:49 -07:00
Maksim Panchenko 8374e8e3fe [BOLT] Properly register symbols at secondary entry points
Summary:
We may end up with a secondary entry symbol set to zero if there was no
symbol in the input file at the entry point address, and if we skipped
the function emission, e.g. if it was ignored. In that case, the symbol
should be properly initialized with a proper address.

(cherry picked from FBD22169167)
2020-06-22 12:37:48 -07:00
Maksim Panchenko 15fffe2824 [BOLT] Fix memory error
Summary: Fix for double-free I've introduced earlier.

(cherry picked from FBD22132595)
2020-06-18 20:59:01 -07:00
Maksim Panchenko db4642d0a6 [BOLT] Support -hot-text in lite mode
Summary: Update special symbol references in functions that are not emitted.

(cherry picked from FBD22120995)
2020-06-18 11:10:41 -07:00
Maksim Panchenko e7c3464226 [BOLT] Disable trapping on AVX-512 by default
Summary:

(cherry picked from FBD22118562)
2020-06-18 09:55:05 -07:00
Maksim Panchenko 0ce0bce9e7 [BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.

In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.

Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.

The scan process takes about <10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.

If BOLT fails to scan or disassemble a function, .e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.

'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.

With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.

(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
Xun Li e22378d20a Be more flexible when locating runtime libs
Summary:
In some cases we install bolt binary into one level deeper in bin/, such as bin/install/, we need to go back one more level to find lib directory.

(cherry picked from FBD22070974)
2020-06-16 09:59:27 -07:00
Alexander Shaposhnikov 0823882d47 Link functions on MachO
Summary: Add first bits for linking functions on MachO.

(cherry picked from FBD21991721)
2020-06-12 20:16:27 -07:00
Alexander Shaposhnikov 7950e1e5bb Provide a redundant declaration of KernelBaseAddr
Summary: Adjust the code to make it buildable with clang-10.

(cherry picked from FBD22055933)
2020-06-15 16:06:07 -07:00
takh 48b71ad219 Generate heatmap for linux kernel
Summary:
This diff handles several challenges related to heatmap generation for Linux kernel (vmlinux elf file):
- If the input binary elf file contains the section `__ksymtab`, this diff assumes that this is the linux kernel `vmlinux` file and enables an extra flag `LinuxKernelMode`
- In `LinuxKernelMode`, we only support heat map generation right now, therefore it ensures that current BOLT mode is heat map generation. Otherwise, it exits with error.
- For some Linux symbol and section combinations, BOLT may not be able to find section for symbol (specially symbols that specifies the end of some section). For such cases, we show an warning message without exiting which was the previous behavior.
- Linux kernel elf file does not contain dynamic section, therefore, we don't exit when no dynamic section is found for linux kernel binary.
- Current `ParseMMap` logic does not work with linux kernel. MMap entries for linux kernel uses `PERF_RECORD_MMAP` format instead of typical `PERF_RECORD_MMAP2` format. Since linux kernel address mapping is absolute (same as specified in the ELF file), we avoid calling `ParseMMap` in linux kernel mode.
- Linux kernel entries are registered with PID -1, therefore `BinaryMMapInfo` lookup is not required for linux kernel entries. Similarly, `adjustLBR` is also not required.
- Default max address in linux kernel mode is highest unsigned 64-bit integer instead of current 4GBs.
- Added another new parameter for heatmap, `MinAddress`, in case of Linux kernel mode which is `KernelBaseAddress`, otherwise, it is 0. While registering Heatmap sample counts from LBR entries, any address lower than this `MinAddress` is ignored.
- `IgnoreInterruptLBR` is disabled in linux kernel mode to ensure that kernel entries are processed

Currently, linux kernel heat map also include heat map for Linux kernel modules that are not part of vmlinux elf file. This is intentional to identify other potential optimization opportunities. If reviewers think, those modules should be omitted, I will disable those modules based on highest end address of a vmlinux elf section.

(cherry picked from FBD21992765)
2020-06-10 23:00:39 -07:00
Maksim Panchenko 2d524fd5e2 [BOLT] Update section index for symbols from unemitted functions
Summary:
Under some conditions, e.g. while running in lite mode or when a
function is non-simple, BOLT may decide not to emit function code and
hence there's no need to update the symbol. However, since we change
section table, the corresponding section index may need an update.

Also, update section index for ICF symbols.

(cherry picked from FBD21970017)
2020-06-09 19:12:06 -07:00
Xun Li 9bd7161529 Adding automatic huge page support
Summary:
This patch enables automated hugify for Bolt.
When running Bolt against a binary with -hugify specified, Bolt will inject a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both new kernel with THP support and old kernels. For kernels with THP support we simply make a madvise call, while for old kernels, we first copy the code out, remap the memory with huge page, and then copy the code back.
With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we could simply combine --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.

Some details around the changes made:
1. Add an command line option to support --hugify. --hugify will automatically turn on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that has precompiled call to hugify_self, which contradicts with the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp to common.h, which will also be used by hugify.cpp. Added a few new system calls definitions.
3. Added a new class that inherits RuntimeLibrary, and implemented the necessary emit and link logic for hugify.
4. Added a simple test for hugify.

(cherry picked from FBD21384529)
2020-05-02 11:14:38 -07:00
Xun Li 00892a5fd0 Refactor runtime library
Summary:
As we are adding more types of runtime libraries, it would be better to move the runtime library out of RewriteInstance so that it could grow separately. This also requires splitting the current implementation of Instrumentation.cpp to two separate pieces, one as normal Pass, one as the runtime library. The Instrumentation Pass would pass over the generated data to the runtime library, which will use to emit binary and perform linking.

This patch does the following:
1. Turn Instrumentation class into an optimization pass. Register the pass in the pass manager instead of in RewriteInstance.
2. Split all the data that are generated by Instrumentation that's needed by runtime library into a separate data structure called InstrumentationSummary. At the creation of Instrumentation pass, we create an instance of such data structure, which will be moved over to the runtime at the end of the pass.
3. Added a runtime library member to BinaryContext. Set the member at the end of Instrumentation pass.
4. In BinaryEmitter, make BinaryContext to also emit runtime library binary.
5. Created a base class RuntimeLibrary, that defines the interface of a runtime library, along with a few common helper functions.
6. Created InstrumentationRuntimeLibrary which inherits from RuntimeLibrary, that does all the work (mostly copied over) for emit and linking.
7. Added a new directory called RuntimeLibs, and put all the runtime library related files into it.

(cherry picked from FBD21694762)
2020-05-21 14:28:47 -07:00
Alexander Shaposhnikov cd067ae1e8 Emit functions on MachO
Summary: Start emitting  functions (for MachO input binaries).

(cherry picked from FBD21721586)
2020-05-26 04:21:04 -07:00
Xun Li 2b65b3aa6b Use shuffle instead of random_shuffle
Summary: random_shuffle is deprecated in C++14.

(cherry picked from FBD21698180)
2020-05-21 16:46:27 -07:00
Xun Li 8a680745dd Remove const call to take_front
Summary:
take_front() is a const member of StringRef. Calling it does nothing.
This suggests that this line of code is useless, deleting it.
But it's good to double check, what was the original intention here?

(cherry picked from FBD21697637)
2020-05-21 16:25:05 -07:00
Maksim Panchenko 8729171182 [BOLT] Refactor profile-handling code
Summary:
This diff handles several issues related to profile reading and
handling:
  * Unifies interface used by 3 profile readers in ProfileReaderBase.
  * Adds automatic detection of the profile file contents.
  * Removes reader-specific fields from BinaryFunction and BinaryData.
    All the information is stored in instruction annotations.
  * Removes implicit memory dependencies in annotations on profile
    reader instance.
  * Adds lite mode support to YAML reader.
  * Moves profile reading code out of BinaryFunction.

(cherry picked from FBD21601411)
2020-05-07 23:00:29 -07:00
Maksim Panchenko cce49b9522 [BOLT] Remove StringRef from IndirectCallProfile
Summary:
IndirectCallProfile was holding to a StringRef from a profile reader
providing an implicit dependency on the reader.

(cherry picked from FBD21587101)
2020-05-14 17:34:20 -07:00
Rafael Auler f91d121eee [BOLT] Add option to tag version
Summary:
Add a dummy option in BOLT to allow us to write any string in
the bolt info section. This is accomplished since we record the complete
argv vector. This string used to tag this binary with any ID that can
later be associated with a specific BOLT invocation.

(cherry picked from FBD21441902)
2020-05-06 17:31:25 -07:00
Maksim Panchenko 689447bf10 [BOLT] Change .debug_line emission for non-simple functions
Summary:
We use a special routine to emit line info for functions that we do not
overwrite. The resulting DWARF was not quite efficient as we were
advancing addresses using a larger than needed opcodes. Since there were
only a few functions that we didn't emit/overwrite, it was not a big
issue.

However, in lite mode the majority of functions are not overwritten and
as a result, the inefficiency in debug line encoding got exposed and
binaries were getting larger than expected .debug_line sections.

Fix it by using more conventional line table opcodes for address
advancing.

(cherry picked from FBD21423074)
2020-05-05 23:56:50 -07:00
Maksim Panchenko 96c4168ddc [BOLT] Ignore kernel interrupts by default
(cherry picked from FBD21431563)
2020-05-06 11:52:16 -07:00
Xun Li 7b61bdf8ea Check runtime lib format within archiver
Summary: We only support linking ELF runtime library right now. If the library is an archiver, we check that each individual library inside the archiver is an ELF library.

(cherry picked from FBD21388672)
2020-05-04 13:57:21 -07:00
Maksim Panchenko 924d0bdb08 [BOLT] Introduce lite processing mode without relocations
Summary:
When optimizing a binary without relocations, we can skip processing
functions without profile (cold functions). By skipping processing of
cold functions, we reduce the processing time and memory. We call
such mode a lite mode, and it is enabled by default.

Some processing is still done for functions without profile even in lite
mode. scanExternalRefs() function is used to detect secondary entry
points to functions that are not marked in the symbol table.

Note that the no-relocation requirement is a temporary limitation
of the lite mode.

(cherry picked from FBD21366567)
2020-05-03 15:49:58 -07:00