Commit Graph

997 Commits

Author SHA1 Message Date
Maksim Panchenko 23edb3ed9c [BOLT] Option to control .text alignment
Summary:
Add option `-align-text=<n>` to control .text alignment within a
segment. Set to page size by default.

(cherry picked from FBD21120063)
2020-04-19 15:02:50 -07:00
Maksim Panchenko 10245b5c5b [BOLT] Emit ICF symbols for large functions
Summary:
In non-relocation mode, make sure we emit extra symbols for a folded
function even if the function was not overwritten due to its large
size.

(cherry picked from FBD21080467)
2020-04-16 00:05:01 -07:00
Maksim Panchenko 606532bdf1 [BOLT] Fix .eh_frame update with ICF in non-relocation mode
Summary:
In a rare case, we may fold a function and fail to emit it in
non-relocation mode due to a function size increase. At the same time,
the function that the original function was folded into could have been
successfully emitted, e.g. because it was split in the presence of a
profile information.

Later, because the function was not emitted, we have to use its original
.eh_frame entry in the preserved .eh_frame section. However, that entry
is no longer referencing the original function, but the function that
the original was folded into. This happens since the original symbol gets
emitted at the other function location. As a result, .eh_frame entry for
the folded function is missing.

To prevent incorrect update of the original .eh_frame, create
relocations against absolute values. This guarantees preservation of the
section contents while updating pc-relative references.

(cherry picked from FBD21061130)
2020-04-16 00:02:35 -07:00
Maksim Panchenko 1be7a82540 [BOLT] Speedup RTDyld external symbol resolution
Summary:
RuntimeDyldImpl::resolveExternalSymbols() some time ago used to call
getSymbolAddress() while in the second loop. That call could have
modified the contents of ExternalSymbolRelocations that the loop was
iterating over. Thus the code was written in a way that erased the
processed entry on every loop iteration and reset the map
iterator. With large number of entries in ExternalSymbolRelocations the
loop code becomes a performance bottleneck.

Since getSymbolAddress() is no longer used, the
ExternalSymbolRelocations could be iterated in a straightforward way and
the map cleared before the function exit.

(cherry picked from FBD21057058)
2019-11-11 13:29:46 -08:00
Rafael Auler 6dbd15bc01 [BOLT-X86] Fix instrumentation issue with indirect calls
Summary:
Indirect calls that use RSP to compute the target address would
break in instrumentation mode because we were adding instructions that
changed the stack pointer. Fix this.

(cherry picked from FBD20883791)
2020-04-06 17:38:11 -07:00
Maksim Panchenko 401fa5b493 [BOLT] Further speedup ICF
Summary:
Further speedup ICF by applying stricter rules for congruent functions.
While checking symbolic operands in congruent functions, consider
operands congruent only if they are equal or reference functions
with identical hashes, i.e. potentially foldable functions.
Note that jump table operands are handled as a special case.

(cherry picked from FBD20912054)
2020-04-07 22:10:12 -07:00
Maksim Panchenko ee0371ad97 [BOLT] Speedup ICF by better function hashing
Summary:
Too many hash collisions may cause ICF to run slowly.

We used to hash BinaryFunction only looking at instruction opcodes,
ignoring instruction operands. With many almost identical functions,
such approach may lead to long ICF processing time. By including
operands into the hash, we reduce the number of collisions and
improve the runtime often by a factor of 2 or more.

(cherry picked from FBD20888957)
2020-04-07 00:21:37 -07:00
Maksim Panchenko abda7dc6a7 [BOLT] Fix ICF non-determinism in non-relocation mode
Summary:
ICF may fold functions in arbitrary order when running multi-threaded.
This is fine in relocation mode as we end up with just one function
holding all function symbols.

However, in non-relocation mode we keep all function bodies, and if we
keep merging profiles in non-deterministic order, we end up with
functions with non deterministic profiles. The fix for non-relocation
mode is to not merge profiles as the factual new profile could be
different from the merged one since both function instances are
potentially callable.

Additionally, emit extra symbols for ICF functions in non-relocation
mode to make it possible to track the folding.

(cherry picked from FBD20889866)
2020-04-04 20:12:38 -07:00
Maksim Panchenko b08d82d91b [BOLT] Verify exceptions action table equivalence in ICF
Summary:
Some functions may have exactly the same code and exception handlers.
However, their action tables could be different leading to mismatching
semantics. We should verify their equivalence while running ICF.

(cherry picked from FBD20889035)
2020-03-30 19:08:24 -07:00
Maksim Panchenko 58b0d9e7b0 [BOLT][DWARF] Add support for base address in DWARF location lists
Summary:
The version of LLVM that we are based on lacks the support for base
address in DWARF location lists. Add the missing pieces.

(cherry picked from FBD20640784)
2020-03-24 22:05:37 -07:00
Maksim Panchenko bbbf679b42 [BOLT] Refactor ELF symbol table rewriting code
Summary:
Make ELF symbol table rewriting code more structured. While at it,
remove symbols from non-allocatable sections.

(cherry picked from FBD20243386)
2020-02-26 20:43:18 -08:00
Maksim Panchenko a07f1a26e7 [BOLT] Refactor section prefixes
(cherry picked from FBD20400886)
2020-03-11 15:51:32 -07:00
Maksim Panchenko 1f3e351a9c [BOLT] Refactor code and data emission code
Summary:
Consolidate code and data emission code in ELF-independent
BinaryEmitter. The high-level interface includes only two
functions emitBinaryContext() and emitFunctionBody() used
by RewriteInstance and BinaryContext respectively.

(cherry picked from FBD20332901)
2020-03-06 15:06:37 -08:00
Maksim Panchenko 74a2777c54 [BOLT] Refactor ELF parts of instrumentation code
Summary:
This is a prerequisite for larger emitter refactoring.

Since .dynamic is read unconditionally, add an error message if the
section is missing, or the size of the section is zero.

(cherry picked from FBD20331735)
2020-03-08 19:04:39 -07:00
Maksim Panchenko af553124d3 [BOLT] Refactor emission of original .eh_frame
Summary:
There is no need to treat the emission of the original `.eh_frame`
section as a special case.

(cherry picked from FBD20323360)
2020-03-07 11:19:09 -08:00
Alexander Shaposhnikov e3654fc274 [BOLT] Uniquify names of local symbols
Summary:
1. Uniquify names of local symbols.
2. Handle aliases.

(cherry picked from FBD20270196)
2020-03-04 18:36:44 -08:00
Alexander Shaposhnikov 842a25f785 [BOLT] Mark functions containing data as non-simple
Summary: Temporarily mark functions containing data as non-simple.

(cherry picked from FBD20213279)
2020-03-02 22:41:12 -08:00
Maksim Panchenko cb9c991dcb [BOLT] Remove allow-section-relocations option
Summary: The option is not used. Remove all related code.

(cherry picked from FBD20237859)
2020-03-03 15:51:24 -08:00
Maksim Panchenko c7e012e145 [BOLT][NFC] Get rid of BestFit parameter
Summary: The parameter is no longer used.

(cherry picked from FBD20236516)
2020-03-03 14:28:42 -08:00
Alexander Shaposhnikov b0cbb60165 [BOLT] Fix begin decrementing
Summary: Fix begin decrementing.

(cherry picked from FBD20232474)
2020-03-03 13:36:32 -08:00
Maksim Panchenko d89bb53afa [BOLT][NFC] Factor out relocation processing
(cherry picked from FBD20087297)
2020-02-24 17:10:02 -08:00
Rafael Auler 340da8f294 [BOLT] Fix shrink wrapping to check pops
Summary:
Shrink wrapping has a mode where it will directly move push
pop pairs, instead of replacing them with stores/loads. This is an
ambitious mode that is triggered sometimes, but whenever matching with
a push, it would operate with the assumption that the restoring
instruction was a pop, not a load, otherwise it would assert. Fix this
assertion to bail nicely back to non-pushpop mode (use regular store and
load instructions).

(cherry picked from FBD20085905)
2020-02-18 16:00:40 -08:00
Maksim Panchenko 2df4e7b99e [BOLT][NFC] Minor refactoring of RewriteInstance
(cherry picked from FBD20087424)
2020-02-24 17:12:41 -08:00
Maksim Panchenko 495761dc70 [BOLT][NFC] Remove unused BinarySection member functions
(cherry picked from FBD20087243)
2020-02-24 16:56:45 -08:00
Maksim Panchenko 3b45212e84 [BOLT] Delete ExecutableFileMemoryManager::registerNoteSection()
Summary: The interface is no longer in use.

(cherry picked from FBD20070558)
2020-02-24 09:40:32 -08:00
Alexander Shaposhnikov 01b7c90242 [BOLT] Add missing override
Summary: Add missing override in X86MCPlusBuilder.cpp.

(cherry picked from FBD20064222)
2020-02-23 22:27:28 -08:00
Maksim Panchenko be43f89c4f [BOLT][llvm] Update llvm.patch
Summary:

(cherry picked from FBD20063562)
2020-02-23 19:51:33 -08:00
Alexander Shaposhnikov 76aa1c26aa [BOLT] Enable reversing the order of basic blocks
Summary: Enable reversing the order of basic blocks.

(cherry picked from FBD19943692)
2020-02-17 13:35:09 -08:00
Alexander Shaposhnikov 4ad5048393 [BOLT] Add first bits to build CFG
Summary: Add first bits to build CFG.

(cherry picked from FBD19943472)
2020-02-17 12:18:42 -08:00
Alexander Shaposhnikov 5b64bf2128 [BOLT] Disassemble functions from a MachO binary
Summary: Add first bits to disassemble functions from a MachO binary.

(cherry picked from FBD19900493)
2020-02-11 14:30:33 -08:00
Rafael Auler a9d85413ac [BOLT] Emit long nops by default
Summary:
Change our X86 target to use long nops by default. In general,
BOLT does not put nops into the instruction stream that is going to be
executed, since it doesn't align basic blocks, only functions. Since we
rebased BOLT, our relationship with MCAssembler changed because it
stopped using multibyte nops and we never needed to revisit that. But it
makes a difference if we want to mitigate perf issues with the Intel
JCC erratum, since the nops inserted are going to be decoded and
executed. To make MCAssembler emit long nops again, we need to explictly
set mattr (Features) of the X86 target.

(cherry picked from FBD19987277)
2020-02-19 16:13:58 -08:00
Maksim Panchenko 9711286858 [BOLT] Get rid of BinarySection::IsLocal
Summary: The flag is no longer used/needed.

(cherry picked from FBD19951571)
2020-02-18 09:20:17 -08:00
Alexander Shaposhnikov 16630f5c58 [BOLT] Factor out NameResolver from RewriteInstance
Summary: Factor out the helper class NameResolver from the class RewriteInstance.

(cherry picked from FBD19943916)
2020-02-17 14:37:46 -08:00
Alexander Shaposhnikov 754b6569f6 [BOLT] Add missing std::move
Summary: Add missing std::move in the method BinaryFunction::addAlternativeName

(cherry picked from FBD19944661)
2020-02-17 17:53:12 -08:00
Alexander Shaposhnikov 36cf37c4c1 [BOLT] Add initial bits for parsing MachO files
Summary: Start adding initial bits for MachO, this diff contains some small preparations for finding functions inside a MachO binary, this will be done in the next diff. The concept of a section in the MachO world is quite different from ELF, nevertheless, for functions for now it more or less fits into the current picture (in BOLT), but things will diverge more significantly a bit later.

(cherry picked from FBD19648161)
2020-01-30 13:10:48 -08:00
Rafael Auler 58a129a602 [BOLT] Move peepholes pass after sctc
Summary:
There are two peephole subpasses, remove-double-jumps and
remove-useless-conditional-branches, that operates by reading branches
directly, which makes them tricky to run before fix-branches. In the
case of remove-double-jumps, it will even lead to suboptimal code if
the patched branch was going to be removed by fix-branches when the
target is the fall-through. If the final target is a tail call, it will
lead to a broken CFG in the worst case. Fix this by moving these passes
after SCTC, which already produces CFGs with conditional tail calls.

(cherry picked from FBD18795592)
2019-12-03 12:28:22 -08:00
Rafael Auler c82e7fd1cc [BOLT] Decoder cache friendly alignment wrt Intel JCC Erratum
Summary:
This diff ports reviews.llvm.org/D70157 to our LLVM tree, which
makes the integrated assembler able to align X86 control-flow changing
instructions in a way to reduce the performance impact of the ucode
update on Intel processors that implement the JCC erratum mitigation.

See white paper "Mitigations for Jump Conditional Code Erratum" by Intel
published November 2019.

To port this patch, I changed classifySecondInstInMacroFusion to analyze
instruction opcodes directly instead of analyzing the CondCond operand
(in more recent versions of LLVM, all conditional branches share the
same opcode, but with a different conditional operand). I also pulled to
our tree Alignment.h as a dependency, and the macroop analyzing helpers.

x86-align-branch-boundary and -x86-align-branch are the two flags that
control nop insertion to avoid disabling the decoder cache, following
the original patch. In BOLT, I added the flag
x86-align-branch-boundary-hot-only to request the alignment to only be
applied to hot code, which is turned on by default. The reason is
because such alignment  is expensive to perform on large modules, but if
we limit it to hot code, the relaxation pass runtime becomes tolerable.

(cherry picked from FBD19828850)
2020-02-10 18:50:53 -08:00
Alexander Shaposhnikov d5b8fc8fbe [BOLT] Make the methods isText/isData more robust
Summary: Make the methods isText/isData work for MachO.

(cherry picked from FBD19849460)
2020-02-11 17:54:48 -08:00
Alexander Shaposhnikov c3c4b15a2e [BOLT] Remove BinaryContext::getFunctionData
Summary:
In this diff we refactor the code around getting the original binary encoding of function's body.
The main changes are: remove BinaryContext::getFunctionData, remove the parameter of the method BinaryFunction::disassemble, introduce BinaryFunction::getData.

(cherry picked from FBD19824368)
2020-02-10 15:35:11 -08:00
Maksim Panchenko 41de03b8e9 [BOLT] Fix section names under `-generate-link-sections`
Summary:
Use proper function while printing modified function name to file.

(cherry picked from FBD19791847)
2020-02-07 09:39:38 -08:00
Rafael Auler 0080d74506 [BOLT] Fix issue with strict and builtin_unreachable
Summary:
In strict mode, a jump table with targets generated by
builtin_unreachable (located at the very end of the function) was
asserting when being recreated by postProcessIndirectBranches. Fix
this.

(cherry picked from FBD19614981)
2020-01-28 18:38:10 -08:00
Maksim Panchenko d57513e4ab [BOLT] Fix symbol table issue with ICF
Summary: Not all symbol table entries were updated after ICF.

(cherry picked from FBD19319685)
2020-01-08 13:32:59 -08:00
Maksim Panchenko ac697b7d3a [BOLT] Replace list of Names with Symbols for BinaryFunction
Summary:
BinaryFunction used to have a list of Names associated with its main
entry point. However, the function is primarily identified by its
corresponding symbol or symbols, and these symbols are available as we
are creating them for a corresponding BinaryData object.

There's also no reason to emit symbols for alternative function names
(aliases), so change the code to only emit needed symbols.

When we emit a cold fragment for a function, only emit one cold symbol
for the fragment instead of one per every main entry symbol/name.

When we match a symbol to an entry point in the function, with this
change we can first go through the list of main entry symbols (now that
they are available).

(cherry picked from FBD19426709)
2020-01-13 11:56:59 -08:00
Alexander Shaposhnikov 7a59783d7a [BOLT] Move createBinaryContext to BinaryContext
Summary:
1. Move createBinaryContext to BinaryContext.
1. Add support for nonlinux triples in createBinaryContext.
2. Remove unnecessary std::move in DWARFRewriter.cpp.

(cherry picked from FBD19421314)
2020-01-15 15:23:45 -08:00
Rafael Auler 961d3d02d8 [BOLT] Move postProcessEntryPoints after disassembly
Summary:
Call postProcessEntryPoints only after all functions have been
disassembled and all interprocedural references have been processed,
when all possible entry points have been accounted for. This makes our
detection of bad entries more robust as it does not depend on the order
of the functions any more.

(cherry picked from FBD19404767)
2020-01-14 17:12:03 -08:00
Maksim Panchenko 0283271f29 [BOLT] Do no report error on mismatched instruction encoding
Summary:
When the validation of instruction encoding fails but we are able to
continue processing the binary, do no report an error. Report encoding
format only under `-v=1`.

(cherry picked from FBD19376531)
2020-01-13 11:24:10 -08:00
Maksim Panchenko 45b27d7b44 [BOLT] Get rid of Names in BinaryData
Summary:
For BinaryData, we used to maintain a vector of StringRef names and also
a vector of pointers to MCSymbol's associated with the data. There was
an unnecessary duplication of information and an associated overhead of
keeping it in sync. Fix it by removing Names and using Symbols wherever
Names were used.

Also merge two variants of registerNameAtAddress() and remove
unreachable/dead code in the process.

(cherry picked from FBD19359123)
2020-01-10 16:17:47 -08:00
Maksim Panchenko 088e3c032a [BOLT] Improve handling of secondary function entry points
Summary:
"Fix symbol table entries for secondary entries" diff broke the inliner.

Fix the breakage and make the discovery of secondary entry points more
accurate.

Add ability to BinaryContext::getFunctionForSymbol() to return an entry
point discriminator and use it instead of calling getEntryForSymbol()
and isSecondaryEntry(). This is the preferred way since
getFunctionForSymbol() is thread-safe.

(cherry picked from FBD19295983)
2020-01-06 14:57:15 -08:00
Alexander Shaposhnikov 8c7f524afb [BOLT] Fix build of the runtime on OSX
Summary: Fix the compilation error on OSX

(cherry picked from FBD19269806)
2020-01-02 16:20:13 -08:00
Rafael Auler de284bc510 [BOLT] Fix symbol table entries for secondary entries
Summary:
Commit "Support full instrumentation" changed the map
SymbolToFunction in BinaryContext to map secondary entries of functions
too. This introduced unexpected behavior in our symbol table rewriting
logic, which caused it to mistakenly write them with the address of the
original function. Fix the behavior of getBinaryFunctionAtAddress to
correct this. Also fix other users of SymbolToFunction to ensure they
are not accidentally using secondary entries when they shouldn't.

(cherry picked from FBD19168319)
2019-12-18 12:14:42 -08:00