Summary:
Add first bits to support emitting instrumented code on MachO.
This diff enables us to instrument branches / emit counters.
(cherry picked from FBD24255164)
Summary:
On targets that support it, emit size of the emitted function symbol.
At the moment there's no use for the size except that it is visible in a
temporary .o file symbol table.
(cherry picked from FBD24246177)
Summary: _end is "defined" but its address doesn't belong to any section. This diff adds special handling for this symbol.
(cherry picked from FBD24249120)
Summary:
Append ".cold.0" suffix to the original part of the name, such that
"foo/1" becomes "foo.cold.0/1" instead of "foo/1.cold.0".
(cherry picked from FBD24246112)
Summary:
At the moment we are not using PatchEntries pass in non-relocation mode
on ELF. However, we will use it on MachO.
(cherry picked from FBD24235271)
Summary: Add ToolPath field to MachORewriteInstance. This will enable us to locate the runtime library relative to the tool's location.
(cherry picked from FBD24183448)
Summary:
Do not mix relocation codes from different archs. Even though
they do not intersect at the moment, this could easily introduce bugs
once new relocations are supported (for example, ILP32 for AArch64).
(cherry picked from FBD24169425)
Summary:
This diff adds a command line option to disable the check of overlapping elements in Mach-O parsing. This check in its current form is prohibitively expensive for large binaries.
A long-term fix would be to reimplement the check in a more efficient manner (and contribute it to the upstream).
(cherry picked from FBD24109468)
Summary:
In analyzeRelocations, we extract the result of the relocation
from binary code to recreate the target of it in a few special cases.
For R_X86_64_32S relocations, however, we were neglecting the
possibility of the encoded value in the instruction to be negative.
(cherry picked from FBD24096347)
Summary:
This patch fixes the assertion failure during instrumentation.
The assertion is raised by `getInstructionAtOffset` , which expects `CurrentState` to be either `Disassembled` or `CFG`.
The function is called from `postProcessEntryPoints`, which goes over Labels and performs a series of checks. The checks call BinaryFunction methods `setSimple(false)` or `setIgnored()`.
However, if `setIgnored` is invoked, it resets the state to `Empty`. Thus subsequent call to `getInstructionAtOffset` will fail.
(cherry picked from FBD24005197)
Summary:
Enable initial support for reading and patching special Linux kernel sections.
Author: Tanvir Ahmed Khan <takh@fb.com>
GitHub Author: takhandipu
(cherry picked from FBD22998869)
Summary:
Whenever we search for a function based on its address in the input
binary, we now always return a corresponding fragment for split
functions. If the user needs an access to the main fragment, they can
call getTopmostFragment().
(cherry picked from FBD23670311)
Summary:
Sections that do not originate from the input binary will have an
input address set to zero and thus do not have to be mapped.
Mapping such sections caused a build time regression in non-relocation
mode.
(cherry picked from FBD23670334)
Summary:
Fix issue with splitting critical edges originating at
the same BB in ShrinkWrapping::splitFrontierCritEdges.
Splitting of critical edges originating at the same FromBB
wasn't handled correctly as the Frontier at index corresponding
to FromBB was overwritten with basic blocks created for
multiple DestinationBBs.
(cherry picked from FBD23232398)
Summary:
Right now, if activity is recorded in cold parts, we write to
the .fdata file the ".cold" name instead of the correct name of the
function. Fix this.
(cherry picked from FBD23148705)
Summary:
When the input file is processed by BOLT, we cannot save profile in YAML
format as it requires CFG representation of functions.
(cherry picked from FBD22941794)
Summary:
Add first bits to support emitting more than 255 sections.
Update llvm.patch to include the changes in MachOObjectFile.cpp.
(cherry picked from FBD22655053)
Summary:
Some LBR traces contain less than 16/32 entries. When the first trace is
less than the standard length, we fail to enable SKL bug workaround with
BAT mode enabled. The result is a bad profile from perf2bolt. The fix
is to check the length of every trace, not just the first one.
Issue facebookincubator/BOLT#94
(cherry picked from FBD22917971)
Summary:
Added execution count threshold option (execution-count-threshold) controlling the optimizations that are sensitive to the accuracy of the profiling data:
- BB reordering
- function splitting
- frame opts
- shrink wrapping
- indirect call promotion
(cherry picked from FBD22682171)
Summary:
Right now, the SAVE_ALL sequence executed upon entry of both
of our runtime libs (hugify and instrumentation) will cause the stack to
not be aligned at a 16B boundary because it saves 15 8-byte regs. Change
the code sequence to adjust for that. The compiler may generate code
that assumes the stack is aligned by using movaps instructions, which
will crash.
(cherry picked from FBD22744307)
Summary:
If no profile data is provided, but only a user-provided order
file for functions, fix the placement of the __hot_end symbol.
(cherry picked from FBD22713265)
Summary: Added handling of intra-function calls (internal control transfer by call instruction) to instrumentOneTarget
(cherry picked from FBD22606033)
Summary:
Do not fail/assert when trying to reorder blocks that terminate
with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of
these instructions, so just treat them accordingly in fixBranches().
(cherry picked from FBD22487107)
Summary:
We've accidentally registered TBSS section address with a BinaryContext
resulting in addresses being attributed to it when
getSectionForAddress() was called.
(cherry picked from FBD22369323)
Summary:
Reading through the LLVM coding standard again, realized a few places where I didn't follow the standard when coding. Addressing them:
1. prefer static functions over functions in unnamed namespace.
2. #include as little as possible in headers
3. Have vtable anchors.
(cherry picked from FBD22353046)
Summary:
R_X86_64_PLT32 relocations recorded by the linker may point to the PLT
section instead of being resolved to the symbol reported by the
relocation. Sometimes they could point to the symbol too. Disable
internal verification for this type of relocation.
Include a fix for symbol address calculation when it is based on the
extracted value. The truncation to the relocation size is needed if
the results overflows.
(cherry picked from FBD22317952)
Summary:
Re-add tests removed because they used to depend on yaml2obj.
Rewrite them with an assembler (llvm-mc) and use the system linker to
produce a valid ELF as input to BOLT.
(cherry picked from FBD22323449)
Summary:
When adding symbols for patched functions, we may end up emitting
multiple symbols per function if the function has multiple names (e.g.
after identical code folding by the linker).
(cherry picked from FBD22294112)
Summary:
Accept binaries without dynamic section/segment as a valid input.
Modify the check for invalid debug info "executables" that are result of
running "objcopy --only-keep-debug". Instead of checking for an empty
dynamic segment, check that ".text" is mapped into a valid segment.
Move SegmentMapInfo inside BinaryContext.
Fixesfacebookincubator/BOLT#91
Temporarily removing issue*.test tests that use yaml2obj and operate on
fake binaries.
(cherry picked from FBD22271481)
Summary:
While aggregating perf.data events, even in strict mode, there is no
need to process all functions since we are not generating an output
binary. However, it's still important to convert data for as many
functions as possible, even for ones with unknown internal control flow.
(cherry picked from FBD22248390)
Summary:
lld linker may emit static relocations against addresses that also have
dynamic relocations associated with them. When this happens, BOLT fails
to validate the extracted value at the address.
Read dynamic relocations in the binary and ignore static relocations at
addresses that have a duplicate dynamic relocation.
(cherry picked from FBD22192345)
Summary:
getNewValueForSymbol() uses orc::RTDyldObjectLinkingLayer::findSymbol()
to resolve symbol values. The latter will always return JITSymbol,
even if there was no symbol defined. The address for the undefined
symbol will be zero, but some symbols could legally be resolved to zero
too.
We need to distinguish between real zero-valued symbols and symbols that
were not emitted and are not visible by orc::RTDyldObjectLinkingLayer.
If zero address is returned by ORC, check for a binary data with the
same name and use its address for the symbol resolution.
(cherry picked from FBD22175269)