Summary:
When placing restore instructions in the shrink wrapping pass,
we typically put them right before the last instruction of a block at
the dominance frontier. If this instruction happened to have a prefix,
because the MC lib separates prefix into separate MCInsts, we would
accidentally put a load between a prefix and another instruction. Fix
this.
(cherry picked from FBD24295324)
Summary:
In analyzeRelocations, we extract the result of the relocation
from binary code to recreate the target of it in a few special cases.
For R_X86_64_32S relocations, however, we were neglecting the
possibility of the encoded value in the instruction to be negative.
(cherry picked from FBD24096347)
Summary:
Fix issue with splitting critical edges originating at
the same BB in ShrinkWrapping::splitFrontierCritEdges.
Splitting of critical edges originating at the same FromBB
wasn't handled correctly as the Frontier at index corresponding
to FromBB was overwritten with basic blocks created for
multiple DestinationBBs.
(cherry picked from FBD23232398)
Summary:
Right now, the SAVE_ALL sequence executed upon entry of both
of our runtime libs (hugify and instrumentation) will cause the stack to
not be aligned at a 16B boundary because it saves 15 8-byte regs. Change
the code sequence to adjust for that. The compiler may generate code
that assumes the stack is aligned by using movaps instructions, which
will crash.
(cherry picked from FBD22744307)
Summary:
If no profile data is provided, but only a user-provided order
file for functions, fix the placement of the __hot_end symbol.
(cherry picked from FBD22713265)
Summary:
Do not fail/assert when trying to reorder blocks that terminate
with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of
these instructions, so just treat them accordingly in fixBranches().
(cherry picked from FBD22487107)
Summary:
Re-add tests removed because they used to depend on yaml2obj.
Rewrite them with an assembler (llvm-mc) and use the system linker to
produce a valid ELF as input to BOLT.
(cherry picked from FBD22323449)
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.
In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.
We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.
This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.
Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.
(cherry picked from FBD21150193)
Summary:
Indirect calls that use RSP to compute the target address would
break in instrumentation mode because we were adding instructions that
changed the stack pointer. Fix this.
(cherry picked from FBD20883791)
Summary:
Shrink wrapping has a mode where it will directly move push
pop pairs, instead of replacing them with stores/loads. This is an
ambitious mode that is triggered sometimes, but whenever matching with
a push, it would operate with the assumption that the restoring
instruction was a pop, not a load, otherwise it would assert. Fix this
assertion to bail nicely back to non-pushpop mode (use regular store and
load instructions).
(cherry picked from FBD20085905)
Summary:
I noticed when setting up a new repository for bolt that bolt tests
would fail unexpectedly when running `ninja check-bolt` and
`ninja check-llvm`. This turns out to be because dependencies for bolt
binaries were not specified in the CMake configuration so they were not
built before running the tests. This diff adds the dependencies to the
CMake configuration for check-bolt and check-llvm so that bolt binaries
are built before running tests.
(cherry picked from FBD17919505)
Summary:
The regular perf2bolt aggregation job is to read perf output directly.
However, if the data is coming from a database instead of perf, one
could write a query to produce a pre-aggregated file. This function
deals with this case.
The pre-aggregated file contains aggregated LBR data, but without binary
knowledge. BOLT will parse it and, using information from the
disassembled binary, augment it with fall-through edge frequency
information. After this step is finished, this data can be either
written to disk to be consumed by BOLT later, or can be used by BOLT
immediately if kept in memory.
File format syntax:
{B|F|f} [<start_id>:]<start_offset> [<end_id>:]<end_offset> <count>
[<mispred_count>]
B - indicates an aggregated branch
F - an aggregated fall-through (trace)
f - an aggregated fall-through with external origin - used to disambiguate
between a return hitting a basic block head and a regular internal
jump to the block
<start_id> - build id of the object containing the start address. We can
skip it for the main binary and use "X" for an unknown object. This will
save some space and facilitate human parsing.
<start_offset> - hex offset from the object base load address (0 for the
main executable unless it's PIE) to the start address.
<end_id>, <end_offset> - same for the end address.
<count> - total aggregated count of the branch or a fall-through.
<mispred_count> - the number of times the branch was mispredicted.
Omitted for fall-throughs.
Example
F 41be50 41be50 3
F 41be90 41be90 4
f 41be90 41be90 7
B 4b1942 39b57f0 3 0
B 4b196f 4b19e0 2 0
(cherry picked from FBD8887182)
Summary:
When a given function B, located after function A, references
one of A's basic blocks, it registers a new global symbol at the
reference address and update A's Labels vector via
BinaryFunction::addEntryPoint(). However, we don't update A's branch
targets at this point. So we end up with an inconsistent CFG, where the
basic block names are global symbols, but the internal branch operands
are still referencing the old local name of the corresponding blocks
that got promoted to an entry point. This patch fix this by detecting
this situation in addEntryPoint and iterating over all instructions,
looking for references to the old symbol and replacing them to use the
new global symbol (since this is now an entry point).
Fixesfacebookincubator/BOLT#26
(cherry picked from FBD8728407)
Summary:
While removing unreachable blocks, we may decide to remove a
block that is listed as a target in a jump table entry. If we do that,
this label will be then undefined and LLVM assembler will crash.
Mitigate this for now by not removing such blocks, as we don't support
removing unnecessary jump tables yet.
Fixesfacebookincubator/BOLT#20
(cherry picked from FBD8730269)
Summary:
Create folders and setup to make LIT run BOLT-only tests. Add
a test example. This will add a new make/ninja rule "check-bolt" that
the user can invoke to run LIT on this folder.
(cherry picked from FBD8595786)