ICP peel for inline mode only makes sense for calls, not jump tables.
Plus, add a check that the Target BinaryFunction is found.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128404
The SplitFunctions pass does not distinguish between various splitting
modes anymore. This change updates the command line interface to
reflect this behavior by deprecating values passed to the
--split-function option.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128558
DWARF 5 added two new attributes DW_AT_call_pc and DW_AT_call_return_pc.
Adding support for them.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D128526
Add functionality to allow splitting code with C++ exceptions in shared
libraries and PIEs. To overcome a limitation in exception ranges format,
for functions with fragments spanning multiple sections, add trampoline
landing pads in the same section as the corresponding throwing range.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D127936
Allow cold fragment to get new address.
Our previous assumption is that a fragment (.cold) is only reached
through the main fragment of same function. In addition, .cold fragment
must be reached through either (a) direct transfer, or (b) split jump
table. For (a), we perform a simple fix-up. For (b), we currently mark
all relevant fragments as non-simple. Therefore, there is no need to
get new address for .cold fragment.
This is not always the case, as function entry can be rarely executed,
and is placed in .text.cold segment. Essentially we cannot tell which
the source-level function entry is based on hot and cold segments,
so we must treat each fragment a function on its own. Therfore, we
remove the assertion that a function entry cannot be cold fragment.
Test Plan:
```
ninja check-bolt
```
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128111
Resolve a crash related to split functions
Due to split function optimization, a function can be divided to two
fragments, and both fragments can access same jump table. This
violates the assumption that a jump table can only have one parent
function, which causes a crash during instrumentation.
We want to support the case: different functions cannot access same
jump tables, but different fragments of same function can!
As all fragments are from same function, we point JT::Parent to one
specific fragment. Right now it is the first disassembled fragment, but
we can point it to the function's main fragment later.
Functions are disassembled sequentially. Previously, at the end of
processing a function, JT::OffsetEntries is cleared, so other fragment
can no longer reuse JT::OffsetEntries. To extend the support for split
function, we only clear JT::OffsetEntries after all functions are
disassembled.
Let say A.hot and A.cold access JT of three targets {X, Y, Z}, where
X and Y are in A.hot, and Z is in A.cold. Suppose that A.hot is
disassembled first, JT::OffsetEntries = {X',Y',INVALID_OFFSET}. When
A.cold is disassembled, it cannot reuse JT::OffsetEntries above due to
different fragment start. A simple solution:
A.hot = {X',Y',INVALID_OFFSET}
A.cold = {INVALID_OFFSET, INVALID_OFFSET, INVALID_OFFSET}
We update the assertion to allow different fragments of same function
to get the same JumpTable object.
Potential improvements:
A.hot = {X',Y',INVALID_OFFSET}
A.cold = {INVALID_OFFSET, INVALID_OFFSET, Z'}
The main issue is A.hot and A.cold have separate CFGs, thus jump table
targets are still constrained within fragment bounds.
Future improvements:
A.hot = {X, Y, Z}
A.cold = {X, Y, Z}
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D127924
Temporary fix for missing entry offset when creating address
translation tables (BAT) after D127935 landed. Will later work on
assigning a more reasonable offset different than zero.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128092
BC::printInstruction(s) has many uses of Function ptr if it's available:
# printing CFI instructions (unconditional)
# printing debug line information (-print-debug-info)
# printing instruction relocations (-print-relocations)
Enable these uses by passing Function ptr from the primary printing entry point:
BinaryBasicBlock::dump.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D126916
Supress failed to analyze relocations warning for R_AARCH64_LD_PREL_LO19
relocation. This relocation is mostly used to get value stored in CI and
we don't process it since we are caluclating target address using the
instruction value in evaluateMemOperandTarget().
Differential Revision: https://reviews.llvm.org/D127413
Replace a single dash with a double dash for options that have more
than a single letter.
llvm-bolt-wrapper.py has special treatment for output options such as
"-o" and "-w" causing issues when a single dash is used, e.g. for
"-write-dwp". The wrapper can be fixed as well, but using a double dash
has other advantages as well.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D127538
Mark fragments related to split jump table as non-simple.
A function could be splitted into hot and cold fragments. A split jump table is
challenging for correctly reconstructing control flow graphs, so it was marked
as ignored. This update marks those fragments as non-simple, allowing them
to be printed and partial control flow graph construction.
Test Plan:
```
llvm-lit -a tools/bolt/test/X86/split-func-icf.s
```
This test has two functions (main, main2), each has a jump table target to the
same cold portion main2.cold.1(*2). We try to print out only this cold portion.
If it is ignored, it cannot be printed. If it is non-simple, it can be printed. We
verify that it can be printed.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D127464
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.
Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.
@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.
Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.
To avoid the problems I face, and restore symmetry, I deleted the
exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.
For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would good to confirm I did that correctly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D117977
This patch adds getFirstInstructionOffset method for BinaryFunction
which is used to properly handle cases where data is at zero offset in
a function. The main change is that we add basic block at first
instruction offset when disassembling, which prevents assertion
failures in buildCFG.
Reviewed By: yota9, rafauler
Differential Revision: https://reviews.llvm.org/D127111
The linker can convert instructions with GOTPCRELX relocations into a
form that uses an absolute addressing with an immediate. BOLT needs to
recognize such conversions and symbolize the immediates.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126747
I put it into wrong directory. As the result it is failing for aarch64. Moving
the test under X86. Orignial diff D126999.
Differential Revision: https://reviews.llvm.org/D127417
Some of the passes that calculates tentative layout like LongJmp and
Golang are expecting that only functions with valid index will be
located in hot text section. But currently functions with valid profiles
and not set index are breaking this logic, to fix this we can move the
hasValidProfile() condition from AssignSections pass to ReorderFunctions.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D127223
# Stash local changes before checkout.
# Print a message that the source repository revision has been changed, with
instructions to switch back.
# Make the script executable.
# Print sample instructions how to run bolt tests.
# Assume that llvm-bolt-wrapper script is in the same source directory.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126941
Emit warning when using deprecated option '-reorder-blocks=cache+'.
Auto switch to option '-reorder-blocks=ext-tsp'.
Test Plan:
```
ninja check-bolt
```
Added a new test cache+-deprecated.test.
Run and verify that the upstream tests are passed.
Reviewed By: rafauler, Amir, maksfb
Differential Revision: https://reviews.llvm.org/D126722
Clang's doxygen documentation specifies LLVM revision. Do the same for BOLT.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126912
Use color coding to distinguish nodes:
- Entry nodes have bold border
- Scalar (non-loopy) code is milk white
- Outer loops are light yellow
- Innermost loops are light blue
`-print-loops` needs to be enabled to provide BinaryLoopInfo.
Examples:
{F23170673}
{F23170680}
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126248
Reuse the option `-dot-tooltip-code` to put block instructions into the label.
This way, the instructions are displayed by default when used with dot viewer.
When the .dot file is used with dot2html, instructions are hidden by default,
and are shown by clicking on a node.
{F23169510}
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126237
When we generate split dwarf with -fdebug-types-section we will have
.debug_types.dwo sections. These go into TU Index when we run llvm-dwp. BOLT was
not handling DWP input correctly with this section.
Added support for handling DWP with TU Index as an input and output for DWARF4.
Added support for handling DWP with TU Index as an input for DWARF5
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D126087
Replace "cache+" with "ext-tsp" in all BOLT tests
Test Plan:
```
ninja check-bolt
grep -rnw . -e "cache+"
```
no more tests containing "cache+"
"cache+" and "ext-tsp" are aliases
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126714
After D126484, order in .debug-line-str and .debug-line is different. Changed
test accordingly.
Differential Revision: https://reviews.llvm.org/D126733
Summary:
While disassembling instructions, we need to replace certain immediate
operands with symbols. This symbolizing process relies on reading
relocations against instructions. However, some X86 instructions can
have multiple immediate operands and up to two relocations against
them. Thus, correctly matching a relocation to an operand is not
always possible without knowing the operand offset within the
instruction.
Luckily, LLVM provides an interface for passing the required info from
the disassembler via a virtual MCSymbolizer class. Creating a
target-specific version allows a precise matching of relocations to
operands.
This diff adds X86MCSymbolizer class that performs X86-specific
symbolizing (currently limited to non-branch instructions).
Reviewers: yota9, Amir, ayermolo, rafauler, zr33
Differential Revision: https://reviews.llvm.org/D120928
Fix BOLT's constant island mapping when a constant island marked by $d
spans multiple functions. Currently, because BOLT only marks the
constant island in the first function where $d is located, if the next
function contains data at its start, BOLT will miss the data and try
to disassemble it. This patch adds code to explicitly go through all
symbols between $d and $x markers and mark their respective offsets as
data, which stops BOLT from trying to disassemble data. It also adds
MarkerType enum and refactors related functions.
Reviewed By: yota9, rafauler
Differential Revision: https://reviews.llvm.org/D126177
and recursively merge all files under said directory. This is similar
to `llvm-profdata merge`.
Differential Revision: https://reviews.llvm.org/D126695
This reverts commit 3988bd1398.
Did not build on this bot:
https://lab.llvm.org/buildbot#builders/215/builds/6372
/usr/include/c++/9/bits/predefined_ops.h:177:11: error: no match for call to
‘(llvm::less_first) (std::pair<long unsigned int, llvm::bolt::BinaryBasicBlock*>&, const std::pair<long unsigned int, std::nullptr_t>&)’
177 | { return bool(_M_comp(*__it, __val)); }
One could reuse this functor instead of rolling out your own version.
There were a couple other cases where the code was similar, but not
quite the same, such as it might have an assertion in the lambda or other
constructs. Thus, I've not touched any of those, as it might change the
behavior in some way.
As per https://discourse.llvm.org/t/submitting-simple-nfc-patches/62640/3?u=steakhal
Chris Lattner
> LLVM intentionally has a “yes, you can apply common sense judgement to
> things” policy when it comes to code review. If you are doing mechanical
> patches (e.g. adopting less_first) that apply to the entire monorepo,
> then you don’t need everyone in the monorepo to sign off on it. Having
> some +1 validation from someone is useful, but you don’t need everyone
> whose code you touch to weigh in.
Differential Revision: https://reviews.llvm.org/D126068
Fix a bug where shrink-wrapping would use wrong stack offsets
because the stack was being aligned with an AND instruction, hence,
making its true offsets only available during runtime (we can't
statically determine where are the stack elements and we must give up
on this case).
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D126110
The function is only used inside AArch64MCPlusBuilder class, there are no uses
of it as a BinaryFunction method.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126220
Add a new testcase that reproduces a bug when BOLTing current
trunk LLD bootstrapped with trunk clang. This makes it official
that we do not support this transformation but are working on
it. When the support is ready, XFAIL should be removed.
Reviewed By: maksfb, Amir, yota9
Differential Revision: https://reviews.llvm.org/D125843
Addresses the warnings emitted by Apple Clang 13.1.6 (Xcode 13.3.1).
Tip @tschuett issue #55404.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D125733
Split up the BinaryLoop header and move BinaryDominatorTree into its own header,
preparing it for a standalone use.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D125664
When a profile is collected in a BOLTed binary, the generated
profile is tagged with a header string "boltedcollection" in the first
line of the fdata file. Fix merge-fdata to recognize this header
string and preserve it into the output.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D125591
Address warnings in Release build without assertions.
Tip @tschuett for reporting the issue #55404.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D125475
As BOLT support for monolithic and split DWARF5 is added, remove DWARF version
override for BOLT tests.
Reviewed By: ayermolo
Differential Revision: https://reviews.llvm.org/D125366
Add an option to only peel ICP targets that can be subsequently inlined.
Yet there's no guarantee that they will be inlined.
The mode is independent from the heuristic used to choose ICP targets: by exec
count, mispredictions, or memory profile.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124900
This patch adds a new feature to bolt heatmap to print the hotness of each section in terms of the percentage of samples within that section.
Sample output generated for the clang binary:
Section Name, Begin Address, End Address, Percentage Hotness
.text, 0x1a7b9b0, 0x20a2cc0, 1.4709
.init, 0x20a2cc0, 0x20a2ce1, 0.0001
.fini, 0x20a2ce4, 0x20a2cf2, 0.0000
.text.unlikely, 0x20a2d00, 0x431990c, 0.3061
.text.hot, 0x4319910, 0x4bc6927, 97.2197
.text.startup, 0x4bc6930, 0x4c10c89, 0.0058
.plt, 0x4c10c90, 0x4c12010, 0.9974
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124412
Account for cross-compilation build scenarios (X86 to ARM, Linux
to Windows, etc).
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124712
`getInliningInfo` is useful in other passes that need to check inlining
eligibility for some function. Move the declaration and InliningInfo definition
out of Inliner class. Prepare for subsequent use in ICP.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124899
Caching behavior of `getAliases` causes a failure in unit tests where two
MCPlusBuilder objects are created corresponding to AArch64 and X86:
the alias cache is created for AArch64 but then used for X86.
https://lab.llvm.org/staging/#/builders/211/builds/126
The issue only affects unit tests as we only construct one MCPlusBuilder
for ELF binary.
Resolve the issue by moving alias bitvectors to MCPlusBuilder object.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D124942
Rename `opts::IndirectCallPromotion*` to `opts::ICP*`, making option naming
uniform and easier to follow.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124879
Address X86 tests failures on AArch64 builder:
https://lab.llvm.org/staging/#/builders/211/builds/82
Inputs fail to cross-compile due to a missing header:
```
/usr/include/stdio.h:27:10: fatal error: 'bits/libc-header-start.h' file not found
#include <bits/libc-header-start.h>
```
As inputs are linked with `-nostdlib` anyway, don't include stdio.h.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D124863
Address X86 tests failures on AArch64 builder:
https://lab.llvm.org/staging/#/builders/211/builds/82
Inputs fail to cross-compile due to a missing header:
```
/usr/include/stdio.h:27:10: fatal error: 'bits/libc-header-start.h' file not found
#include <bits/libc-header-start.h>
```
As inputs are linked with `-nostdlib` anyway, don't include stdio.h.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D124863
This patch fixes a warning from -Wunused-but-set-variable
MismatchedBranches are counted, but are never reported.
Since evaluateProfileData() should already identify and report
these cases, we can safely remove the unused variable.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124588
We don't actually depend on entire X86/AArch64 components that pull in CodeGen,
SelectionDAG etc., just the Desc part with opcode and other definitions.
Note that it doesn't decouple BOLT from these components - we still pull in X86
and AArch64 from top-level llvm-bolt dependencies as we use assembler and
disassembler. It's difficult to reduce these as this requires non-trivial
changes to X86/AArch64 components themselves (e.g. moving out AsmPrinter).
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124206
The relocation value is calculated using the formula S + A - P,
the verification of the value is performed by inversely calculating the location address
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D124270
Added implementation to support DWARF5 in monolithic mode.
Next step DWARF5 split dwarf support.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D121876
LLVM with LTO can generate function names in the form
func.llvm.<number>, where <number> could vary based on the compilation
environment. As a result, if a profiled binary originated from a
different build than a corresponding binary used for BOLT optimization,
then profiles for such LTO functions will be ignored.
To fix the problem, use "fuzzy" matching with "func.llvm.*" form.
Reviewed By: yota9, Amir
Differential Revision: https://reviews.llvm.org/D124117
Looks like implementation in llvm changed, and now we need to process error
being returned.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D124133
Handle the case where LLVM_REVISION is undefined (due to LLVM_APPEND_VC_REV=OFF
or otherwise) by setting "<unknown>" value as before D123549.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D123852
When processing profile data for shared object or PIE, perf2bolt needs
to calculate base address of the binary based on the map info reported
by the perf tool. When the mapping data provided is for the second
(or any other than the first) segment and the segment's file offset
does not match its memory offset, perf2bolt uses wrong assumption
about the binary base address.
Add a function to calculate binary base address using the reported
memory mapping and use the returned base for further address
adjustments.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D123755
The ld might relax ADRP+ADD or ADRP+LDR sequences to the ADR+NOP, add
the new case to the skipRelocation for aarch64.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D123334
Align with an upstream change D120305 to make PIE the default on linux-gnu.
Add `-no-pie` to tests that require it.
Reviewed By: maksfb, yota9
Differential Revision: https://reviews.llvm.org/D123329
BOLT expects PC-relative relocations in data sections to reference code
and the relocated data to form a jump table. However, there are cases
where PC-relative addressing is used for data-to-data references
(e.g. clang-15 can generate such code). BOLT should recognize and ignore
such relocations. Otherwise, they will be considered relocations not
claimed by any jump table and cause a failure in the strict mode.
Reviewed By: yota9, Amir
Differential Revision: https://reviews.llvm.org/D123650
tls-lld test might be broken since compiler might optimize plt function
call and use address directly from got table. The test is removed since
plt-gnu-ld checks the same functionality + versioning symbol matching,
no need to keep both of the tests.
The toolchain might optimize relocations in runtime-relocs test, replace
the test compilation with yaml files.
Differential Revision: https://reviews.llvm.org/D123332
Merging multiple legacy profiles (produced by instrumentation BOLT) can
easily reach GiBs. Let merge-fdata compact the profiles during merge to
significantly reduce space usage.
Differential Revision: https://reviews.llvm.org/D123513
Returning `std::array<uint8_t, N>` is better ergonomics for the hashing functions usage, instead of a `StringRef`:
* When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with fixed array of bytes directly, which is more natural
* Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef`
As part of this patch also:
* Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes.
* Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API.
Differential Revision: https://reviews.llvm.org/D123100
Add !isTailCall in isUnconditionalBranch check in order to sync the x86
and aarch64 and fix the fixDoubleJumps pass on aarch64.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D122929
The bfd linker adds the symbol versioning string to the symbol name in symtab.
Skip the versioning part in order to find the registered PLT function.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D122039
Check for supported target architecture instead of the host arch when
deciding to execute non-runtime tests.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D122498
Read static relocs on the same address, as dynamic in order to update
constant island data address properly.
Differential Revision: https://reviews.llvm.org/D122100
Check that the function will be emitted in the final binary. Preserving
old function address is needed in case it is PLT trampiline, that is
currently not moved by the BOLT.
Differential Revision: https://reviews.llvm.org/D122098
BOLT treats aarch64 objects located in text as empty functions with
contant islands. Emit them with at least 8-byte alignment to the new
text section.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D122097
AArch64 requires CI to be aligned to 8 bytes due to access instructions
restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.
Differential Revision: https://reviews.llvm.org/D122065
It seems the earlier implementation does not follow the description
in LoopRotationPass.h: It rotates loops even if they are already laid out
correctly. The diff adjusts the behaviour.
Given that the impact of LoopInversionPass is minor, this change won't
yield significant perf differences. Tested on clang-10: there seems to be a
0.1%-0.3% cpu win and a small reduction of branch misses.
**Before:**
BOLT-INFO: 120 Functions were reordered by LoopInversionPass
**After:**
BOLT-INFO: 79 Functions were reordered by LoopInversionPass
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D121921
Run tentativeLayoutRelocMode twice only if UseOldText option was passed.
Refactor BF loop to break on condtition met.
Differential Revision: https://reviews.llvm.org/D121825
Remove tables from X86MCPlusBuilder, make use of llvm::X86 mnemonic tables.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D121573
Since LLVM MC now preserves redundant AdSize override prefix (0x67), remove it
in BOLT explicitly (-x86-strip-redundant-adsize, on by default).
Test Plan:
`bin/llvm-lit -a bolt/test/X86/addr32.s`
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120975
The BinaryEmitter uses opts::AlignText value to align the hot text
section. Also check that the opts::AlignText is at least
equal opts::AlignFunctions for the same reason, as described in D121392.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D121728
The cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment. In order to calculate tentetive layout
right we will set the minimum alignment of such sections to the maximum
possible function alignment explicitly.
Differential Revision: https://reviews.llvm.org/D121392
This clarifies that this is an LLVM specific variable and avoids
potential conflicts with other projects.
Differential Revision: https://reviews.llvm.org/D119918
For AArch64 in some cases/some distributions ld uses 64K alignment of LOAD segments by default.
Reviewed By: yota9, maksfb
Differential Revision: https://reviews.llvm.org/D119267
The aarch64 uses the trampolines located in .iplt section, which
contains plt-like trampolines on the value stored in .got. In this case
we don't have JUMP_SLOT relocation, but we have a symbol that belongs to
ifunc trampoline, so use it and set set plt symbol for such functions.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D120850
Address fuzzer crash on malformed input:
```
BOLT-ERROR: cannot get section contents for .dynsym: The end of the file was unexpectedly encountered.
```
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D121068
This patch enables PLT analysis for aarch64. It is used by the static
relocations in order to provide final symbol address of PLT entry for some
instructions like ADRP.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D118088
Reassigning the operand didn't update the operand type which resulted in an
assertion (`Assertion `isReg() && "This is not a register operand!"' failed.`)
Reset the instruction instead.
Test Plan:
```
ninja check-bolt
...
PASS: BOLT-Unit :: Core/./CoreTests/X86/MCPlusBuilderTester.ReplaceRegWithImm/0 (90 of 136)
```
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120263
We were not handling correctly conversion from DW_AT_high_pc into DW_AT_ranges,
when size of DW_AT_high_pc is not 4/8 bytes.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D120528
PC-relative memory operand could reference a different object from
the one located at the target address, e.g. when a negative offset
is used. Check relocations for the real referenced object.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120379
Further improve error handling in BOLT by reporting `RewriteInstance` errors in
a library and fuzzer-friendly way instead of exiting.
Follow-up to D119658
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120224
Fix data race reported by ThreadSanitizer in clang.test:
```
ThreadSanitizer: data race /data/llvm-project/bolt/lib/Passes/ShrinkWrapping.cpp:1359:28
in llvm::bolt::ShrinkWrapping::moveSaveRestores()
```
The issue is with incrementing global counters from multiple threads.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D120218
Refactor createBinaryContext and RewriteInstance/MachORewriteInstance
constructors to report an error in a library and fuzzer-friendly way instead of
returning a nullptr or exiting.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D119658
This patch changes patchELFAllocatableRelaSections from going through
old relocations sections and update the relocation offsets to emitting
the relocations stored in binary sections. This is needed in case we
would like to remove and add dynamic relocations during BOLT work and it
is used by golang support pass. Note: Currently we emit relocations in
the old sections, so the total number of them should be equal or less
of old number.
Testing: No special tests are neeeded, since this patch does not fix
anything or add new functionality (it only prepares to add). Every
PIC-compiled test binary will use this code and thus become a test.
But just in case the aarch64 dynamic relocations tests were added.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117612
Added ability to append new entries to DIE. This is useful to standadize DWARF4
Split Dwarf, and simplify implementation of DWARF5.
Multiple DIEs can share an abbrev. So currently limitation is that only unique
Attributes can be added.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D119577
After "Remove caching of ranges/abbrevs" patch the dwarf offsets are a
bit changed and the subprograms high pc is replaced with AT_RANGES.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D119733
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files
Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848
Which is a great diff!
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
When a jump table is recovered in postProcessIndirectBranches(),
successors for the containing basic block are added in random order.
Make the order deterministic.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D119672
Most notably,
llvm/Object/Binary.h no longer includes llvm/Support/MemoryBuffer.h
llvm/Object/MachOUniversal*.h no longer include llvm/Object/Archive.h
llvm/Object/TapiUniversal.h no longer includes llvm/Object/TapiFile.h
llvm-project preprocessed size:
before: 1068185081
after: 1068324320
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119457
We are moving away from building the runtimes with LLVM_ENABLE_PROJECTS,
however the documentation was largely outdated. This commit updates all
the documentation I could find to use LLVM_ENABLE_RUNTIMES instead of
LLVM_ENABLE_PROJECTS for building runtimes.
Note that in the near future, libcxx, libcxxabi and libunwind will stop
supporting being built with LLVM_ENABLE_PROJECTS altogether. I don't know
what the plans are for other runtimes like libc, openmp and compiler-rt,
so I didn't make any changes to the documentation that would imply
something for those projects.
Once this lands, I will also cherry-pick this on the release/14.x branch
to make sure that LLVM's documentation is up-to-date and reflects what
we intend to support in the future.
Differential Revision: https://reviews.llvm.org/D119351
Removing caching of ranges/abbrevs to simplify the code.
Before we were doing it to get around a gdb limitation.
FBD34015613
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D119276
Add the script to set up llvm-bolt-wrapper. The intended use is to run NFC
checks manually and automatically on a buildbot.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D118516
Summary:
- variable 'TotalSize' set but not used
- variable 'TotalCallsTopN' set but not used
- use of bitwise '|' with boolean operands
Reviewed By: maksfb
FBD33911129
clang-10 complains about changed section flags in two tests:
- X86/shrinkwrapping.test
- X86/exceptions-args.test
Fix that by adding the missing flags.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D119014
Only enable --emit-relocs linker option for merge-fdata target if tests are enabled.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118580
We can have a scenario where multiple CUs share an abbrev table.
We modify or don't modify one CU, which leads to other CUs having invalid abbrev section.
Example that caused it.
All of CUs shared the same abbrev table. First CU just had compile_unit and sub_program.
It was not modified. Next CU had DW_TAG_lexical_block with
DW_AT_low_pc/DW_AT_high_pc converted to DW_AT_low_pc/DW_AT_ranges.
We used unmodified abbrev section for first and subsequent CUs.
So when parsing subsequent CUs debug info was corrupted.
In this patch we will now duplicate all sections that are modified and are different.
This also means that if .debug_types is present and it shares Abbrev table, and
they usually are, we now can have two Abbrev tables. One for CU that was modified,
and unmodified one for TU.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118517
Commit history in chronological order:
[BOLT] llvm-bolt-wrapper: added wrapper for bolt binary matching
Summary:
Wrapper to compare two versions of BOLT to see if they produce the same output
binary given the same input.
(cherry picked from FBD26626137)
[BOLT] llvm-bolt-wrapper: support for no-output tests and heatmap mode
Summary:
- Added an option `skip_binary_cmp` to support invocations that don't output
a binary
- Minor fixes for heatmap mode, timeout, log comparison
- Rearranged in-line config example to be copy-pasteable
(cherry picked from FBD26822016)
[BOLT] llvm-bolt-wrapper: merge stdout/stderr, search for config in script dir
(cherry picked from FBD27529335)
[BOLT] llvm-bolt-wrapper: handle /dev/null
Summary:
Fixed the wrapper to preserve `-o /dev/null` and skip binary matching for such
invocations.
(cherry picked from FBD28013747)
[BOLT] llvm-bolt-wrapper: handle cases where output binary doesn't exist
Summary:
Handle invocations where output binary is not generated (e.g. due to an expected
assertion or exit with BOLT-ERROR) and skip binary comparison in such cases.
(cherry picked from FBD28080158)
[BOLT] llvm-bolt-wrapper: handle boltdiff mode
Summary:
Handle `llvm-boltdiff` invocation similarly to `perf2bolt`
(cherry picked from FBD28080157)
[BOLT] llvm-bolt-wrapper: find section with mismatch
Summary:
For mismatching ELF files, find section with mismatch and print sections table
with highlighted mismatch section.
(cherry picked from FBD28087231)
[BOLT] llvm-bolt-wrapper: ignore-build-id in perf2bolt mode
Summary:
When perf2bolt fails to match build-id from perf output for cmp binary, we need
to use -ignore-build-id option to override the strict checking behavior.
(cherry picked from FBD28087232)
[BOLT] llvm-bolt-wrapper: suppress -bolt-info=0 in heatmap mode
Summary:
Heatmap mode is incompatible with `-bolt-info=0` used to suppress binary
differences. Remove it.
(cherry picked from FBD28087230)
[BOLT] llvm-bolt-wrapper: add config-generator mode
Summary:
llvm-bolt-wrapper config can be generated by the script itself.
It makes the workflow more reliable compared to preparing the config manually.
(cherry picked from FBD28358939)
[BOLT] llvm-bolt-wrapper: fix mismatch reporting
Summary:
1. Fixed header comparison issue where headers were skipped due to
`skip_end == 0` (`lst[:-n]` does not work if n==0).
2. Detect color support while printing mismatching section:
- use bold color if terminal supports ANSI escape codes,
- otherwise print ">" at mismatching section.
3. Remove extra 0x before mismatching offset.
(cherry picked from FBD28691979)
[BOLT] llvm-bolt-wrapper: handle perf2bolt tests with ignore-build-id
Summary:
`ignore-build-id` must be passed not more than once. Account for that.
(cherry picked from FBD29830266)
[BOLT] llvm-bolt-wrapper: fix running subprocesses in parallel
Summary:
The commands were running sequentially due to the use of blocking `communicate`
call, which is needed when stdout/stderr are directed to a pipe.
Fix this behavior by directing the output to a file.
(cherry picked from FBD29951863)
The aarch64 platform has special registers like X0_X1_X2_X3_X4_X5_X6_X7.
Using the downwards propagation this register will become a super
register for all X0..X7 and its super registers which is not right. This
patch replaces the downwards propagation with caching all the aliases using MCRegAliasIterator.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117394
Since we now re-write .debug_info the DWARF CU Offsets can change.
Just like for .debug_aranges the GDB Index will need to be updated.
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D118273
This patch reverts patch "DWARFv5 default: Switch bolt tests to use
DWARFv4 since Bolt doesn't support v5 yet" and places the -gdwarf-4 flag
to the global cflags config file.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D118283
This patch adds unit testing support for BOLT. In order to do this we will need at least do this changes on the code level:
* Make createMCPlusBuilder accessible externally
* Remove positional InputFilename argument to bolt utlity sources
And prepare the cmake and lit for the new tests.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb, Amir
Differential Revision: https://reviews.llvm.org/D118271
<memory> is no longer included as a result of 5f290c090a
("Move STLFunctionalExtras out of STLExtras").
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118064