It seems the earlier implementation does not follow the description
in LoopRotationPass.h: It rotates loops even if they are already laid out
correctly. The diff adjusts the behaviour.
Given that the impact of LoopInversionPass is minor, this change won't
yield significant perf differences. Tested on clang-10: there seems to be a
0.1%-0.3% cpu win and a small reduction of branch misses.
**Before:**
BOLT-INFO: 120 Functions were reordered by LoopInversionPass
**After:**
BOLT-INFO: 79 Functions were reordered by LoopInversionPass
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D121921
Run tentativeLayoutRelocMode twice only if UseOldText option was passed.
Refactor the BF loop to break once the condition is met.
Differential Revision: https://reviews.llvm.org/D121825
Remove tables from X86MCPlusBuilder, make use of llvm::X86 mnemonic tables.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D121573
Since LLVM MC now preserves redundant AdSize override prefix (0x67), remove it
in BOLT explicitly (-x86-strip-redundant-adsize, on by default).
Test Plan:
`bin/llvm-lit -a bolt/test/X86/addr32.s`
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120975
The BinaryEmitter uses opts::AlignText value to align the hot text
section. Also check that opts::AlignText is at least equal to
opts::AlignFunctions, for the same reason as described in D121392.
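The constraint can be illustrated with a minimal sketch. The helper names `alignUp` and `isValidTextAlignment` are hypothetical and are not BOLT functions; they only show why a section aligned less strictly than its functions cannot guarantee the function alignment.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be a power of two).
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical option check: a function placed at a 64-byte-aligned offset
// inside a section that is only 16-byte aligned may still end up at an
// address that is not 64-byte aligned, so the section alignment has to be
// at least as large as the function alignment.
bool isValidTextAlignment(uint64_t AlignText, uint64_t AlignFunctions) {
  return AlignText >= AlignFunctions;
}
```

For instance, a section start of 0x1001 rounded up to 16 bytes gives 0x1010, which is not a multiple of 64.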
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D121728
The cold text section alignment is set using the maximum alignment value
passed to emitCodeAlignment. In order to calculate the tentative layout
correctly, we explicitly set the minimum alignment of such sections to the
maximum possible function alignment.
Differential Revision: https://reviews.llvm.org/D121392
For AArch64, in some cases and on some distributions, ld uses 64K alignment of LOAD segments by default.
Reviewed By: yota9, maksfb
Differential Revision: https://reviews.llvm.org/D119267
AArch64 uses trampolines located in the .iplt section, which
contains PLT-like trampolines that branch to the value stored in .got. In this
case we don't have a JUMP_SLOT relocation, but we have a symbol that belongs to
the ifunc trampoline, so use it and set the PLT symbol for such functions.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D120850
Address fuzzer crash on malformed input:
```
BOLT-ERROR: cannot get section contents for .dynsym: The end of the file was unexpectedly encountered.
```
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D121068
This patch enables PLT analysis for aarch64. It is used by the static
relocations in order to provide the final symbol address of a PLT entry for
some instructions like ADRP.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D118088
Reassigning the operand didn't update the operand type, which resulted in an
assertion failure (``Assertion `isReg() && "This is not a register operand!"' failed.``).
Reset the instruction instead.
Test Plan:
```
ninja check-bolt
...
PASS: BOLT-Unit :: Core/./CoreTests/X86/MCPlusBuilderTester.ReplaceRegWithImm/0 (90 of 136)
```
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120263
We were not correctly handling the conversion from DW_AT_high_pc into
DW_AT_ranges when the size of DW_AT_high_pc is not 4/8 bytes.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D120528
A PC-relative memory operand could reference a different object from
the one located at the target address, e.g. when a negative offset
is used. Check relocations for the real referenced object.
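A minimal sketch of the failure mode, assuming a hypothetical object table keyed by start address (the `Object` type and `objectAt` helper are illustrative and are not BOLT APIs). An operand that encodes `B - 8` produces a target address inside the preceding object `A`, so an address-based lookup alone picks the wrong object, while the relocation still names `B`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical description of a data object: its name and size in bytes.
struct Object {
  std::string Name;
  uint64_t Size;
};

// Return the name of the object whose address range contains Addr,
// or an empty string if no object covers it.
std::string objectAt(const std::map<uint64_t, Object> &Objects, uint64_t Addr) {
  auto It = Objects.upper_bound(Addr); // first object starting after Addr
  if (It == Objects.begin())
    return "";
  --It; // last object starting at or before Addr
  return Addr < It->first + It->second.Size ? It->second.Name : "";
}
```

With `A` at 0x1000 and `B` at 0x1010, both 16 bytes long, looking up `0x1010 - 8` yields `A` even though the instruction was written against `B`.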
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120379
Further improve error handling in BOLT by reporting `RewriteInstance` errors in
a library and fuzzer-friendly way instead of exiting.
Follow-up to D119658
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120224
Fix data race reported by ThreadSanitizer in clang.test:
```
ThreadSanitizer: data race /data/llvm-project/bolt/lib/Passes/ShrinkWrapping.cpp:1359:28
in llvm::bolt::ShrinkWrapping::moveSaveRestores()
```
The issue is with incrementing global counters from multiple threads.
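One common way to make such increments safe is an atomic counter. The sketch below is an assumption about the general technique, not the actual change in ShrinkWrapping.cpp; `SpillsMoved` and `countFromThreads` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pass-wide statistic shared by all workers.
static std::atomic<uint64_t> SpillsMoved{0};

// Each worker bumps the shared counter. fetch_add performs the
// read-modify-write as one atomic step, so no increments are lost.
uint64_t countFromThreads(unsigned NumThreads, unsigned PerThread) {
  SpillsMoved.store(0);
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T)
    Workers.emplace_back([PerThread] {
      for (unsigned I = 0; I < PerThread; ++I)
        SpillsMoved.fetch_add(1, std::memory_order_relaxed);
    });
  for (std::thread &W : Workers)
    W.join();
  return SpillsMoved.load();
}
```

Relaxed ordering is enough here because the counter is only read after all workers have been joined.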
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D120218
Refactor createBinaryContext and RewriteInstance/MachORewriteInstance
constructors to report an error in a library and fuzzer-friendly way instead of
returning a nullptr or exiting.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D119658
This patch changes patchELFAllocatableRelaSections: instead of going
through the old relocation sections and updating the relocation offsets,
it now emits the relocations stored in binary sections. This is needed in
case we would like to remove and add dynamic relocations while BOLT runs,
and it is used by the golang support pass. Note: currently we emit
relocations into the old sections, so their total number must be less than
or equal to the old number.
Testing: No special tests are needed, since this patch does not fix
anything or add new functionality (it only prepares for it). Every
PIC-compiled test binary will use this code and thus become a test.
Just in case, aarch64 dynamic relocation tests were added.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117612
Added the ability to append new entries to a DIE. This is useful to standardize
DWARF4 Split Dwarf and to simplify the implementation of DWARF5.
Multiple DIEs can share an abbrev, so the current limitation is that only
unique attributes can be added.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D119577
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files
Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848
Which is a great diff!
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
When a jump table is recovered in postProcessIndirectBranches(),
successors for the containing basic block are added in a nondeterministic
order. Make the order deterministic.
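A typical source of this kind of nondeterminism is ordering by pointer value, which varies from run to run. The sketch below shows one way to get a stable order by sorting on a stable key. It is an illustration under that assumption, not the code from this patch; the `Block` type and `uniqueTargetsInOrder` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal block type; BOLT's BinaryBasicBlock is far richer.
struct Block {
  unsigned Index; // position of the block in the function layout
};

// Return the unique jump-table targets ordered by block index instead of
// by pointer value, so the result is the same on every run.
std::vector<const Block *>
uniqueTargetsInOrder(std::vector<const Block *> Targets) {
  std::sort(Targets.begin(), Targets.end(),
            [](const Block *A, const Block *B) { return A->Index < B->Index; });
  // Duplicates are adjacent after sorting because each block has its own index.
  Targets.erase(std::unique(Targets.begin(), Targets.end()), Targets.end());
  return Targets;
}
```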
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D119672
Remove caching of ranges/abbrevs to simplify the code.
Previously we did it to work around a gdb limitation.
FBD34015613
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D119276
Summary:
- variable 'TotalSize' set but not used
- variable 'TotalCallsTopN' set but not used
- use of bitwise '|' with boolean operands
Reviewed By: maksfb
FBD33911129
We can have a scenario where multiple CUs share an abbrev table.
Depending on whether one CU is modified, other CUs can end up with an invalid abbrev section.
The example that exposed it:
All of the CUs shared the same abbrev table. The first CU had just compile_unit and sub_program.
It was not modified. The next CU had a DW_TAG_lexical_block with
DW_AT_low_pc/DW_AT_high_pc converted to DW_AT_low_pc/DW_AT_ranges.
We used the unmodified abbrev section for the first and subsequent CUs,
so the debug info was corrupted when parsing the subsequent CUs.
With this patch we now duplicate all sections that are modified and are different.
This also means that if .debug_types is present and shares the abbrev table, which
it usually does, we can now have two abbrev tables: one for the CU that was modified,
and an unmodified one for the TU.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118517
The aarch64 platform has special registers like X0_X1_X2_X3_X4_X5_X6_X7.
With downwards propagation, this register becomes a super register
for all of X0..X7 and its super registers, which is not right. This
patch replaces the downwards propagation with caching all the aliases using MCRegAliasIterator.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117394
Since we now re-write .debug_info the DWARF CU Offsets can change.
Just like for .debug_aranges the GDB Index will need to be updated.
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D118273
This patch adds unit testing support for BOLT. To do this we need to make at least the following code-level changes:
* Make createMCPlusBuilder accessible externally
* Remove positional InputFilename argument to bolt utility sources
Also prepare CMake and lit for the new tests.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb, Amir
Differential Revision: https://reviews.llvm.org/D118271
This patch fixes the removal of an unreachable unconditional branch located
after a return instruction.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D117677
Summary:
Move the annotation to avoid dynamic memory allocations.
Improves the CPU time of instrumenting a large binary by 1% (+-0.8%, p-value 0.01)
Test Plan: NFC
Reviewers: maksfb
FBD30091656