This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.
Differential Revision: https://reviews.llvm.org/D131439
D74537 introduced a bug: if `(config->andFeatures & GNU_PROPERTY_AARCH64_FEATURE_1_PAC) != 0`
with -z pac-plt unspecified, we incorrectly use AArch64BtiPac, whose writePlt will make
out-of-bounds write after the .plt section. This is often benign because the
output section after .plt will usually overwrite the content.
This is very difficult to test without D131247 (Parallelize writes of different OutputSections).
Some tests (e.g. aarch64-feature-pac.s) segfault in libstdc++ _GLIBCXX_DEBUG
builds (enabled by LLVM_ENABLE_EXPENSIVE_CHECKS).
dyn_cast<ThunkSection> is incorrectly true for any SyntheticSection. std::merge
transitively calls mergeCmp(x, x) (due to __glibcxx_requires_irreflexive_pred)
and will segfault in `ta->getTargetInputSection()`. The dyn_cast<ThunkSection>
issue should be eventually fixed properly, bug `a != b` is robust enough for now.
D91426 makes .got possibly empty while needed. If .got and .data have the same
address, and .got's content is written after .data, the first word of .data will
be corrupted.
The bug is not testable without D131247.
This implements the last step of
https://discourse.llvm.org/t/parallel-input-file-parsing/60164 for the ELF port.
For an ELF object file, we previously did: parse, (parallel) initializeLocalSymbols, (parallel) postParseObjectFile.
Now we do: parse, (parallel) initSectionsAndLocalSyms, (parallel) postParseObjectFile.
initSectionsAndLocalSyms does most of input section initialization.
The sequential `parse` does SHT_ARM_ATTRIBUTES/SHT_RISCV_ATTRIBUTES/SHT_GROUP initialization for now.
Performance linking some programs with --threads=8 (glibc 2.33 malloc and mimalloc):
* clang: 1.05x as fast with glibc malloc, 1.03x as fast with mimalloc
* chrome: 1.04x as fast with glibc malloc, 1.03x as fast with mimalloc
* internal search program: 1.08x as fast with glibc malloc, 1.05x as fast with mimalloc
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D130810
makeThreadLocal/makeThreadLocalN are moved from D130810 ([ELF] Parallelize input
section initialization) here to make D130810 more focused on the refactor:
* COFF has some needs for multiple linker contexts. D108850 partially removed
global states from lldCommon but left the global variable `lctx`.
* To the best of my knowledge, all multiple-linker-context feature requests to
ELF are more from user convenience, with no very strong argument.
* In practice, ELF port is very difficult to remove global states without
introducing significant performance regression/hurting code readability.
* Per-thread allocators from D122922/D123879 are too expensive and will not
really benefit ELF.
This patch adds a simple thread_local based makeThreadLocal to
lld/Common/Memory.h. It will enable further optimization in ELF.
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Differential Revision: https://reviews.llvm.org/D130982
Linux Standard Base Core Specification says that CIE/FDE is padded to an
addressing unit size boundary, but in practice GNU assembler/LLVM integrated
assembler pad FDE/CIE to 4 and the last FDE to 8 on 64-bit systems.
In addition, GNU ld doesn't pad to 8, so let's drop excess padding, too.
If the assembler provides aligned pieces, the output will be aligned.
Noticed .eh_frame size reduction for 3 executables: 0.3% (chrome), 4.7% (clang),
7.6% (an internal program).
* Inline getReloc
* Fold the UINT32_MAX length check into the section size check.
This transformation is valid because we don't support .eh_frame input sections
larger than 32-bit (unrealistic even for large code models).
This simplifies code, removes a read32 (for id==0 check), and makes it feasible
to combine some operations in EhInputSection::split and EhFrameSection::addRecords.
Mostly NFC, but fixes "Relocation not in any piece" assertion failure in an
erroneous case when a relocation offset precedes all CIE/FDE pices.
inputSections temporarily contains EhInputSection objects mainly for
combineEhSections. Place EhInputSection objects into a new vector
ehInputSections instead of inputSections.
Similarly to -o output directories will not be created so -Map being
copied verbatim will likely cause ld.lld @response.txt to fail.
Differential Revision: https://reviews.llvm.org/D130681
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406