Try to revert D113741 once again.
This also reverts 0ac75e82ff (D114705)
as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test
failure w/o D113741.
This reverts commit f9607d45f3.
Differential Revision: https://reviews.llvm.org/D116225
I added `PPC32Got2Section` D62464 to support .got2 but did not implement .got2
in another output section.
PR52799 has a linker script placing .got2 in .rodata, which causes a null
pointer dereference because a MergeSyntheticSection's file is nullptr.
Add the support.
The new `lazy` state is the inverse of the previous `LazyObjFile::extracted`.
There are many advantages:
* previously when a LazyObjFile was extracted, a new ObjFile/BitcodeFile was created; now the file is reused, just with `lazy` cleared
* avoid the confusing transfer of `symbols` from LazyObjFile to the new file
* the `incompatible file:` diagnostic is unified with `is incompatible with`
* simpler code, smaller executable (6200+ bytes smaller on x86-64)
* make eager parsing feasible (for parallel section/symbol table initialization)
Currently the singleton `config` is assigned by `config = make<Configuration>()`
and (if `canExitEarly` is false) destroyed by `lld::freeArena`.
`make<Configuration>` allocates a stab with `malloc(4096)`. This both wastes
memory and bloats the executable (every type instantiates `BumpPtrAllocator`
which costs more than 1KiB code on x86-64).
(No need to worry about `clang::no_destroy`. Regular invocations (`canExitEarly`
is true) call `_Exit` via llvm::sys::Process::ExitNoCleanup.)
Reviewed By: lichray
Differential Revision: https://reviews.llvm.org/D116143
Older Go cmd/link used SHT_PROGBITS for .init_array .
Work around the lack of https://golang.org/cl/373734 for a while.
It does not generate .fini_array or .preinit_array
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```
This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.
* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115993
Summary: When disassembling, symbolize a branch target operand
to print a label instead of a real address.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D114492
In 20a895c4be, we introduce `finalizeOptimizationRemarks()` to make sure we flush the diagnostic remarks file in case the linker doesn't call the global destructors before exiting.
In https://reviews.llvm.org/D73597, we add optimization remarks for removed functions for debugging or for detecting dead code.
But there is a case, if PreOptModuleHook or PostInternalizeModuleHook is defined (e.g. `--plugin-opt=emit-llvm` is passed to linker), we do not call `finalizeOptimizationRemarks()`, therefore we will get an incomplete optimization remarks file.
This patch make sure we flush the diagnostic remarks file when PreOptModuleHook or PostInternalizeModuleHook is defined.
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D115417
writeSections is typically a bottleneck.
This was used to track down the following bottlenecks:
* Output section .rela.dyn (9115d75117)
* Output section .debug_str (3aae04c744)
* posix_fallocate is slow for Linux tmpfs: D115957
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115984
GCC's powerpc32 port predefines `PPC` as a macro in GNU C++ mode in some configurations (Linux,
FreeBSD, and some others. See `builtin_define_std ("PPC"); ` in gcc/config/rs6000).
```
% powerpc-linux-gnu-g++ -E -dM -xc++ /dev/null -o - | grep -w PPC
#define PPC 1
```
Fixes https://bugs.gentoo.org/829599
Reviewed By: thesamesam
Differential Revision: https://reviews.llvm.org/D116017
This decreases struct sizes and usually decreases the lld executable
size (39KiB for my x86-64 executable) (unless in some cases smaller
SmallVector leads to more inlining, e.g. StringTableBuilder).
For --gdb-index, there may be memory usage saving.
Fixes#52778.
Probably fixes Chromium crashing on startup on macOS 10.15 (and older) systems
when building with LTO, but I haven't verified that yet.
Differential Revision: https://reviews.llvm.org/D115949
Matches llvm's and clang's /test/CMakeLists.txt, makes it easier to
see in diffs which deps get added, and makes it easier to see if
a given dependency is present or not.
No behavior change.
Only called once. Moving to OutputSections.cpp can make it inlined.
finalizeInputSections can be very hot, especially in -O1 links with much debug info.
Everyone uses -l -L instead of the long option counterparts.
Make help messages attach to -L -l and (--reproduce) use them for response.txt
command line options.
Calling `Allocate` with 0 size (when .symtab is absent, e.g.
`invalid/mips-invalid-options-descriptor.test`) may return a nullptr, which will
crash with -fsanitize=null (the underlying `Allocate` function is
LLVM_ATTRIBUTE_RETURNS_NONNULL).
The SHT_GNU_version index is 16-bit, so the 32-bit value is a waste.
Technically non-default version index 0x7fff uses version index 0xffff,
but it is impossible in practice.
This change decreases sizeof(SymbolUnion) from 80 to 72 on ELF64 platforms.
Memory usage decreases by 1% when linking a large executable.
For large applications that write to map files, writing map files can take quite
a bit of time. Sorting the biggest contributors to link times, writing map files
ranks in at 2nd place, with load input files being the biggest contributor of
link times. Avoiding writing map files on the critical path (and having its own
thread) saves ~2-3 seconds when linking chromium framework on a 16-Core
Intel Xeon W.
```
base diff difference (95% CI)
sys_time 1.617 ± 0.034 1.657 ± 0.026 [ +1.5% .. +3.5%]
user_time 28.536 ± 0.245 28.609 ± 0.180 [ -0.1% .. +0.7%]
wall_time 23.833 ± 0.271 21.684 ± 0.194 [ -9.5% .. -8.5%]
samples 31 24
```
Reviewed By: #lld-macho, oontvoo, int3
Differential Revision: https://reviews.llvm.org/D115416
* Avoid the name truncation quirk in SymbolTable::insert: the truncated name will be replaced by @@ again.
* Allow foo and foo@@v1 in different files to be diagnosed as duplicate definition error (GNU ld behavior)
* Avoid potential redundant strlen on symbol name due to StringRefZ in ObjFile<ELFT>::initializeSymbols
Sorting the prefixes by decreasing frequency can improve performance.
.gcc_except_table is relatively frequent, so move it ahead.
.ctors and .dtors mostly disappear and should be the last.
SHT_GNU_verdef is typically small, so it's unnecessary to reserve the vector.
While here, fix a hypothetical issue when SHT_GNU_verdef has non-increasing
version indexes, which don't happen with GNU ld, gold, ld.lld's output.
My x86-64 lld executable is 256 bytes smaller.
sizeof(ObjFile<ELF64LE>) is decreased from 344 to 272 on an ELF64 system.
In a large link with 30000 ObjFiles, this may be 2+MiB saving.
Change std::vector members to SmallVector, and std::string members to
SmallString<0> (these members typically don't benefit from small string optimization).
On Linux x86-64 the lld executable is ~6k smaller.
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
Add missing coverage exposed by D114783.
There should be no associated IRELATIVE, otherwise (a) glibc ld.so may
crash (b) it wastes space (c) unused IPLT causes confusion.
If a copy related symbol (say `copy`) is referenced in two .o
files, this change removes a duplicated line from the -Map output:
```
202470 202470 1 1 .bss.rel.ro
202470 202470 1 1 <internal>:(.bss.rel.ro)
202470 202470 1 1 copy
removed 202470 202470 1 1 copy
```
Differential Revision: https://reviews.llvm.org/D115697
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D115603
(Fixed an issue about GOT on a copy relocated alias.)
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
This reverts commit fc33861d48.
`replaceWithDefined` should copy needsGot, otherwise an alias for a copy
relocated symbol may not have GOT entry if its needsGot was originally true.
lld only needs DIContext.h which it gets through Symbolize.h -> SymbolizableModule.h -> DIContext.h. This replaces it with a direct include of DIContext.h to avoid any confusion and pulling in unnecessary headers.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D115659
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make parallel relocation scanning possible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
* Make GOT deduplication feasible
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
An unstable sort suffices. In a large link (11.06s), this decreases .rela.dyn
writeTo time from 1.52s to 0.81s, resulting in 6% total time speedup (the
benefit will greatly dilute if --pack-dyn-relocs=relr becomes prevailing).
Encoding the dynamic relocations then sorting raw Elf_Rel/Elf_Rela doesn't seem
to improve much (doing that would require code duplicate because of
Elf_Rel/Elf_Rela plus unfortunate mips64le), so don't do that.
1. After D113241, we have the section address easily accessible and no
longer need to iterate across the LC_SEGMENT commands to emit
LC_DATA_IN_CODE.
2. There's no need to store a pointer to the data in code entries during
the parse step; we can just look it up as part of the output step.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D115556
... only whether they have more than zero. This simplifies the code slightly.
I've also moved the field into the ConcatInputSection subclass since it doesn't
actually get used by the other InputSections.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D115539
This fixes an issue introduced in D101996.
A weak reference in a shared library could be incorrectly reported if
there is another library that has a strong reference to the same symbol.
Differential Revision: https://reviews.llvm.org/D115041
Enable the pdbpagesize flag to allow linking of PDB files > 4GB.
Also includes a couple small fixes to change to uint64_t to support the
larger file sizes. I updated the max file size check in MSFBuilder.cpp
to take into account the page size.
Differential Revision: https://reviews.llvm.org/D115051
We were fetching archive symbols too eagerly, bloating binary size as well as
just screwing up binaries that expected to look up certain symbols only at
runtime.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D115092
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
Mach-O LLD uses the buffer identifier of the memory buffer backing an object
file to generate stabs which are used by `dsymutil` to find the object file for
dSYM generation.
When using thinLTO, these buffers are provided by the cache which initially
saves them to disk as temporary files beginning with "Thin-" but renames them
to persistent files beginning with "llvmcache-" before the buffer is provided
to the cache user.
However, the buffer is created before the file is renamed and is given the temp
file's name as an identifier. This causes the generated stabs to point to
nonexistent files.
This change names the buffer with the eventual persistent filename. I think
this is safe because failing to rename the temp file is a fatal error.
Differential Revision: https://reviews.llvm.org/D115055
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
PLT usage needs the first 12 bytes of the .got section. We need to keep .got and
DT_GOT_PPC even if .got/_GLOBAL_OFFSET_TABLE_ are not referenced (large PIC code
may only reference .got2), which is the case in OpenBSD's ld.so, leading
to a misleading error, "unsupported insecure BSS PLT object".
Fix this by adding R_PPC32_PLTREL to the list of hasGotOffRel.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114982
During the llvm round table it was generally agreed that the newer macho
lld implementation is feature complete enough to replace the old
implementation entirely. This will reduce confusion for new users who
aren't aware of the history.
Differential Revision: https://reviews.llvm.org/D114842
This is very similar to https://reviews.llvm.org/D103557 but applies to
symbols which are undefined at link time rather than compile time.
We already have code that handles symbols which were defined at link
time but dead stripped by `--gc-sections` (See
`test/wasm/debug-removed-fn.ll`). In that case the symbols are not live
(!isLive()). However, we can also have live symbols (which are
references by the program) but which are undefined at link time and are
imported by the linker.
In the test case here the symbol `undef` is used but is not defined
in the program but is imported by the linker due to the
`--import-undefined` flag.
Fixes: https://github.com/emscripten-core/emscripten/issues/15528
Differential Revision: https://reviews.llvm.org/D114921
When a comdat symbol is defined in both bitcode and regular object
files, which are contained in the same archive, the linker could lose
the flag that the symbol is used in the regular object file and allow
LTO to internalize it, which led to "error: undefined symbol".
The issue was introduced in D79300.
Differential Revision: https://reviews.llvm.org/D114801
This reverts the PPC64PCRelLongBranchThunk part from D86706.
PPC64PCRelLongBranchThunk is the same as PPC64R12SetupStub.
Use `__gep_setup_` instead of `__long_branch_pcrel_` for the stub symbol name
as it more closely indicates the operation.
(Note: GNU ld uses `*.long_branch.*` and `*.plt_branch.*`).
Reviewed By: NeHuang, nemanjai
Differential Revision: https://reviews.llvm.org/D114656
There is a trend of having more optional options (usually security
hardening related) like -z cet-report=, -z bti-report=, -z force-bti.
If ld.lld 14.0.0 uses a warning, in 15/16/17/... timeframe when people
add new options to software, they can worry less about linker errors on ld.lld 14.0.0.
In some cases `-z foo` does essential work where a silent ignore can be
problematic, but the user has received a warning. From my observation, the
doing-essential-work `-z foo` is much fewer than the converse. In addition,
the user who cares can use `--fatal-warnings` (Note: GNU ld doesn't upgrade warnings to errors).
It is unclear whether we need something like `clang -Wunknown-warning-option`.
If we ever run into unfortunate transition like `-z start-stop-gc`, the
affected software (e.g. ldc is a compiler which passes linker options to the underlying ld)
can blindly add the `-z` option, without worrying it may cause a linker error to LLD 14.0.0.
Reviewed By: jrtc27, peter.smith
Differential Revision: https://reviews.llvm.org/D114748