Patch by Robert O'Callahan.
Rust projects tend to link in all object files from all dependent
libraries and rely on --gc-sections to strip unused code and data.
Unfortunately --gc-sections doesn't currently strip any debuginfo
associated with GC'ed sections, so lld links in the full debuginfo from
all dependencies even if almost all that code has been discarded. See
https://github.com/rust-lang/rust/issues/56068 for some details.
Properly stripping debuginfo for discarded sections would be difficult,
but a simple approach that helps significantly is to mark debuginfo
sections as live only if their associated object file has at least one
live code/data section. This patch does that. In a (contrived but not
totally artificial) Rust testcase linked above, it reduces the final
binary size from 46MB to 5.1MB.
Differential Revision: https://reviews.llvm.org/D54747
llvm-svn: 358069
lld's mark-sweep garbage collector was written in the visitor pattern.
There are functions that traverses a given graph, and the functions calls
callback functions to dispatch according to node type.
The code was originaly pretty simple, and lambdas worked pretty
well. However, as we add more features to the garbage collector, that became
more like a callback hell. We now have a callback function that wraps
another callback function, for example. It is not easy to follow the flow of
the control.
This patch rewrites it as a regular class. What was once a lambda is now a
regular class member function. I think this change fixes the readability
issue.
No functionality change intended.
Differential Revision: https://reviews.llvm.org/D59800
llvm-svn: 356966
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
In glibc, libc.so is a linker script with an as-needed dependency on ld-linux-x86-64.so.2
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a AS_NEEDED ( /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ) )
ld-linux-x86-64.so.2 (as-needed) defines some symbols which resolve undefined references in libc.so.6, it will therefore be added as a DT_NEEDED entry, which isn't necessary.
The test case as-needed-not-in-regular.s emulates the libc.so scenario, where ld.bfd and gold don't add DT_NEEDED for a.so
The relevant code in gold/resolve.cc:
// If we have a non-WEAK reference from a regular object to a
// dynamic object, mark the dynamic object as needed.
if (to->is_from_dynobj() && to->in_reg() && !to->is_undef_binding_weak())
to->object()->set_is_needed();
in_reg() appears to do something similar to IsUsedInRegularObj.
This patch makes lld do the similar thing, but moves the check from
addShared to a later stage MarkLive where all symbols are scanned.
Reviewers: ruiu, pcc, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D55902
llvm-svn: 349849
Previously, we uncompress all compressed sections before doing anything.
That works, and that is conceptually simple, but that could results in
a waste of CPU time and memory if uncompressed sections are then
discarded or just copied to the output buffer.
In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompress them into newly allocated bufers and then memcpy the buffers
to the output buffer. That temporary buffer was redundant.
This patch changes how to uncompress sections. Now, compressed sections
are uncompressed lazily. To do that, `Data` member of `InputSectionBase`
is now hidden from outside, and `data()` accessor automatically expands
an compressed buffer if necessary.
If no one calls `data()`, then `writeTo()` directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.
This patch significantly reduces memory consumption (20 GiB max RSS to
15 Gib) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.
Differential Revision: https://reviews.llvm.org/D52917
llvm-svn: 343979
Currently, LLD marks all non-allocatable sections except SHF_REL[A] as Live
when doing GC.
This can be a reason of the crash when SHF_LINK_ORDER sections
are involved, because their parents can be dead.
We should do GC for them correctly. The patch implements it.
Differential revision: https://reviews.llvm.org/D46880
llvm-svn: 332589
Now that getSectionPiece is fast (uses a hash) it is probably OK to
split merge sections early.
The reason I want to do this is to split eh_frame sections in the same
place.
This does mean that we have to decompress early. Given that the only
compressed sections are debug info, I don't think we are missing much.
It is a small improvement: 0.5% on the geometric mean.
llvm-svn: 331058
This should resolve the issue that lld build fails in some hosts
that uses case-insensitive file system.
Differential Revision: https://reviews.llvm.org/D43788
llvm-svn: 326339
Previously wasm used a separate header to declare markLive
and ELF used to declare ICF. This change makes each backend
consistently declare these in their own headers.
Differential Revision: https://reviews.llvm.org/D43529
llvm-svn: 325631
Currently, archive file name is missing in this message. In general,
we should avoid constructing strings in an ad-hoc manner and instead
use toString() to get consistent output strings.
Differential Revision: https://reviews.llvm.org/D43420
llvm-svn: 325416
Now that gc sections runs after linker defined symbols are added it
can see symbols that point to an OutputSection.
Should fix a bot failure.
llvm-svn: 320412
Since MarkLive.cpp is the place where we set Live flags for
other sections, it looks correct to do that there.
Benefit is that we stop spreading GC logic outsize of MarkLive.cpp.
Differential revision: https://reviews.llvm.org/D40454
llvm-svn: 319435
This includes a fix to mark copy reloc aliases as used.
Original message:
[ELF] Do not keep symbols if they referenced only from discarded sections.
This patch also ensures that in case of "--as-needed" is used,
DT_NEEDED entries are not created if they are required only by
these eliminated symbols.
llvm-svn: 319215
This patch also ensures that in case of "--as-needed" is used,
DT_NEEDED entries are not created if they are required only by
these eliminated symbols.
Differential Revision: https://reviews.llvm.org/D38790
llvm-svn: 319008
Now that DefinedRegular is the only remaining derived class of
Defined, we can merge the two classes.
Differential Revision: https://reviews.llvm.org/D39667
llvm-svn: 317448
Now that we have only SymbolBody as the symbol class. So, "SymbolBody"
is a bit strange name now. This is a mechanical change generated by
perl -i -pe s/SymbolBody/Symbol/g $(git grep -l SymbolBody lld/ELF lld/COFF)
nd clang-format-diff.
Differential Revision: https://reviews.llvm.org/D39459
llvm-svn: 317370
SymbolBody and Symbol were separated classes due to a historical reason.
Symbol used to be a pointer to a SymbolBody, and the relationship
between Symbol and SymbolBody was n:1.
r2681780 changed that. Since that patch, SymbolBody and Symbol are
allocated next to each other to improve memory locality, and they have
1:1 relationship now. So, the separation of Symbol and SymbolBody no
longer makes sense.
This patch merges them into one class. In order to avoid updating too
many places, I chose SymbolBody as a unified name. I'll rename it Symbol
in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D39406
llvm-svn: 317006
DSO is short for dynamic shared object, so the function name was a
little confusing because it sounded like it didn't work when we were
a creating statically-linked executable or something.
What we mean by "DSO" here is the current output file that we are
creating. Thus the new name. Alternatively, we could call it the current
ELF module, but "module" is a overloaded word, so I avoided that.
llvm-svn: 316809
This moves reporting of garbage collected sections right after
we do GC. That simplifies things.
Differential revision: https://reviews.llvm.org/D39058
llvm-svn: 316759
This is "Bug 34836 - --gc-sections remove relocations from --emit-relocs",
When --emit-relocs is used, LLD currently always drops SHT_REL[A] sections from
output if --gc-sections is present. Patch fixes the issue.
Differential revision: https://reviews.llvm.org/D38724
llvm-svn: 316418
ScriptConfiguration was a class to contain parsed results of
linker scripts. LinkerScript is a class to interpret it.
That ditinction was needed because we haven't instantiated
LinkerScript early (because, IIRC, LinkerScript class was a
ELFT template function). So, when we parse linker scripts,
we couldn't directly store the result to a LinkerScript instance.
Now, that limitation is gone. We instantiate LinkerScript
at the very beginning of our main function. We can directly
store parse results to a LinkerScript instance.
llvm-svn: 315403
The condition whether a section is alive or not by default
is becoming increasingly complex, so the decision of garbage
collection is spreading over InputSection.h and MarkLive.cpp,
which is not a good state.
This moves the code to MarkLive.cpp, to keep the file the central
place to make decisions about garbage collection.
llvm-svn: 315384
Convert all common symbols to regular symbols after scan.
This means that the downstream code does not to handle common symbols as a special case.
Differential Revision: https://reviews.llvm.org/D38137
llvm-svn: 314495
EhSectionPiece used to have a pointer to a section, but that pointer was
mostly redundant because we almost always know what the section is without
using that pointer. This patch removes the pointer from the struct.
This patch also use uint32_t/int32_t instead of size_t to represent
offsets that are hardly be larger than 4 GiB. At the moment, I think it is
OK even if we cannot handle .eh_frame sections larger than 4 GiB.
Differential Revision: https://reviews.llvm.org/D38012
llvm-svn: 313697
ResolvedReloc struct is always passed to a callback function and not
stored anywhere. That is, in effect, we use the struct to pack multiple
arguments to one argument, which doesn't make much sense. This patch
removes the struct and passes the members to the callback directly.
llvm-svn: 310620
Liveness is usually a notion of input sections, but this patch adds
"liveness" bit to common symbols because they don't belong to any
input section.
This patch is based on https://reviews.llvm.org/D36520
Differential Revision: https://reviews.llvm.org/D36546
llvm-svn: 310617
This is necessary to ensure that sections containing symbols referenced
from linker scripts (e.g. in data commands) don't get GC'd.
Differential Revision: https://reviews.llvm.org/D34195
llvm-svn: 305452
We had a few Config member functions that returns configuration values.
For example, we had is64() which returns true if the target is 64-bit.
The return values of these functions are constant and never change.
This patch is to compute them only once to make it clear that they'll
never change.
llvm-svn: 298168
__start_xxx symbol keeps section xxx alive only if it is not
SHF_LINK_ORDER. Such sections can be used for user metadata, when
__start_xxx is used to iterate over section contents at runtime, and
the liveness is determined solely by the linked (associated) section.
This was earlier implemented in r294592, and broken in r296723.
Differential Revision: https://reviews.llvm.org/D30964
llvm-svn: 298154
With this we have a single section hierarchy. It is a bit less code,
but the main advantage will be in a future patch being able to handle
foo = symbol_in_obj;
in a linker script. Currently that fails since we try to find the
output section of symbol_in_obj. With this we should be able to just
return an InputSection from the expression.
llvm-svn: 297313