My recent commits separated symbol resolution from the symbol table,
so the functions to resolve symbols are now in a somewhat wrong file.
This patch moves it to Symbols.cpp.
The functions are now member functions of the symbol.
This is code move change. I modified function names so that they are
appropriate as member functions, though. No functionality change
intended.
Differential Revision: https://reviews.llvm.org/D62290
llvm-svn: 361474
--{start,end}-lib give files grouped by the options the archive file
semantics. That is, each object file between them acts as if it were
in an archive file whose sole member is the file.
Therefore, files between --{start,end}-lib are linked to the final
output only if they are needed to resolve some undefined symbols.
Previously, the feature was implemented this way:
1. We read a symbol table and insert defined symbols to the symbol
table as lazy symbols.
2. If an undefind symbol is resolved to a lazy symbol, that lazy
symbol instantiate ObjFile class for that symbol, which re-insert
all defined symbols to the symbol table.
So, if an ObjFile is instantiated, defined symbols are inserted to the
symbol table twice. Since inserting long symbol names is not cheap,
there's a room to optimize here.
This patch optimzies it. Now, LazyObjFile remembers symbol handles and
passed them over to a new ObjFile instance, so that the ObjFile
doesn't insert the same strings.
Here is a quick benchmark to link clang. "Original" is the original
lld with unmodified command line options. For "Case 1" and "Case 2", I
extracted all files from archive files and replace .a's in a command
line with .o's wrapped with --{start,end}-lib. I used the original lld
for Case 1" and use this patch for Case 2.
Original: 5.892
Case 1: 6.001 (+1.8%)
Case 2: 5.701 (-3.2%)
So, interestingly, --{start,end}-lib are now faster than the regular
linking scheme with archive files. That's perhaps not too surprising,
though, because for regular archive files, we look up the symbol table
with the same string twice.
Differential Revision: https://reviews.llvm.org/D62188
llvm-svn: 361473
Rather than report "undefined symbol: ", give more informative message
about the object file that defines the discarded section.
In particular, PR41133, if the section is a discarded COMDAT, print the
section group signature and the object file with the prevailing
definition. This is useful to track down some ODR issues.
We need to
* add `uint32_t DiscardedSecIdx` to Undefined for this feature.
* make ComdatGroups public and change its type to DenseMap<CachedHashStringRef, const InputFile *>
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D59649
llvm-svn: 361359
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.
Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.
The design goals were to provide:
- A simple linking model for developers to reason about.
- The ability to to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
environments (MSVC in particular).
Dependent library support is implemented differently for ELF platforms than on
the other platforms. Primarily this difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.
In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:
1. There are no competing defined symbols in a given set of libraries, or
if they exist, the program owner doesn't care which is linked to their
program.
2. There may be circular dependencies between libraries.
The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:
.section ".deplibs","MS",@llvm_dependent_libraries,1
.asciz "foo"
For LTO, equivalent information to the contents of a the .deplibs section can be
retrieved by the LLD for bitcode input files.
LLD processes the dependent library specifiers in the following way:
1. Dependent libraries which are found from the specifiers in .deplibs sections
of relocatable object files are added when the linker decides to include that
file (which could itself be in a library) in the link. Dependent libraries
behave as if they were appended to the command line after all other options. As
a consequence the set of dependent libraries are searched last to resolve
symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
strings in a .deplibs section by; first, handling the string as if it was
specified on the command line; second, by looking for the string in each of the
library search paths in turn; third, by looking for a lib<string>.a or
lib<string>.so (depending on the current mode of the linker) in each of the
library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
dependent libraries.
Rationale for the above points:
1. Adding the dependent libraries last makes the process simple to understand
from a developers perspective. All linkers are able to implement this scheme.
2. Error-ing for libraries that are not found seems like better behavior than
failing the link during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
will affect all of the dependent libraries. There is a potential problem of
surprise for developers, who might not realize that these options would apply
to these "invisible" input files; however, despite the potential for surprise,
this is easy for developers to reason about and gives developers the control
that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
find input files. The different search methods are tried by the linker in most
obvious to least obvious order.
5. I considered adding finer grained control over which dependent libraries were
ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
is not necessary: if finer control is required developers can fall back to using
the command line directly.
RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html.
Differential Revision: https://reviews.llvm.org/D60274
llvm-svn: 360984
This is the last patch of the series of patches to make it possible to
resolve symbols without asking SymbolTable to do so.
The main point of this patch is the introduction of
`elf::resolveSymbol(Symbol *Old, Symbol *New)`. That function resolves
or merges given symbols by examining symbol types and call
replaceSymbol (which memcpy's New to Old) if necessary.
With the new function, we have now separated symbol resolution from
symbol lookup. If you already have a Symbol pointer, you can directly
resolve the symbol without asking SymbolTable to do that.
Now that the nice abstraction become available, I can start working on
performance improvement of the linker. As a starter, I'm thinking of
making --{start,end}-lib faster.
--{start,end}-lib is currently unnecessarily slow because it looks up
the symbol table twice for each symbol.
- The first hash table lookup/insertion occurs when we instantiate a
LazyObject file to insert LazyObject symbols.
- The second hash table lookup/insertion occurs when we create an
ObjFile from LazyObject file. That overwrites LazyObject symbols
with Defined symbols.
I think it is not too hard to see how we can now eliminate the second
hash table lookup. We can keep LazyObject symbols in Step 1, and then
call elf::resolveSymbol() to do Step 2.
Differential Revision: https://reviews.llvm.org/D61898
llvm-svn: 360975
Module IDs can appear in diagnostic messages.
This patch adds some auxiliary symbols to improve their readability.
Differential Revision: https://reviews.llvm.org/D61857
llvm-svn: 360858
Previously, we handled common symbols as a kind of Defined symbol,
but what we were doing for common symbols is pretty different from
regular defined symbols.
Common symbol and defined symbol are probably as different as shared
symbol and defined symbols are different.
This patch introduces CommonSymbol to represent common symbols.
After symbols are resolved, they are converted to Defined symbols
residing in a .bss section.
Differential Revision: https://reviews.llvm.org/D61895
llvm-svn: 360841
SymbolTable's add-family functions have lots of parameters because
when they have to create a new symbol, they forward given arguments
to Symbol's constructors. Therefore, the functions take at least as
many arguments as their corresponding constructors.
This patch simplifies the add-family functions. Now, the functions
take a symbol instead of arguments to construct a symbol. If there's
no existing symbol, a given symbol is memcpy'ed to the symbol table.
Otherwise, the functions attempt to merge the existing and a given
new symbol.
I also eliminated `CanOmitFromDynSym` parameter, so that the functions
take really one argument.
Symbol classes are trivially constructible, so looks like constructing
them to pass to add-family functions is as cheap as passing a lot of
arguments to the functions. A quick benchmark showed that this patch
seems performance-neutral.
This is a preparation for
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131902.html
Differential Revision: https://reviews.llvm.org/D61855
llvm-svn: 360838
The symbol table used to be a container of vectors of input files,
but that's no longer the case because the vectors are moved out of
SymbolTable and are now global variables.
Therefore, addFile doesn't have to belong to any class. This patch
moves the function out of the class.
This patch is a preparation for my RFC [1].
[1] http://lists.llvm.org/pipermail/llvm-dev/2019-April/131902.html
Differential Revision: https://reviews.llvm.org/D61854
llvm-svn: 360666
For partitions I intend to use the same set of version indexes in
each partition for simplicity. Since each partition will need its own
VersionNeedSection this will require moving the verneed tracking out of
VersionNeedSection. The way I've done this is to move most of the tracking
into SharedFile. What will eventually become the per-partition tracking
still lives in VersionNeedSection.
As a bonus the code gets a little simpler and more consistent with how we
handle verdef.
Differential Revision: https://reviews.llvm.org/D60307
llvm-svn: 357926
That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703
"wrong line number info for obj file compiled with -ffunction-sections"
bug. The problem happened with only .o files. If object file contains
several .text sections then line number information showed incorrectly.
The reason for this is that DwarfLineTable could not detect section which
corresponds to specified address(because address is the local to the
section). And as the result it could not select proper sequence in the
line table. The fix is to pass SectionIndex with the address. So that it
would be possible to differentiate addresses from various sections. With
this fix llvm-objdump shows correct line numbers for disassembled code.
Differential review: https://reviews.llvm.org/D58194
llvm-svn: 354972
Summary:
In ld.bfd/gold, --no-allow-shlib-undefined is the default when linking
an executable. This patch implements a check to error on undefined
symbols in a shared object, if all of its DT_NEEDED entries are seen.
Our approach resembles the one used in gold, achieves a good balance to
be useful but not too smart (ld.bfd traces all DSOs and emulates the
behavior of a dynamic linker to catch more cases).
The error is issued based on the symbol table, different from undefined
reference errors issued for relocations. It is most effective when there
are DSOs that were not linked with -z defs (e.g. when static sanitizers
runtime is used).
gold has a comment that some system libraries on GNU/Linux may have
spurious undefined references and thus system libraries should be
excluded (https://sourceware.org/bugzilla/show_bug.cgi?id=6811). The
story may have changed now but we make --allow-shlib-undefined the
default for now. Its interaction with -shared can be discussed in the
future.
Reviewers: ruiu, grimar, pcc, espindola
Reviewed By: ruiu
Subscribers: joerg, emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D57385
llvm-svn: 352826
Summary:
lld discards .gnu.linonce.* sections work around a bug in glibc.
https://sourceware.org/bugzilla/show_bug.cgi?id=20543
Unfortunately, the Linux kernel uses a section named
.gnu.linkonce.this_module to store infomation about kernel modules. The
kernel reads data from this section when loading kernel modules, and
errors if it fails to find this section. The current behavior of lld
discards this section when kernel modules are linked, so kernel modules
linked with lld are unloadable by the linux kernel.
The Linux kernel should use a comdat section instead of .gnu.linkonce.
The minimum version of binutils supported by the kernel supports comdat
sections. The kernel is also not relying on the old linkonce behavior;
it seems to have chosen a name that contains a deprecated GNU feature.
Changing the section name now in the kernel would require all kernel
modules to be recompiled to make use of the new section name. Instead,
rather than discarding .gnu.linkonce.*, let's discard the more specific
section name to continue working around the glibc issue while supporting
linking Linux kernel modules.
Link: https://github.com/ClangBuiltLinux/linux/issues/329
Reviewers: pcc, espindola
Reviewed By: pcc
Subscribers: nathanchance, emaste, arichardson, void, srhines
Differential Revision: https://reviews.llvm.org/D57294
llvm-svn: 352302
This does *not* implement full SHT_GROUP semantic, yet it is a simple step forward:
Sections within a group are still considered valid, but they do not behave as
specified by the standard in case of garbage collection.
Differential Revision: https://reviews.llvm.org/D56437
llvm-svn: 352068
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Patch by Michael Skvortsov!
This change adds a basic support for linking static MSP430 ELF code.
Implemented relocation types are intended to correspond to the BFD.
Differential revision: https://reviews.llvm.org/D56535
llvm-svn: 350819
This is https://bugs.llvm.org/show_bug.cgi?id=39493.
We crashed previously because did not handle /DISCARD/ properly
when -r was used. I think it is uncommon to use scripts with -r, though I see
nothing wrong to handle the /DISCARD/ so that we will not crash at least.
Differential revision: https://reviews.llvm.org/D53864
llvm-svn: 345819
Previously, we uncompress all compressed sections before doing anything.
That works, and that is conceptually simple, but that could results in
a waste of CPU time and memory if uncompressed sections are then
discarded or just copied to the output buffer.
In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompress them into newly allocated bufers and then memcpy the buffers
to the output buffer. That temporary buffer was redundant.
This patch changes how to uncompress sections. Now, compressed sections
are uncompressed lazily. To do that, `Data` member of `InputSectionBase`
is now hidden from outside, and `data()` accessor automatically expands
an compressed buffer if necessary.
If no one calls `data()`, then `writeTo()` directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.
This patch significantly reduces memory consumption (20 GiB max RSS to
15 Gib) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.
Differential Revision: https://reviews.llvm.org/D52917
llvm-svn: 343979
r320770 made LLD handle invalid DSOs where local symbols were found in
the global part of the symbol table. Unfortunately, it didn't handle the
case where those local symbols were also undefined, and r326242 exposed
an assertion failure in that case. Just warn on that case instead of
crashing, by moving the local binding check before the undefined symbol
addition.
The input file for the test is crafted by hand, since I don't know of
any tool that would produce such a broken DSO. I also don't understand
what it even means for a symbol to be undefined but have STB_LOCAL
binding - I don't think that combination makes any sense - but we have
found broken DSOs of this nature that we were linking against. I've
included detailed instructions on how to produce the DSO in the test.
Differential Revision: https://reviews.llvm.org/D52815
llvm-svn: 343745
This uses the call graph profile embedded in the object files to construct the call graph.
This is read from a SHT_LLVM_CALL_GRAPH_PROFILE (0x6fff4c02) section as (uint32_t, uint32_t, uint64_t) tuples as (from symbol index, to symbol index, weight).
Differential Revision: https://reviews.llvm.org/D45850
llvm-svn: 343552
Previously, if you invoke lld's `main` more than once in the same process,
the second invocation could fail or produce a wrong result due to a stale
pointer values of the previous run.
Differential Revision: https://reviews.llvm.org/D52506
llvm-svn: 343009
Summary: This protects lld from a null pointer dereference when a faulty input file has such invalid sh_link fields.
Reviewers: ruiu, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D51743
llvm-svn: 341611
Summary:
For -thinlto-object-suffix-replace=old\;new, in
tools/gold/gold-plugin.cpp, the thinlto object filename is Path minus
optional old suffix.
static std::string getThinLTOObjectFileName(StringRef Path, StringRef OldSuffix,
StringRef NewSuffix) {
if (OldSuffix.empty() && NewSuffix.empty())
return Path;
StringRef NewPath = Path;
NewPath.consume_back(OldSuffix);
std::string NewNewPath = NewPath;
NewNewPath += NewSuffix;
return NewNewPath;
}
Currently lld will error that the path does not end with old suffix.
This patch makes lld accept such paths but only add new suffix if Path
ends with old suffix. This fixes a link error where bitcode members in
an archive are regular LTO objects without old suffix.
Acording to tejohnson, this will "enable supporting mix and match of
minimized ThinLTO bitcode files with normal ThinLTO bitcode files in a
single link (where we want to apply the suffix replacement to the
minimized files, and just ignore it for the normal ThinLTO files)."
Reviewers: ruiu, pcc, tejohnson, espindola
Reviewed By: tejohnson
Subscribers: emaste, inglorion, arichardson, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51055
llvm-svn: 340364
Our code in LazyObjFile::parse() has an ELFT switch and
adds a lazy object by its ELFT kind.
Though it might be possible to add a file using a different
architecture and make LLD to silently accept it (if the file
is empty or contains only week symbols). That itself, not a
huge issue perhaps (because the error would be reported later
if the file is fetched), but still does not look clean and correct.
It is possible to report an error earlier and clean up the
code. That is what the patch does.
Ideally, we might want to reuse isCompatible from SymbolTable.cpp,
but it is static and accepts a file as an argument, what is not
convenient. Since such a situation should be rare, I think it
should be OK to go with the way chosen in this patch.
Differential revision: https://reviews.llvm.org/D50899
llvm-svn: 340257
This patch solves 2 problems:
1) It adds a test to check the line below:
https://github.com/llvm-mirror/lld/blob/master/ELF/InputFiles.cpp#L334
Test case contains SHT_GROUP section with a broken (0xFF) flag.
2) The patch fixes the case when we silently accepted such broken groups
in the case when there were no other objects with the same group signature.
llvm-svn: 339765
The Tag_ABI_VFP_args build attribute controls the procedure call standard
used for floating point parameters on ARM. The values are:
0 - Base AAPCS (FP Parameters passed in Core (Integer) registers
1 - VFP AAPCS (FP Parameters passed in FP registers)
2 - Toolchain specific (Neither Base or VFP)
3 - Compatible with all (No use of floating point parameters)
If the Tag_ABI_VFP_args build attribute is missing it has an implicit value
of 0.
We use the attribute in two ways:
- Detect a clash in calling convention between Base, VFP and Toolchain.
we follow ld.bfd's lead and do not error if there is a clash between an
implicit Base AAPCS caused by a missing attribute. Many projects
including the hard-float (VFP AAPCS) version of glibc contain assembler
files that do not use floating point but do not have Tag_ABI_VFP_args.
- Set the EF_ARM_ABI_FLOAT_SOFT or EF_ARM_ABI_FLOAT_HARD ELF header flag
for Base or VFP AAPCS respectively. This flag is used by some ELF
loaders.
References:
- Addenda to, and Errata in, the ABI for the ARM Architecture for
Tag_ABI_VFP_args
- Elf for the ARM Architecture for ELF header flags
Fixes PR36009
Differential Revision: https://reviews.llvm.org/D49993
llvm-svn: 338377