Only live symbols are written to the symbol table. Because isLive()
returned false if dead-stripping was disabled entirely, only
non-COMDAT sections were written to the symbol table. This patch fixes
the issue.
llvm-svn: 247856
Symbol table is now populated correctly, but some fields are missing,
they'll be added in the future. This patch also adds --discard-all
flag, which was the default behavior until now.
Differential Revision: http://reviews.llvm.org/D12874
llvm-svn: 247849
This patch defines ICF class and defines ICF-related functions as
members of the class. By doing this we can move code that are
related only to ICF from SectionChunk to the newly-defined class.
This also eliminates a global variable "NextID".
llvm-svn: 247802
This is a patch to make LLD to be on par with MSVC in terms of ICF
effectiveness. MSVC produces a 27.14MB executable when linking LLD.
LLD previously produced a 27.61MB when self-linking. Now the size
is reduced to 27.11MB. Note that without ICF the size is 29.63MB.
In r247387, I implemented an algorithm that handles section graphs
as cyclic graphs and merge them using SCC. The algorithm did not
always work as intended as I demonstrated in r247721. The new
algortihm implemented in this patch is different from the previous
one. If you are interested the details, you want to read the file
comment of ICF.cpp.
llvm-svn: 247770
In this test, we have two functions, foo and bar. MSVC linker can
choose one and discard the other using ICF. LLD cannot. I add this
test as a TODO.
foo and bar are conceptually equivalent to the following:
void foo() { foo(); }
void bar() { foo(); }
foo and bar are effectively the same function. If foo and bar are
compiled to the same instructions, both their contents (foo and bar)
and relocation targets (foo) become the same, so from the ICF point
of view, they are reducible. But their graphs are not isomorphic!
LLD's ICF algorithm cannot handle this case yet.
llvm-svn: 247721
The first test (call) is now the only test to use .text, which makes it
resistant to more tests being added.
llvm-objdump -d already prints the addresses of symbols. We can use that in the
tests.
llvm-svn: 247685
They are not fully functional yet, but this implements enough support for lld
itself to read them.
With that, delete the .so binary we were using for tests and start eating our
own dog food.
llvm-svn: 247487
Previously, LLD's ICF couldn't merge cyclic graphs. That was unfortunate
because, in COFF, cyclic graphs are not exceptional at all. That is
pretty common.
In this patch, sections are grouped by Tarjan's strongly connected
component algorithm to get acyclic graphs. And then we try to merge
SCCs whose outdegree is zero, and remove them from the graph. This
makes other SCCs to have outdegree zero, so we can repeat the
process until all SCCs are removed. When comparing two SCCs, we handle
cycles properly.
This algorithm works better than previous one. Previously, self-linking
produced a 29.0MB executable. It now produces a 27.7MB. There's still some
gap compared to MSVC linker which produces a 27.1MB executable for the
same input. So the gap is narrowed, but still LLD is not on par with MSVC.
I'll investigate that later.
llvm-svn: 247387
For now it includes every symbol in the regular table. Since we don't
create dynamic relocations yet, we don't have a good way of knowing which
symbols are actually needed.
llvm-svn: 247365
With this patch we create a dynamic string table (it is allocated, unlike
the regular one) and the dynamic section has a DT_STRTAB pointing to it.
llvm-svn: 247155
Identical COMDAT Folding is a feature to merge COMDAT sections
by contents. Two sections are considered the same if their contents,
relocations, attributes, etc, are all the same.
An interesting fact is that MSVC linker takes "iterations" parameter
for ICF because the algorithm they are using is iterative. Merging
two sections could make more sections to be mergeable because
different relocations could now point to the same section. ICF is
repeated until we get a convergence (until no section can be merged).
This algorithm is not fast. Usually it needs three iterations until a
convergence is obtained.
In the new algorithm implemented in this patch, we consider sections
and relocations as a directed acyclic graph, and we try to merge
sections whose outdegree is zero. Sections with outdegree zero are then
removed from the graph, which makes other sections to have outdegree
zero. We repeat that until all sections are processed. In this
algorithm, we don't iterate over the same sections many times.
There's an apparent issue in the algorithm -- the section graph is
not guaranteed to be acyclic. It's actually pretty often cyclic.
So this algorithm cannot eliminate all possible duplicates.
That's OK for now because the previous algorithm was not able to
eliminate cycles too. I'll address the issue in a follow-up patch.
llvm-svn: 246878
Previously, we calculated our own hash values for section contents.
Of coruse that's slow because we had to access all bytes in sections.
Fortunately, COFF objects usually contain hash values for COMDAT
sections. We can use that to speed up Identical COMDAT Folding.
llvm-svn: 246869
The option is added in MSVC 2015, and there's no documentation about
what the option is. This patch is to ignore the option for now, so that
at least LLD is usable with MSVC 2015.
llvm-svn: 246780
There were at least two issues with having them together:
* For compatibility checks, we only want to look at the ELF kind.
* Adding support for shared libraries should introduce one InputFile kind,
not 4.
llvm-svn: 246707
I don't understand why the previous code is pretty flaky and
the new code is at least less flaky, but the original test
occasionally failed on the second run of lib.exe.
My guess was that lib.exe was failing because the output of
the echo command executed immediately before lib.exe was not
flushed to a file, but as far as I can say, the file
descriptor is properly closed in TestRunner.py, so this's
probably not correct. Other theory is that, on Windows, file
output is not guaranteed to be visible to other processes even
if a process flushes file descriptors, but I'd think that's
unlikely. So honestly I don't know the cause yet.
llvm-svn: 246621
This patch fixes a subtle incompatibility with MSVC linker.
MSVC linker preserves the original spelling of a DLL in the
import descriptor table. LLD previously converted all
characters to lowercase. Usually this difference is benign,
but if a program explicitly checks for DLL file names, the
program could fail.
llvm-svn: 246620
The ELF spec says:
... if any reference to or definition of a name is a symbol with a
non-default visibility attribute, the visibility attribute must be
propagated to the resolving symbol in the linked object. If different
visibility attributes are specified for distinct references to or
definitions of a symbol, the most constraining visibility attribute
must be propagated to the resolving symbol in the linked object. The
attributes, ordered from least to most constraining, are:
STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.
llvm-svn: 246603
In r246424, I made a change that disables non-DLL to export
symbols. It turned out that the change was not correct. Both
DLLs and executables are able to export symbols (although the
latter is relatively rare). This change restores the feature.
llvm-svn: 246537
I have totally no idea why, but MSVC linker is sensitive about
file names of archive members. If we do not make import library
file names to the same as the DLL name, MSVC link *crashes*
when it is processing the library file. This patch is to set
the same name.
llvm-svn: 246535
The rules for dllexported symbols are overly complicated due to
x86 name decoration, fuzzy symbol resolution, and the fact that
one symbol can be resolved by so many different names. The rules
are probably intended to be "intuitive", so that users don't have
to understand the name mangling schemes, but it seems that it can
lead to unintended symbol exports.
To make it clear what I'm trying to do with this patch, let me
write how the export rules are subtle and complicated.
- x86 name decoration: If machine type is i386 and export name
is given by a command line option, like /export:foo, the
real symbol name the linker has to search for is _foo because
all symbols are decorated with "_" prefixes. This doesn't happen
on non-x86 machines. This automatic name decoration happens only
when the name is not C++ mangled.
However, the symbol name exported from DLLs are ones without "_"
on all platforms.
Moreover, if the option is given via .drectve section, no
symbol decoration is done (the reason being that the .drectve
section is created by a compiler and the compiler should always
know the exact name of the symbol, I guess).
- Fuzzy symbol resolution: In addition to x86 name decoration,
the linker has to look for cdecl or C++ mangled symbols
for a given /export. For example, it searches for not only
_foo but also _foo@<number> or ??foo@... for /export:foo.
Previous implementation didn't get it right. I'm trying to make
it as compatible with MSVC linker as possible with this patch
however the rules are. The new code looks a bit messy to me, but
I don't think it can be simpler due to the ad-hoc-ness of the rules.
llvm-svn: 246424
It is currently failing with "'__uncaught_exception': identifier not found"
error. I guess it is due to r246219 because after that change, eh.h is
included only when threading is enabled.
llvm-svn: 246416
Now that we print a symbol table and all symbol kinds are at least declared,
we can test all combinations that don't produce an error.
This also includes a few fixes to keep the test passing:
* Keep the strong symbol in a weak X strong pair
* Handle common symbols.
The common X common case will be finished in a followup patch.
llvm-svn: 246401
This is exposed via a new flag /opt:lldltojobs=N, where N is the number of
code generation threads.
Differential Revision: http://reviews.llvm.org/D12309
llvm-svn: 246342
Now it is possible to have mips-linux-gnu-ld executable and link MIPS 64-bit
little-endian binaries providing -melf64ltsmip command line argument.
llvm-svn: 246335
lib.exe has a feature to create import library files (which contain
short import files) from module-definition files. Previously, we were
using that feature, but it turned out that the feature is not complete
for us.
There seems no way to specify "Import Types" in module-definition file.
lib.exe always adds "_" to given symbols and specify IMPORT_NAME_UNDECORATE.
We need more fine-grainded control on that value.
This patch teaches LLD to create short import files itself.
We are still using lib.exe, but the use of the tool is limited to create
empty import library files. We then create short import files and add them
to the empty files as new members.
This patch does not intend to change the functionality. LLD produces
the same import libraries as before. I'll make another change to create
different import libraries in a follow-up patch.
llvm-svn: 246292
InputFiles.h:98:53: error: invalid use of incomplete type ‘class lld::elf2::SymbolBody’
return SymbolBodies[SymbolIndex - FirstNonLocal]->getReplacement();
llvm-svn: 246262
This is a basic implementation that allows lld to emit binaries
consumable by the HSA runtime.
Differential Revision: http://reviews.llvm.org/D11267
llvm-svn: 246155
ICF is a feature to merge sections not by name (which is the regular
COMDAT merging) but by contents. If two or more sections have the
identical contents and relocations, ICF merges them to save space.
Accessors or templated functions tend to have the same contents, and
ICF can hold them.
If we consider sections as vertices and relocations as edges, the
problem is to find as many isomorphic graphs as possile from input
graphs. MSVC linker is smart enough to identify isomorphic graphs
even if they contain circles (GNU gold cannot handle circles
according to http://research.google.com/pubs/pub36912.html, so this
is impressive).
Circular references are not uncommon in COFF object files.
One example is .pdata. .pdata sections contain exception handler info
for functions, so they naturally have relocations for the functions.
The functions in turn have references to the .pdata sections so that
the functions and their .pdata are linked together. As a result, they
form circles.
This is a test case for circular graphs. LLD is not able to handle
this test case yet. I'll add code soon.
llvm-svn: 245827
The old test files were just compiler outputs, so it was hard to
debug if something goes wrong. The new test file is carefully
hand-crafted to trigger ICF to avoid that.
llvm-svn: 245826
__NULL_IMPORT_DESCRIPTOR is a symbol used by MSVC liner to construct
the import descriptor table. We do not use the symbol. Previously,
we had code to skip that symbol. That code does not actually do
anything meaningful because no one is referencing the symbol, the
symbol would naturally be ignored. This patch stops recognizing
the symbol.
llvm-svn: 245280
Previously, weak external symbols could reference only symbols that
appeared before them. Although that covers almost all use cases
of weak externals, there are object files out there which contains
weak externals that have forward references.
This patch supports such weak externals.
llvm-svn: 245258
There are some DLLs whose initializers depends on other DLLs'
initializers. The initialization order matters for them.
MSVC linker uses the order of the libraries from the command line.
LLD used ASCII-betical order. So they were incompatible.
This patch makes LLD compatible with MSVC.
llvm-svn: 245201
With this patch only the name is set. I will set the other fields shortly.
For now the table doesn't include local symbols. This is equivalent to using
--discard-all with gnu ld. This is OK for now since the symbols are not
needed for execution and for testing symbol resolution we only need the
global symbols.
llvm-svn: 245044
This is more convenient than the offset from the start of the file as we
don't have to worry about it changing when we move the output section.
This is a port of r245008 from ELF.
llvm-svn: 245018
With OutputSection being a virtual interface, each concrete OutputSection
handles only one type of chunk and we don't need a base Chunk class.
So for we have a class that handles input sections and one that handles
the string table, but this extends naturally for other outputs (symbol table,
merging of SHF_MERGE sections, etc.).
llvm-svn: 244972
We were creating the string table in a completely ad hoc way. Now the
string table is an output section and gets its output offset set just like
any other section.
This opens the way for other linker created sections like the symbol table.
llvm-svn: 244969
This is a bit more c++ code, but:
* It is less machine code: lld's text is 44688 bytes smaller.
* It should be a bit more efficient in the non native endian case.
* It should be a bit more efficient on architectures with slow unaligned access.
llvm-svn: 244934
Add PT_PHDR segment depending on its availability in linker script's
PHDRS command, fallback if no linker script is given.
Handle FILEHDR, PHDRS and FLAGS attributes of program header.
Differential Revision: http://reviews.llvm.org/D11589
llvm-svn: 244743
Sections must start at page boundaries in memory, but they
can be aligned to sector boundaries (512-bytes) on disk.
We aligned them to 4096-byte boundaries even on disk, so we
wasted disk space a bit.
llvm-svn: 244691
A global variable "Driver" is to re-entry to the driver to parse a
.drectve section. Because the need is COFF-specific, we don't need
this variable for ELF.
llvm-svn: 244680
MSVC 2015's load configuration object (__load_config_used) contains
references to these symbols. I don't fully understand how it works,
but looks like these symbols are linker-defined ones. So I define them
here in the Driver. With this patch, LLD can self-host with MSVC 2015.
This patch is to link MSVC 2015-produced object files. It does not
implement Control Flow Protection. If I understand correctly, the
linker has to create a bitmap of function entry point addresses for
the CFG runtime. We don't do that yet. Produced executables will not
be protected by CFG.
llvm-svn: 244425
SymbolTable::find(mangle(X)) is equivalent to SymbolTable::findUnderscore(X)
except that the latter is slightly efficient as that doesn't allocate a new
string.
llvm-svn: 244377