Commit Graph

242 Commits

Author SHA1 Message Date
Rui Ueyama 65813edfe2 COFF: Make symbols satisfy weak ordering.
Previously, SymbolBody::compare(A, B) didn't satisfy weak ordering.
There was a case that A < B and B < A could have been true.
This is because we just pick LHS if A and B are consisdered equivalent.

This patch is to make symbols being weakly ordered. If A and B are
not tie, one of A < B && B > A or A > B && B < A is true.
This is not an improtant property for a single-threaded environment
because everything is deterministic anyways. However, in a multi-
threaded environment, this property becomes important.

If a symbol is defined or lazy, ties are resolved by its file index.
For simple types that we don't really care about their identities,
symbols are compared by their addresses.

llvm-svn: 241294
2015-07-02 20:33:48 +00:00
Rui Ueyama 3d4c69c04d COFF: Resolve AlternateNames using weak aliases.
Previously, we use SymbolTable::rename to resolve AlternateName symbols.
This patch is to merge that mechanism with weak aliases, so that we
remove that function.

llvm-svn: 241230
2015-07-02 02:38:59 +00:00
Rui Ueyama 4897596728 COFF: Chagne weak alias' type from SymbolBody** to SymbolBody*. NFC.
llvm-svn: 241198
2015-07-01 22:32:23 +00:00
Rui Ueyama 8d3010a1a6 COFF: Change the order of adding symbols to the symbol table.
Previously, the order of adding symbols to the symbol table was simple.
We have a list of all input files. We read each file from beginning of
the list and add all symbols in it to the symbol table.

This patch changes that order. Now all archive files are added to the
symbol table first, and then all the other object files are added.
This shouldn't change the behavior in single-threading, and make room
to parallelize in multi-threading.

In the first step, only lazy symbols are added to the symbol table
because archives contain only Lazy symbols. Member object files
found to be necessary are queued. In the second step, defined and
undefined symbols are added from object files. Adding an undefined
symbol to the symbol table may cause more member files to be added
to the queue. We simply continue reading all object files until the
queue is empty.

Finally, new archive or object files may be added to the queues by
object files' directive sections (which contain new command line
options).

The above process is repeated until we get no new files.

Symbols defined both in object files and in archives can make results
undeterministic. If an archive is read before an object, a new member
file gets linked, while in the other way, no new file would be added.
That is the most popular cause of an undeterministic result or linking
failure as I observed. Separating phases of adding lazy symbols and
undefined symbols makes that deterministic. Adding symbols in each
phase should be parallelizable.

llvm-svn: 241107
2015-06-30 19:35:21 +00:00
Peter Collingbourne f7b27d15f2 COFF: Implement SymbolBody::getDebugName() for DefinedBitcode symbols.
Differential Revision: http://reviews.llvm.org/D10827

llvm-svn: 241029
2015-06-30 00:47:52 +00:00
Peter Collingbourne 79cfd437b5 COFF: Use LTOModule::getLinkerOpts() instead of reading the linker directives ourselves.
llvm-svn: 241020
2015-06-29 23:26:28 +00:00
Rui Ueyama dae1661436 COFF: Split ObjectFile::createSymbolBody into small functions. NFC.
llvm-svn: 241011
2015-06-29 22:16:21 +00:00
Chandler Carruth ee5bf526eb [cleanup] Clean up the flow of creating a symbol body for regular symbols.
This uses a single cast and test to get the section for the symbol, and
uses the cast_or_null<> pattern throughout to handle the known type but
unknown non-null-ness.

No functionality changed.

Differential Revision: http://reviews.llvm.org/D10791

llvm-svn: 241000
2015-06-29 21:32:37 +00:00
Chandler Carruth 52eb355765 [opt] Inline a trivial lookup function into the header.
This function is actually *very* hot. It is hard to see currently
because the call graph is very recursive, but I'm working to remove that
and when I do this function becomes significantly higher on the profile
(up to 5%!) and so worth avoiding the call overhead.

No specific perf gain I can measure yet (below the noise), but likely to
have more impact as we stop cluttering the call graph.

Differential Revision: http://reviews.llvm.org/D10788

llvm-svn: 240873
2015-06-27 03:40:10 +00:00
Rui Ueyama 605e1f6b6c COFF: Avoid vector reallocation. NFC.
llvm-svn: 240859
2015-06-26 23:51:45 +00:00
Rui Ueyama ccde19d77e COFF: Fix local absolute symbols.
Absolute symbols were always handled as external symbols, so if two
or more object files define the same absolute symbol, they would
conflict even if the symbol is private to each file.
This patch fixes that bug.

llvm-svn: 240756
2015-06-26 03:09:23 +00:00
Rui Ueyama a29948873f COFF: Don't read non-x64 object files.
Currently the new LLD supports only x86-64.

llvm-svn: 240749
2015-06-26 00:42:21 +00:00
Rui Ueyama 68633f1719 COFF: Better error message for duplicate symbols.
Now the symbol table prints out not only symbol names but
also file names for duplicate symbols.

llvm-svn: 240719
2015-06-25 23:22:00 +00:00
Rui Ueyama 9b921e5dc9 COFF: Merge DefinedRegular and DefinedCOMDAT.
I split them in r240319 because I thought they are different enough
that we should treat them as different types. It turned out that
that was not a good idea. They are so similar that we ended up having
many duplicate code.

llvm-svn: 240706
2015-06-25 22:00:42 +00:00
Rui Ueyama fc510f4cf8 COFF: Devirtualize mark(), markLive() and isCOMDAT().
Only SectionChunk can be dead-stripped. Previously,
all types of chunks implemented these functions,
but their functions were blank.

Likewise, only DefinedRegular and DefinedCOMDAT symbols
can be dead-stripped. markLive() function was implemented
for other symbol types, but they were blank.

I started thinking that the change I made in r240319 was
a mistake. I separated DefinedCOMDAT from DefinedRegular
because I thought that would make the code cleaner, but now
we want to handle them as the same type here. Maybe we
should roll it back.

This change should improve readability a bit as this removes
some dubious uses of reinterpret_cast. Previously, we
assumed that all COMDAT chunks are actually SectionChunks,
which was not very obvious.

llvm-svn: 240675
2015-06-25 19:10:58 +00:00
Rui Ueyama ddf71fc370 COFF: Initial implementation of Identical COMDAT Folding.
Identical COMDAT Folding (ICF) is an optimization to reduce binary
size by merging COMDAT sections that contain the same metadata,
actual data and relocations. MSVC link.exe and many other linkers
have this feature. LLD achieves on per with MSVC in terms produced
binary size with this patch.

This technique is pretty effective. For example, LLD's size is
reduced from 64MB to 54MB by enaling this optimization.

The algorithm implemented in this patch is extremely inefficient.
It puts all COMDAT sections into a set to identify duplicates.
Time to self-link with/without ICF are 3.3 and 320 seconds,
respectively. So this option roughly makes LLD 100x slower.
But it's okay as I wanted to achieve correctness first.
LLD is still able to link itself with this optimization.
I'm going to make it more efficient in followup patches.

Note that this optimization is *not* entirely safe. C/C++ require
different functions have different addresses. If your program
relies on that property, your program wouldn't work with ICF.
However, it's not going to be an issue on Windows because MSVC
link.exe turns ICF on by default. As long as your program works
with default settings (or not passing /opt:noicf), your program
would work with LLD too.

llvm-svn: 240519
2015-06-24 04:36:52 +00:00
Peter Collingbourne bd3a29d063 COFF: Remove unused field SectionChunk::SectionIndex.
llvm-svn: 240512
2015-06-24 00:12:36 +00:00
Peter Collingbourne c7b685d997 COFF: Ignore debug symbols.
Differential Revision: http://reviews.llvm.org/D10675

llvm-svn: 240487
2015-06-24 00:05:50 +00:00
Rui Ueyama 6a60be7749 COFF: Add names for logging/debugging to COMDAT chunks.
Chunks are basically unnamed chunks of bytes, and we don't like
to give them names. However, for logging or debugging, we want to
know symbols names of functions for COMDAT chunks. (For example,
we want to print out "we have removed unreferenced COMDAT section
which contains a function FOOBAR.")

This patch is to do that.

llvm-svn: 240484
2015-06-24 00:00:52 +00:00
Rui Ueyama 617f5ccb5c COFF: Separate DefinedCOMDAT from DefinedRegular symbol type. NFC.
Before this change, you got to cast a symbol to DefinedRegular and then
call isCOMDAT() to determine if a given symbol is a COMDAT symbol.
Now you can just use isa<DefinedCOMDAT>().

As to the class definition of DefinedCOMDAT, I could remove duplicate
code from DefinedRegular and DefinedCOMDAT by introducing another base
class for them, but I chose to not do that to keep the class hierarchy
shallow. This amount of code duplication doesn't worth to define a new
class.

llvm-svn: 240319
2015-06-22 19:56:01 +00:00
Rui Ueyama e3a335076a COFF: Combine add{Object,Archive,Bitcode,Import} functions. NFC.
llvm-svn: 240229
2015-06-20 23:10:05 +00:00
Rui Ueyama efb7e1aa29 COFF: Fix a common symbol bug.
This is a case that one mistake caused a very mysterious bug.
I made a mistake to calculate addresses of common symbols, so
each common symbol pointed not to the beginning of its location
but to the end of its location. (Ouch!)

Common symbols are aligned on 16 byte boundaries. If a common
symbol is small enough to fit between the end of its real
location and whatever comes next, this bug didn't cause any harm.

However, if a common symbol is larger than that, its memory
naturally overlapped with other symbols. That means some
uninitialized variables accidentally shared memory. Because
totally unrelated memory writes mutated other varaibles, it was
hard to debug.

It's surprising that LLD was able to link itself and all LLD
tests except gunit tests passed with this nasty bug.

With this fix, the new COFF linker is able to pass all tests
for LLVM, Clang and LLD if I use MSVC cl.exe as a compiler.
Only three tests are failing when used with clang-cl.

llvm-svn: 240216
2015-06-20 07:21:57 +00:00
Rui Ueyama 29792a82a9 COFF: Cache Archive::Symbol::getName(). NFC.
getName() does strlen() on the symbol table, so it's not very fast.
It's not as bad as r239332 because the number of symbols exported
from archive files are fewer than object files, and they are usually
shorter, though.

llvm-svn: 240178
2015-06-19 21:25:44 +00:00
Rui Ueyama 223fe1b9e7 COFF: Fix unsafe memory access.
llvm-svn: 240046
2015-06-18 20:29:41 +00:00
Rui Ueyama ea63a28364 COFF: Handle both / and \ as path separator.
llvm-svn: 240042
2015-06-18 20:16:26 +00:00
Peter Collingbourne 8b2492f2a0 COFF: Implement DLL symbol exports for bitcode files.
Differential Revision: http://reviews.llvm.org/D10530

llvm-svn: 239994
2015-06-18 05:22:15 +00:00
Peter Collingbourne 1b6fd1f5fd COFF: Symbol resolution for common and comdat symbols defined in bitcode.
In the case where either a bitcode file and a regular file or two bitcode
files export a common or comdat symbol with the same name, the linker needs
to pick one of them following COFF semantics. This patch implements a design
for resolving such symbols that pushes most of the work onto either LLD's
regular mechanism for resolving common or comdat symbols or the IR linker's
mechanism for doing the same.

We modify SymbolBody::compare to always prefer non-bitcode symbols, so that
during the initial phase of symbol resolution, the symbol table always contains
a regular symbol in any case where we need to choose between a regular and
a bitcode symbol. In SymbolTable::addCombinedLTOObject, we force export
any bitcode symbols that were initially pre-empted by a regular symbol,
and later use SymbolBody::compare to choose between the regular symbol in
the symbol table and the regular symbol from the combined LTO object file.

This design seems to be sound, so long as the resolution mechanism is defined
to be commutative and associative modulo arbitrary choices between symbols
(which seems to be the case for COFF).

Differential Revision: http://reviews.llvm.org/D10329

llvm-svn: 239563
2015-06-11 21:49:54 +00:00
Peter Collingbourne df637ea289 COFF: Skip internal symbols in bitcode files.
Differential Revision: http://reviews.llvm.org/D10319

llvm-svn: 239338
2015-06-08 20:21:28 +00:00
Rui Ueyama 57fe78d339 COFF: Read symbol names lazily.
This change seems to make the linker about 10% faster.
Reading symbol name is not very cheap because it needs strlen()
on the string table. We were wasting time on reading non-external
symbol names that would never be used by the linker.

llvm-svn: 239332
2015-06-08 19:43:59 +00:00
Rui Ueyama 80141a4bcd COFF: Check for auxiliary symbol's type.
We forgot to check for auxiliary symbol's type. So we sometimes read
garbage as associative section definitions.

Associative sections are considered as not live themselves by the
garbage collector because they are live only when associaited sections
are live.

By reading more data (or garbage) as associative section definitions,
we treated more sections as non-GC-roots, that caused the linker to
discard too many sections by mistake. That caused another mysterious
bug (such as some global constructors don't run at all for some reason.)

llvm-svn: 239287
2015-06-08 05:00:42 +00:00
Rui Ueyama b4f791b510 COFF: Fix memory leak.
llvm-svn: 239272
2015-06-08 00:09:25 +00:00
Peter Collingbourne ace2f091fd COFF: Read linker directives from bitcode files.
Differential Revision: http://reviews.llvm.org/D10285

llvm-svn: 239212
2015-06-06 02:00:45 +00:00
Rui Ueyama 1db1ef9ab4 Use reinterpret_cast instead of const_cast and C-style cast.
llvm-svn: 238786
2015-06-01 21:49:21 +00:00
Rui Ueyama 81b030cbf6 COFF: Remove BitcodeFile::BitcodeFile(StringRef Filename).
In r238690, I made all files have only MemoryBufferRefs. This change
is to do the same thing for the bitcode file reader. Also updated
a few variable names to match with other code.

llvm-svn: 238782
2015-06-01 21:19:43 +00:00
Rui Ueyama fd99e01b91 COFF: Support import-by-ordinal DLL imports.
Symbols exported by DLLs can be imported not by name but by
small number or ordinal. Usually, symbols have both ordinals
and names, and in that case ordinals are called "hints" and
used by the loader as hints.

However, symbols can have only ordinals. They are called
import-by-ordinal symbols. You need to manage ordinals by hand
so that they will never change if you choose to use the feature.
But it's supposed to make dynamic linking faster because
it needs no string comparison. Not sure if that claim still
stands in year 2015, though. Anyways, the feature exists,
and this patch implements that.

llvm-svn: 238780
2015-06-01 21:05:27 +00:00
Peter Collingbourne 60c1616613 COFF: Initial implementation of link-time optimization.
This implementation is known to work in very simple cases (see new test case).

Differential Revision: http://reviews.llvm.org/D10115

llvm-svn: 238777
2015-06-01 20:10:10 +00:00
Denis Protivensky 6833690402 COFF: Fix warnings found by gcc
llvm-svn: 238734
2015-06-01 09:26:32 +00:00
Rui Ueyama 8fd9fb9857 COFF: Define an error category for the linker.
Instead of returning non-categorized errors, return categorized errors.
All uses of make_dynamic_error_code are removed.

Because we don't have error reporting mechanism, I just chose to print out
error messages to stderr, and then return an error object. Not sure if
that's the right thing to do, but at least it seems practical.

http://reviews.llvm.org/D10129

llvm-svn: 238714
2015-06-01 02:58:15 +00:00
Rui Ueyama d7c2f5847a COFF: Make the Driver own all MemoryBuffers. NFC.
Previously, a MemoryBuffer of a file was owned by each InputFile object.
This patch makes the Driver own all of them. InputFiles now have only
MemoryBufferRefs. This change simplifies ownership managment
(particularly for ObjectFile -- the object owned a MemoryBuffer only when
it's not created from an archive file, because in that case a parent
archive file owned the entire buffer. Now it owns nothing unconditionally.)

llvm-svn: 238690
2015-05-31 21:04:56 +00:00
Rui Ueyama c9bfe32010 COFF: Fill imort table HintName field.
Currently we set the field to zero, but as per the spec, we should
set numbers we read from import library files. The loader uses the
values as starting offsets for binary search when looking up imported
symbols from DLL.

llvm-svn: 238562
2015-05-29 15:45:35 +00:00
Rui Ueyama d52824d361 Rename InputFile::Name -> InputFile::Filename.
Other local variables shadowed the member variable.
Rename to make that a bit longer.

llvm-svn: 238478
2015-05-28 20:16:25 +00:00
Rui Ueyama 411c636081 COFF: Add a new PE/COFF port.
This is an initial patch for a section-based COFF linker.

The patch has 2300 lines of code including comments and blank lines.
Before diving into details, you want to start from reading README
because it should give you an overview of the design.

All important things are written in the README file, so I write
summary here.

- The linker is already able to self-link on Windows.

- It's significantly faster than the existing implementation.
  The existing one takes 5 seconds to link LLD on my machine,
  while the new one only takes 1.2 seconds, even though the new
  one is not multi-threaded yet. (And a proof-of-concept multi-
  threaded version was able to link it in 0.5 seconds.)

- It uses much less memory (250MB vs. 2GB virtual memory space
  to self-host).

- IMHO the new code is much simpler and easier to read than
  the existing PE/COFF port.

http://reviews.llvm.org/D10036

llvm-svn: 238458
2015-05-28 19:09:30 +00:00