Commit Graph

69 Commits

Author SHA1 Message Date
Rui Ueyama 4dbff20c91 COFF: Fix bug that not all symbols were written to symtab if /opt:noref.
Only live symbols are written to the symbol table. Because isLive()
returned false if dead-stripping was disabled entirely, only
non-COMDAT sections were written to the symbol table. This patch fixes
the issue.

llvm-svn: 247856
2015-09-16 21:40:47 +00:00
Rui Ueyama 4bce7bcc88 COFF: Output messages for /verbose to stdout instead of stderr.
This patch also makes the message less verbose.

llvm-svn: 247853
2015-09-16 21:30:40 +00:00
Rui Ueyama 6a883b086e COFF: Improve debug helper function.
SectionChunk::getDebugName crashed if the symbol was a nullptr.

llvm-svn: 245677
2015-08-21 07:01:10 +00:00
Rafael Espindola 5c546a1437 COFF: In chunks, store the offset from the start of the output section. NFC.
This is more convenient than the offset from the start of the file as we
don't have to worry about it changing when we move the output section.

This is a port of r245008 from ELF.

llvm-svn: 245018
2015-08-14 03:30:59 +00:00
Rafael Espindola b835ae8e4a Port the error functions from ELF to COFF.
This has a few advantages

* Less C++ code (about 300 lines less).
* Less machine code (about 14 KB of text on a linux x86_64 build).
* It is more debugger friendly. Just set a breakpoint on the exit function and
  you get the complete lld stack trace of when the error was found.
* It is a more robust API. The errors are handled early and we don't get a
  std::error_code hot potato being passed around.
* In most cases the error function in a better position to print diagnostics
  (it has more context).

llvm-svn: 244215
2015-08-06 14:58:50 +00:00
Rui Ueyama 67fcd1a0c7 COFF: Fix bad #includes.
Writer.h is intended to be included only by Writer.cpp and Driver.cpp.
Use of the header in other files are bad.

llvm-svn: 244106
2015-08-05 19:51:28 +00:00
Rui Ueyama ba7c041f21 COFF: ARM: Implepment BLX23T relocation and fix Branch20T.
I fed the same test to MSVC linker and got the same output,
so I believe this implementation is correct.

llvm-svn: 244102
2015-08-05 19:40:07 +00:00
Rui Ueyama 7b276e2fb4 COFF: Move code for Identical COMDAT Folding to ICF.cpp.
llvm-svn: 243701
2015-07-30 22:57:21 +00:00
Rui Ueyama f69ecc1212 COFF: Handle all COMDAT sections as non-GC root.
I don't remember why I thought that only functions are subject
of garbage collection, but the comment here said so, which is
not correct. Moreover, the code just below the comment does not
do what the comment says -- it handles non-COMDAT, non-function
sections as GC root. As a result, it just handles non-COMDAT
sections as GC root.

This patch cleans that up by removing SectionChunk::isRoot and
use isCOMDAT instead.

llvm-svn: 243700
2015-07-30 22:48:45 +00:00
Rui Ueyama 8bc43a142b COFF: ARM: Fix relocations to thumb code.
Windows ARM is the thumb ARM environment, and pointers to thumb code
needs to have its LSB set. When we apply relocations, we need to
adjust the LSB if it points to an executable section.

llvm-svn: 243560
2015-07-29 19:25:00 +00:00
Rui Ueyama eb26e1d03c COFF: Fix SECREL and SECTION relocations.
SECREL should sets the 32-bit offset of the target from the beginning
of *target's* output section. Previously, the offset from the beginning
of source's output section was used instead.

SECTION means the target section's index, and not the source section's
index. This patch fixes that issue too.

llvm-svn: 243535
2015-07-29 16:30:45 +00:00
Rui Ueyama 5e706b3ee3 COFF: Use short identifiers. NFC.
llvm-svn: 243229
2015-07-25 21:54:50 +00:00
Rui Ueyama 3dd9372d2b COFF: ARM: Support import functions.
llvm-svn: 243205
2015-07-25 03:39:29 +00:00
Rui Ueyama cde7a7907e COFF: ARM: Implement BLX23T relocation.
llvm-svn: 243204
2015-07-25 03:25:28 +00:00
Rui Ueyama 3d9c8639c3 COFF: ARM: Implement BRANCH24T relocation.
llvm-svn: 243202
2015-07-25 03:19:34 +00:00
Rui Ueyama 237fca1451 COFF: ARM: Implement MOV32T relocation.
llvm-svn: 243201
2015-07-25 03:03:46 +00:00
Rui Ueyama 3afd5bfd7b COFF: Handle base relocation as a tuple of relocation type and RVA. NFC.
On x64 and x86, we use only one base relocation type, so we handled
base relocations just as a list of RVAs. That doesn't work well for
ARM becuase we have to handle two types of base relocations on ARM.
This patch changes the type of base relocation from uint32_t to
{reltype, uint32_t} to make it easy to port this code to ARM.

llvm-svn: 243197
2015-07-25 01:44:32 +00:00
Rui Ueyama 28df04211c COFF: Split ImportThunkChunk into x86 and x64. NFC.
This change should make it easy to port this code to ARM.

llvm-svn: 243195
2015-07-25 01:16:06 +00:00
Rui Ueyama 1c341a54de COFF: Do not align import thunks on 16-byte boundaries on x86.
Looks like MSVC linker aligns them only on x64.

llvm-svn: 243194
2015-07-25 01:16:04 +00:00
Rui Ueyama 35ccb0f7d4 COFF: Don't assume !is64() means i386.
In many places we assumed that is64() means AMD64 and i386 otherwise.
This assumption is not sound because Windows also supports ARM.
The linker doesn't support ARM yet, but this is a first step.

llvm-svn: 243188
2015-07-25 00:20:06 +00:00
Rui Ueyama cd3f99b6c5 COFF: Implement Safe SEH support for x86.
An object file compatible with Safe SEH contains a .sxdata section.
The section contains a list of symbol table indices, each of which
is an exception handler function. A safe SEH-enabled executable
contains a list of exception handler RVAs. So, what the linker has
to do to support Safe SEH is basically to read the .sxdata section,
interpret the contents as a list of symbol indices, unique-fy and
sort their RVAs, and then emit that list to .rdata. This patch
implements that feature.

llvm-svn: 243182
2015-07-24 23:51:14 +00:00
Rui Ueyama 2296dc137c COFF: Fix base relocation type for x86.
llvm-svn: 243178
2015-07-24 23:24:45 +00:00
Rui Ueyama 3cb895c930 COFF: Fix __ImageBase symbol relocation.
__ImageBase is a special symbol whose value is the image base address.
Previously, we handled __ImageBase symbol as an absolute symbol.

Absolute symbols point to specific locations in memory and the locations
never change even if an image is base-relocated. That means that we
don't have base relocation entries for absolute symbols.

This is not a case for __ImageBase. If an image is base-relocated, its
base address changes, and __ImageBase needs to be shifted as well.
So we have to have base relocations for __ImageBase. That means that
__ImageBase is not really an absolute symbol but a different kind of
symbol.

In this patch, I introduced a new type of symbol -- DefinedRelative.
DefinedRelative is similar to DefinedAbsolute, but it has not a VA but RVA
and is a subject of base relocation. Currently only __ImageBase is of
the new symbol type.

llvm-svn: 243176
2015-07-24 22:58:44 +00:00
Rui Ueyama 33fb2cb11b COFF: Fix base relocations for __imp_ symbols on x86.
Because thunks for dllimported symbols contain absolute addresses on x86,
they need to be relocated at load-time. This bug was a cause of crashes
in DLL initialization routines.

llvm-svn: 242259
2015-07-15 00:25:38 +00:00
Rui Ueyama c851ccc3bd COFF: Fix locally-imported symbol's base relocations.
Base relocations are RVA and not VA, so we shouldn't add ImageBase.

llvm-svn: 241883
2015-07-10 04:30:54 +00:00
Rui Ueyama d4b351f0de COFF: Fix locally-imported symbol's size for x86.
llvm-svn: 241860
2015-07-09 21:15:58 +00:00
Rui Ueyama 93b4571187 COFF: Implement base relocations for x86.
With this patch, LLD is now able to self-link an .exe file for x86
that runs correctly, although I don't think some headers (particularly
SEH) are not correct. DLL support is coming soon.

llvm-svn: 241857
2015-07-09 20:36:59 +00:00
Rui Ueyama 7c3e23fffd COFF: Fix import thunks and name mangling for x86.
With this patch, LLD is now able to correctly link a "hello world"
program written in assembly for 32-bit x86.

llvm-svn: 241771
2015-07-09 01:25:49 +00:00
David Majnemer 2c345a337c COFF: Emit a symbol table if /debug is specified
Providing a symbol table in the executable is quite useful when
debugging a fully-linked executable without having to reconstruct one
from DWARF.

Differential Revision: http://reviews.llvm.org/D11023

llvm-svn: 241689
2015-07-08 16:37:50 +00:00
Rui Ueyama 4e1536c155 COFF: Fix AMD64_SECTION relocation.
llvm-svn: 241658
2015-07-08 01:47:28 +00:00
Rui Ueyama 11863b4ae1 COFF: Support x86 file header and relocations.
llvm-svn: 241657
2015-07-08 01:45:29 +00:00
Rui Ueyama 661a4e7ab6 COFF: Split writeTo in preparation for supporting 32-bit x86.
llvm-svn: 241638
2015-07-07 22:49:21 +00:00
Rui Ueyama 7a333c66be COFF: Fix locally-imported symbols.
Previously, pointers pointed by locally-imported symbols were broken.
It has only 4 bytes although the correct size is 8 byte. This patch
fixes that bug.

llvm-svn: 241295
2015-07-02 20:33:50 +00:00
Rui Ueyama 0744e87fad COFF: Rename getReplacement -> repl.
The previous name was too long to my taste.

llvm-svn: 241215
2015-07-02 00:21:11 +00:00
Chandler Carruth 59013c387e [opt] Replace the recursive walk for GC with a worklist algorithm.
This flattens the entire liveness walk from a recursive mark approach to
a worklist approach. It also sinks the worklist management completely
out of the SectionChunk and into the Writer by exposing the ability to
iterato over children of a chunk and over the symbol bodies of relocated
symbols. I'm not 100% happy with the API names, so suggestions welcome
there.

This allows us to use a single worklist for the entire recursive walk
and would also be a natural place to take advantage of parallelism at
some future point.

With this, we completely inline away the GC walk into the
Writer::markLive function and it makes it very easy to profile what is
slow. Currently, time is being wasted checking whether a Chunk isa
SectionChunk (it essentially always is), finding (or skipping)
a replacement for a symbol, and chasing pointers between symbols and
their chunks. There are a bunch of things we can do to fix this, and its
easier to do them after this change IMO.

This change alone saves 1-2% of the time for my self-link of lld.exe
(which I'm running and benchmarking on Linux ironically).

Perhaps more notably, we'll no longer blow out the stack for large
links. =]

Just as an FYI, at this point, I/O is starting to really dominate the
profile. Well over 10% of the time appears to be inside the kernel doing
page table silliness. I think a decent chunk of this can be nuked as
well, but it's a little odd as cross-linking in this way isn't really
the primary goal here.

Differential Revision: http://reviews.llvm.org/D10790

llvm-svn: 240995
2015-06-29 21:12:49 +00:00
Chandler Carruth be6e80b012 [opt] Hoist the call throuh SymbolBody::getReplacement out of the inline
method to get a SymbolBody and into the callers, and kill now dead
includes.

This removes the need to have the SymbolBody definition when we're
defining the inline method and makes it a better inline method. That was
the only reason for a lot of header includes here. Removing these and
using forward declarations actually uncovers a bunch of cross-header
dependencies that I've fixed while I'm here, and will allow me to
introduce some *important* inline code into Chunks.h that requires the
definition of ObjectFile.

No functionality changed at this point.

Differential Revision: http://reviews.llvm.org/D10789

llvm-svn: 240982
2015-06-29 18:50:11 +00:00
Rui Ueyama 871847e32d COFF: Fix ICF correctness bug.
When comparing two COMDAT sections, we need to take section values
and associative sections into account. This patch fixes that bug.
It fixes a crash bug of llvm-tblgen when linked with /opt:lldicf.

One thing I don't understand yet is that this logic seems to be
too strict. MSVC linker is able to create more compact executables
(which of course work correctly). With this ICF algorithm, LLD is
able to make executable smaller, but the outputs are larger than
MSVC's. There must be something I'm missing here.

llvm-svn: 240897
2015-06-28 01:30:54 +00:00
Rui Ueyama 7383562bc9 COFF: Align DLL import thunks on 16-byte boundaries.
llvm-svn: 240806
2015-06-26 18:28:56 +00:00
Rui Ueyama 9b921e5dc9 COFF: Merge DefinedRegular and DefinedCOMDAT.
I split them in r240319 because I thought they are different enough
that we should treat them as different types. It turned out that
that was not a good idea. They are so similar that we ended up having
many duplicate code.

llvm-svn: 240706
2015-06-25 22:00:42 +00:00
Rui Ueyama fc510f4cf8 COFF: Devirtualize mark(), markLive() and isCOMDAT().
Only SectionChunk can be dead-stripped. Previously,
all types of chunks implemented these functions,
but their functions were blank.

Likewise, only DefinedRegular and DefinedCOMDAT symbols
can be dead-stripped. markLive() function was implemented
for other symbol types, but they were blank.

I started thinking that the change I made in r240319 was
a mistake. I separated DefinedCOMDAT from DefinedRegular
because I thought that would make the code cleaner, but now
we want to handle them as the same type here. Maybe we
should roll it back.

This change should improve readability a bit as this removes
some dubious uses of reinterpret_cast. Previously, we
assumed that all COMDAT chunks are actually SectionChunks,
which was not very obvious.

llvm-svn: 240675
2015-06-25 19:10:58 +00:00
Rui Ueyama f34c088515 COFF: Simplify. NFC.
llvm-svn: 240666
2015-06-25 17:56:36 +00:00
Rui Ueyama c6fcfbc98a COFF: Use std::equal to compare two lists of relocations.
llvm-svn: 240665
2015-06-25 17:51:07 +00:00
Rui Ueyama 02c302790f COFF: Don't use COFFHeader->NumberOfRelocations.
The size of the field is 16 bit, so it's inaccurate if the
number of relocations in a section is more than 65535.

llvm-svn: 240661
2015-06-25 17:43:37 +00:00
Rui Ueyama 88e0f9206b COFF: Fix a bug of __imp_ symbol.
The change I made in r240620 was not correct. If a symbol foo is
defined, and if you use __imp_foo, __imp_foo symbol is automatically
defined as a pointer (not just an alias) to foo.

Now that we need to create a chunk for automatically-created symbols.
I defined LocalImportChunk class for them.

llvm-svn: 240622
2015-06-25 03:31:47 +00:00
Rui Ueyama 42aa00b34b COFF: Use COFFObjectFile::getRelocations(). NFC.
llvm-svn: 240614
2015-06-25 00:33:38 +00:00
Rui Ueyama cde92423d7 COFF: Cache raw pointers to relocation tables.
Getting an iterator to the relocation table is very hot operation
in the linker. We do that not only to apply relocations but also
to mark live sections and to do ICF.

libObject's interface is slow. By caching pointers to the first
relocation table entries makes the linker 6% faster to self-link.

We probably need to fix libObject as well.

llvm-svn: 240603
2015-06-24 23:03:17 +00:00
Rui Ueyama ddf71fc370 COFF: Initial implementation of Identical COMDAT Folding.
Identical COMDAT Folding (ICF) is an optimization to reduce binary
size by merging COMDAT sections that contain the same metadata,
actual data and relocations. MSVC link.exe and many other linkers
have this feature. LLD achieves on per with MSVC in terms produced
binary size with this patch.

This technique is pretty effective. For example, LLD's size is
reduced from 64MB to 54MB by enaling this optimization.

The algorithm implemented in this patch is extremely inefficient.
It puts all COMDAT sections into a set to identify duplicates.
Time to self-link with/without ICF are 3.3 and 320 seconds,
respectively. So this option roughly makes LLD 100x slower.
But it's okay as I wanted to achieve correctness first.
LLD is still able to link itself with this optimization.
I'm going to make it more efficient in followup patches.

Note that this optimization is *not* entirely safe. C/C++ require
different functions have different addresses. If your program
relies on that property, your program wouldn't work with ICF.
However, it's not going to be an issue on Windows because MSVC
link.exe turns ICF on by default. As long as your program works
with default settings (or not passing /opt:noicf), your program
would work with LLD too.

llvm-svn: 240519
2015-06-24 04:36:52 +00:00
Peter Collingbourne bd3a29d063 COFF: Remove unused field SectionChunk::SectionIndex.
llvm-svn: 240512
2015-06-24 00:12:36 +00:00
Rui Ueyama 6a60be7749 COFF: Add names for logging/debugging to COMDAT chunks.
Chunks are basically unnamed chunks of bytes, and we don't like
to give them names. However, for logging or debugging, we want to
know symbols names of functions for COMDAT chunks. (For example,
we want to print out "we have removed unreferenced COMDAT section
which contains a function FOOBAR.")

This patch is to do that.

llvm-svn: 240484
2015-06-24 00:00:52 +00:00
Rui Ueyama 5e31d0b2e9 COFF: Fix common symbol alignment.
llvm-svn: 240217
2015-06-20 07:25:45 +00:00