llvm-project/lld/COFF
Rui Ueyama 8b33f59bfd COFF: De-virtualize and inline garbage collector functions.
isRoot, isLive and markLive functions are called very frequently.
Previously, they were virtual functions. This patch make them
non-virtual.

Also this patch checks chunk liveness before calling its mark().
Previously, we did that at beginning of markLive(), so the virtual
function would return immediately if it's live. That was inefficient.

llvm-svn: 239458
2015-06-10 04:21:47 +00:00
CMakeLists.txt COFF: Implement /lib using LibDriver. 2015-06-09 21:52:48 +00:00
Chunks.cpp COFF: De-virtualize and inline garbage collector functions. 2015-06-10 04:21:47 +00:00
Chunks.h COFF: De-virtualize and inline garbage collector functions. 2015-06-10 04:21:47 +00:00
Config.h COFF: Add /opt:noref option. 2015-06-07 03:17:42 +00:00
DLL.cpp COFF: Use named constants instead of sizeof(). 2015-06-07 22:00:28 +00:00
DLL.h COFF: Move Windows-specific code from Chunk.{cpp,h} to DLL.{cpp,h}. 2015-06-07 01:15:04 +00:00
Driver.cpp COFF: Implement /lib using LibDriver. 2015-06-09 21:52:48 +00:00
Driver.h COFF: Support resonpse files. 2015-06-07 02:55:19 +00:00
DriverUtils.cpp COFF: Simplify. NFC. 2015-06-07 23:02:50 +00:00
Error.h COFF: Better noexcept specification with LLVM_NOEXCEPT 2015-06-01 09:08:11 +00:00
InputFiles.cpp COFF: Skip internal symbols in bitcode files. 2015-06-08 20:21:28 +00:00
InputFiles.h COFF: Read symbol names lazily. 2015-06-08 19:43:59 +00:00
Memory.h COFF: Refactor functions to find files from search paths. 2015-05-31 19:17:12 +00:00
Options.td
README.md COFF: Add a glossary to README. 2015-06-07 22:42:52 +00:00
SymbolTable.cpp COFF: Split SymbolTable::addCombinedLTOObject. NFC. 2015-06-09 17:52:17 +00:00
SymbolTable.h COFF: Split SymbolTable::addCombinedLTOObject. NFC. 2015-06-09 17:52:17 +00:00
Symbols.cpp COFF: Update comment. 2015-06-09 16:52:56 +00:00
Symbols.h COFF: Read symbol names lazily. 2015-06-08 19:43:59 +00:00
Writer.cpp COFF: Avoid callign stable_sort. 2015-06-08 08:26:28 +00:00
Writer.h COFF: Move Windows-specific code from Chunk.{cpp,h} to DLL.{cpp,h}. 2015-06-07 01:15:04 +00:00

README.md

The New PE/COFF Linker

This directory contains an experimental linker for the PE/COFF file format. Because the fundamental design of this port differs from that of the other LLD ports, it is kept separate in this directory.

The other ports are based on the Atom model, in which symbols and references are represented as vertices and edges of graphs. We don't use that model here, aiming instead for performance and simplicity. Our plan is to implement a linker for the PE/COFF format based on a different idea, and then apply the same idea to ELF if it proves to be effective.

Overall Design

This is a list of important data types in this linker.

  • SymbolBody

    SymbolBody is a class for symbols. SymbolBodies may be created for symbols in object files or in archive file headers. The linker may also create them out of nothing.

    There are mainly three types of SymbolBodies: Defined, Undefined, and Lazy. Defined symbols are for all symbols that are considered "resolved", including real defined symbols, COMDAT symbols, common symbols, absolute symbols, linker-created symbols, etc. Undefined symbols are for undefined symbols, which need to be replaced by Defined symbols by the resolver. Lazy symbols represent symbols we found in archive file headers -- they can turn into Defined symbols if we read the archive members, but we haven't done that yet.

  • Symbol

    Symbol is a pointer to a SymbolBody. There's only one Symbol for each unique symbol name (this uniqueness is guaranteed by the symbol table). Because SymbolBodies are created for each file independently, there can be many SymbolBodies for the same name. Thus, the relationship between Symbols and SymbolBodies is 1:N.

    The resolver keeps the Symbol's pointer to always point to the "best" SymbolBody. Pointer mutation is the resolve operation in this linker.

    SymbolBodies have pointers to their Symbols. That means you can always find the best SymbolBody from any SymbolBody by following pointers twice. This structure makes it very easy to find replacements for symbols. For example, if you have an Undefined SymbolBody, you can find a Defined SymbolBody for that symbol just by going to its Symbol and then to that Symbol's SymbolBody, assuming the resolver has successfully resolved all undefined symbols.

  • Chunk

    Chunk represents a chunk of data that will occupy space in an output. Chunks may be backed by sections of input files, but they can also be created for something else, for example for common or BSS symbols. The linker may also create chunks out of nothing to append additional data to an output.

    Chunks know about their size, how to copy their data to mmap'ed outputs, and how to apply relocations to them. Specifically, section-based chunks know how to read relocation tables and how to apply them.

  • SymbolTable

    SymbolTable is basically a hash table from strings to Symbols, with logic to resolve symbol conflicts. It resolves conflicts by symbol type. For example, if we add an Undefined and then a Defined symbol, the symbol table will keep the latter. If we add an Undefined and then a Lazy symbol, it will keep the latter. If we add a Lazy and then an Undefined symbol, it will keep the former, but it will also trigger the Lazy symbol to load the archive member to actually resolve the symbol.

  • OutputSection

    OutputSection is a container of Chunks. A Chunk belongs to at most one OutputSection.
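
The relationship between Symbol, SymbolBody, and SymbolTable described above can be sketched roughly as follows. This is a minimal illustration, not the linker's actual code: the `Kind` enum, the `backref` field, and the `best` and `add` functions are hypothetical names. The sketch also assumes that an archive member load is requested whenever a Lazy body wins and an Undefined body is involved, in either order, while the text above only spells out the Lazy-then-Undefined case. Duplicate definitions and COMDAT handling are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified model of the types described above.
// Enumerators are ordered so that a "better" kind compares greater.
enum class Kind { Undefined, Lazy, Defined };

struct Symbol;

struct SymbolBody {
  Kind kind;
  Symbol *backref = nullptr; // set when the body is added to the table
  explicit SymbolBody(Kind k) : kind(k) {}
  SymbolBody *best() const;  // body -> Symbol -> best body
};

// One Symbol per unique name; it points at the current best SymbolBody.
struct Symbol {
  SymbolBody *body;
};

inline SymbolBody *SymbolBody::best() const {
  assert(backref && "body was never added to a symbol table");
  return backref->body;
}

class SymbolTable {
public:
  // Adds newBody under name. Returns true when the winning body is Lazy
  // and an Undefined body is involved, i.e. the caller should load the
  // archive member (assumption: this applies in either insertion order).
  bool add(const std::string &name, SymbolBody *newBody) {
    auto result = table.emplace(name, Symbol{newBody}); // single hash lookup
    Symbol &sym = result.first->second;
    newBody->backref = &sym;
    if (result.second)
      return false; // first body seen for this name
    SymbolBody *old = sym.body;
    if (newBody->kind > old->kind)
      sym.body = newBody; // resolving is just this pointer mutation
    bool sawUndefined =
        old->kind == Kind::Undefined || newBody->kind == Kind::Undefined;
    return sawUndefined && sym.body->kind == Kind::Lazy;
  }

private:
  // unordered_map keeps element addresses stable across rehashing, so the
  // Symbol* stored in each body stays valid.
  std::unordered_map<std::string, Symbol> table;
};
```

After several bodies have been added for the same name, calling `best()` on any of them, including a stale Undefined one, yields the body that currently wins, without another hash lookup.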

There are mainly three actors in this linker.

  • InputFile

    InputFile is a superclass for file readers. We have a different subclass for each input file type, such as regular object file, archive file, etc. They are responsible for creating and owning SymbolBodies and Chunks.

  • Writer

    The writer is responsible for writing file headers and Chunks to a file. It creates OutputSections, puts all Chunks into them, assigns unique, non-overlapping addresses and file offsets to them, and then writes them to a file.

  • Driver

    The linking process is driven by the driver. The driver

    • processes command line options,
    • creates a symbol table,
    • creates an InputFile for each input file and puts all symbols in it into the symbol table,
    • checks that there are no remaining undefined symbols,
    • creates a writer,
    • and passes the symbol table to the writer to write the result to a file.
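
The Writer's layout step, assigning non-overlapping addresses and file offsets to Chunks, might look something like the following sketch. The struct names, the `alignTo` helper, and the default alignment values (4096 for addresses, 512 for file offsets) are illustrative assumptions and are not taken from the linker's source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout records; the real Chunk and OutputSection classes
// carry much more state.
struct ChunkLayout {
  uint64_t size;
  uint64_t align;       // power of two
  uint64_t rva = 0;     // assigned address relative to the image base
  uint64_t fileOff = 0; // assigned offset in the output file
  ChunkLayout(uint64_t s, uint64_t a) : size(s), align(a) {}
};

struct OutputSectionLayout {
  std::vector<ChunkLayout> chunks;
  uint64_t rva = 0, fileOff = 0, virtualSize = 0, rawSize = 0;
};

// Rounds v up to the next multiple of a, where a is a power of two.
inline uint64_t alignTo(uint64_t v, uint64_t a) {
  assert(a != 0 && (a & (a - 1)) == 0);
  return (v + a - 1) & ~(a - 1);
}

// Places sections one after another behind the headers, and chunks one
// after another inside each section, so that nothing overlaps.
inline void assignAddresses(std::vector<OutputSectionLayout> &secs,
                            uint64_t headerSize,
                            uint64_t sectionAlign = 4096,
                            uint64_t fileAlign = 512) {
  uint64_t rva = alignTo(headerSize, sectionAlign);
  uint64_t off = alignTo(headerSize, fileAlign);
  for (OutputSectionLayout &sec : secs) {
    sec.rva = rva;
    sec.fileOff = off;
    uint64_t pos = 0; // offset of the next chunk within this section
    for (ChunkLayout &c : sec.chunks) {
      pos = alignTo(pos, c.align);
      c.rva = sec.rva + pos;
      c.fileOff = sec.fileOff + pos;
      pos += c.size;
    }
    sec.virtualSize = pos;
    sec.rawSize = alignTo(pos, fileAlign);
    rva = alignTo(rva + sec.virtualSize, sectionAlign);
    off += sec.rawSize;
  }
}
```

Because each section starts on an aligned boundary and each chunk is placed at the running offset within its section, a chunk's address and file offset differ only by a per-section constant.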

Performance

Currently the linker is able to self-host on the Windows platform. It takes 1.2 seconds to self-host on my Xeon 2580 machine, while the existing Atom-based linker takes 5 seconds to self-host. We believe the performance difference comes from the simplifications and optimizations we made in the new port. Notable differences are listed below.

  • Reduced number of relocation table reads

    In the existing design, relocation tables are read up front to construct graphs, because relocations are the graph edges. In the new design, they are not read until we actually apply relocations.

    This simplification has two benefits. One is that we don't create additional objects for relocations but instead consume relocation tables directly. The other is that it reduces the number of relocation entries we have to read, because we won't read relocations for dead-stripped COMDAT sections. Large C++ programs tend to consist of lots of COMDAT sections. In the existing design, the time to process relocation tables is linear in the size of the input. In this new model, it's linear in the size of the output.

  • Reduced number of symbol table lookups

    Symbol table lookup can be a heavy operation because the number of symbols can be very large and each symbol name can be very long (think of C++ mangled symbols -- the time to compute a hash value for a string is linear in its length).

    We look up the symbol table exactly once for each symbol in the new design. This is, I believe, the minimum possible number. This is achieved by the separation of Symbol and SymbolBody. Once you get a pointer to a Symbol by looking up the symbol table, you can always get the latest symbol resolution result by just dereferencing a pointer. (I'm not sure if the idea is new to linkers. At least, all other linkers I've investigated so far seem to look up hash tables or sets more than once for each new symbol, but I may be wrong.)

  • Reduced number of file visits

    The symbol table implements the Windows linker semantics. We treat the symbol table as a bucket of all known symbols, including symbols in archive file headers. We put all symbols into one bucket as we visit new files. That means we visit each file only once.

    This is different from the Unix linker semantics, in which we only keep undefined symbols and visit each file one by one until we resolve all undefined symbols. In the Unix model, we have to visit archive files many times if there are circular dependencies between archives.

  • Avoiding creating additional objects or copying data

    The data structures described in the previous section are all thin wrappers for classes that LLVM libObject provides. We avoid copying data from libObject's objects to our objects. We read much less data than before. For example, we don't read symbol values until we apply relocations because these values are not relevant to symbol resolution. Again, COMDAT symbols may be discarded during symbol resolution, so reading their attributes too early could be wasted work. We use underlying objects directly where doing so makes sense.
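
The first point in the list above, not reading relocation tables until relocations are applied, can be illustrated with a small sketch. Everything here is a simplified assumption for illustration: `RawReloc` is a made-up two-field entry (real COFF relocations also carry a type), `SectionChunkSketch` and `writeChunks` are hypothetical names, and the only relocation kind is "add the symbol value to a 32-bit field".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified relocation entry: add the symbol's value to the
// 32-bit field at `offset` within the section.
struct RawReloc {
  uint32_t offset;
  uint32_t symbolIndex;
};

struct SectionChunkSketch {
  std::vector<uint8_t> data;           // section contents
  const std::vector<RawReloc> *relocs; // raw table, untouched until write time
  bool live;                           // false if the chunk was dead-stripped
};

// Copies live chunks to `out` and applies their relocations in place.
// Returns the number of relocation entries that were actually read.
inline std::size_t writeChunks(const std::vector<SectionChunkSketch> &chunks,
                               const std::vector<uint32_t> &symbolValues,
                               std::vector<uint8_t> &out) {
  std::size_t relocsRead = 0;
  for (const SectionChunkSketch &c : chunks) {
    if (!c.live)
      continue; // discarded chunk: its relocation table is never visited
    std::size_t base = out.size();
    out.insert(out.end(), c.data.begin(), c.data.end());
    for (const RawReloc &r : *c.relocs) {
      assert(r.offset + sizeof(uint32_t) <= c.data.size());
      assert(r.symbolIndex < symbolValues.size());
      uint32_t field;
      std::memcpy(&field, &out[base + r.offset], sizeof field);
      field += symbolValues[r.symbolIndex];
      std::memcpy(&out[base + r.offset], &field, sizeof field);
      ++relocsRead;
    }
  }
  return relocsRead;
}
```

In this sketch the work done on relocations depends only on the chunks that reach the output, which is the "linear in the size of the output" property mentioned above.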

Parallelism

The data structures described above were also chosen with multi-threading in mind. It should be relatively easy to make the symbol table a concurrent hash map, so that multiple workers can work on the symbol table concurrently. Symbol resolution in this design is a single pointer mutation, which allows the resolver to work concurrently in a lock-free manner using atomic pointer compare-and-swap.
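
As a rough illustration of lock-free resolution with compare-and-swap, consider the sketch below. It is not the experimental multi-threaded linker's code: `BodyKind`, `Body`, `AtomicSymbol`, and `resolve` are hypothetical names, and the rule "a greater kind wins" is a simplification of the real conflict logic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical kinds, ordered so that a "better" body compares greater.
enum class BodyKind { Undefined, Lazy, Defined };

struct Body {
  BodyKind kind;
};

// A Symbol whose pointer to the best body can be updated atomically.
struct AtomicSymbol {
  std::atomic<Body *> body;
  explicit AtomicSymbol(Body *b) : body(b) {}
};

// Installs `candidate` if it outranks the current best body. Several
// threads may call this concurrently on the same symbol without a lock.
// Returns true if `candidate` became the best body.
inline bool resolve(AtomicSymbol &sym, Body *candidate) {
  assert(candidate != nullptr);
  Body *cur = sym.body.load(std::memory_order_acquire);
  while (candidate->kind > cur->kind) {
    // On failure, compare_exchange_weak stores the latest value in `cur`,
    // and the loop condition re-checks whether we still outrank it.
    if (sym.body.compare_exchange_weak(cur, candidate,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire))
      return true;
  }
  return false;
}
```

The loop only retries when another thread changed the pointer in the meantime, and it gives up as soon as the current body is at least as good as the candidate, so the pointer never moves to a worse body.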

It should also be easy to apply relocations and write chunks concurrently.

We created an experimental multi-threaded linker using the Microsoft ConcRT concurrency library, and it was able to link itself in 0.5 seconds, so we think the design is promising.

Glossary

  • RVA

    Short for Relative Virtual Address.

    Windows executables or DLLs are not position-independent; they are linked against a fixed address called an image base. RVAs are offsets from an image base.

    Default image bases are 0x140000000 for executables and 0x180000000 for DLLs. For example, when we are creating an executable, we assume that the executable will be loaded at address 0x140000000 by the loader, so we apply relocations accordingly. The resulting text and data will contain raw absolute addresses.

  • VA

    Short for Virtual Address. Equivalent to RVA + image base. It is rarely used. We almost always use RVAs instead.

  • Base relocations

    Relocation information for the loader. If the loader decides to map an executable or a DLL to an address other than its image base, it fixes up the binary using information contained in the base relocation table. A base relocation table consists of a list of locations containing addresses. The loader adds the difference between the actual load address and the image base to all locations listed there.

    Note 1: This run-time relocation mechanism is very simple compared to ELF. There's no PLT or GOT. Images are relocated as a whole just by shifting entire images in memory by some offset. Although doing this breaks text sharing, I think this mechanism is not actually bad on today's computers.

    Note 2: We do not support base relocations yet. But if you were wondering how Windows manages to load two images having conflicting addresses into the same memory space, this is how it works.
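
To make the arithmetic in the glossary concrete, here is a small sketch of the VA computation and of what a loader conceptually does with base relocations. The `toVA` and `rebase` functions are hypothetical, and the fixup list is simplified to a flat list of RVAs of 64-bit fields; the real base relocation table uses a more compact block-based encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// VA = image base + RVA.
inline uint64_t toVA(uint64_t imageBase, uint32_t rva) {
  return imageBase + rva;
}

// Conceptual loader fixup. `image` is the mapped image, indexed by RVA.
// `fixupRVAs` lists the locations that hold 64-bit absolute addresses
// computed against `preferredBase`.
inline void rebase(std::vector<uint8_t> &image, uint64_t preferredBase,
                   uint64_t actualBase,
                   const std::vector<uint32_t> &fixupRVAs) {
  // Unsigned arithmetic wraps, so this also works when the image moves
  // to a lower address.
  uint64_t delta = actualBase - preferredBase;
  for (uint32_t rva : fixupRVAs) {
    assert(rva + sizeof(uint64_t) <= image.size());
    uint64_t value;
    std::memcpy(&value, &image[rva], sizeof value);
    value += delta;
    std::memcpy(&image[rva], &value, sizeof value);
  }
}
```

For example, a field that holds the VA of RVA 0x10 in an image linked at 0x140000000 contains 0x140000010. If the image is instead mapped at 0x150000000, `rebase` adds the delta 0x10000000 and the field becomes 0x150000010.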