Self-hosting took 801 ms on my machine. Of which this function took
69 ms. Now it takes 37 ms. That is about 4% overall performance
improvement.
llvm-svn: 248052
Basically the concept of "liveness" is for sections (or chunks in LLD
terminology) and not for symbols. Symbols are always available or live,
or otherwise it indicates a link failure.
Previously, we had isLive() and markLive() methods for DefinedSymbol.
They are confusing methods. What they actually did is to act as a proxy
to backing section chunks. We can simplify eliminate these methods
and call section chunk's methods directly.
llvm-svn: 247869
Only live symbols are written to the symbol table. Because isLive()
returned false if dead-stripping was disabled entirely, only
non-COMDAT sections were written to the symbol table. This patch fixes
the issue.
llvm-svn: 247856
This patch fixes a subtle incompatibility with MSVC linker.
MSVC linker preserves the original spelling of a DLL in the
import descriptor table. LLD previously converted all
characters to lowercase. Usually this difference is benign,
but if a program explicitly checks for DLL file names, the
program could fail.
llvm-svn: 246620
There are some DLLs whose initializers depends on other DLLs'
initializers. The initialization order matters for them.
MSVC linker uses the order of the libraries from the command line.
LLD used ASCII-betical order. So they were incompatible.
This patch makes LLD compatible with MSVC.
llvm-svn: 245201
This is more convenient than the offset from the start of the file as we
don't have to worry about it changing when we move the output section.
This is a port of r245008 from ELF.
llvm-svn: 245018
Sections must start at page boundaries in memory, but they
can be aligned to sector boundaries (512-bytes) on disk.
We aligned them to 4096-byte boundaries even on disk, so we
wasted disk space a bit.
llvm-svn: 244691
This has a few advantages
* Less C++ code (about 300 lines less).
* Less machine code (about 14 KB of text on a linux x86_64 build).
* It is more debugger friendly. Just set a breakpoint on the exit function and
you get the complete lld stack trace of when the error was found.
* It is a more robust API. The errors are handled early and we don't get a
std::error_code hot potato being passed around.
* In most cases the error function in a better position to print diagnostics
(it has more context).
llvm-svn: 244215
Various parameters are passed implicitly using Config global variable
already. Output file path is no different from others, so there was no
special reason to handle that differnetly.
This patch changes the signature of writeResult(SymbolTable *, StringRef)
to writeResult(SymbolTable *).
llvm-svn: 244180
I don't remember why I thought that only functions are subject
of garbage collection, but the comment here said so, which is
not correct. Moreover, the code just below the comment does not
do what the comment says -- it handles non-COMDAT, non-function
sections as GC root. As a result, it just handles non-COMDAT
sections as GC root.
This patch cleans that up by removing SectionChunk::isRoot and
use isCOMDAT instead.
llvm-svn: 243700
We want to convince the NT loader not to map these sections into memory.
A good first step is to move them to the end of the executable.
Differential Revision: http://reviews.llvm.org/D11655
llvm-svn: 243680
Windows ARM is the thumb ARM environment, and pointers to thumb code
needs to have its LSB set. When we apply relocations, we need to
adjust the LSB if it points to an executable section.
llvm-svn: 243560
SECREL should sets the 32-bit offset of the target from the beginning
of *target's* output section. Previously, the offset from the beginning
of source's output section was used instead.
SECTION means the target section's index, and not the source section's
index. This patch fixes that issue too.
llvm-svn: 243535
Previously, we ignore /merge option if /debug is specified
because I thought that was MSVC linker did. This was wrong.
/merge shouldn't be ignored even in debug mode.
llvm-svn: 243375
On x64 and x86, we use only one base relocation type, so we handled
base relocations just as a list of RVAs. That doesn't work well for
ARM becuase we have to handle two types of base relocations on ARM.
This patch changes the type of base relocation from uint32_t to
{reltype, uint32_t} to make it easy to port this code to ARM.
llvm-svn: 243197
An object file compatible with Safe SEH contains a .sxdata section.
The section contains a list of symbol table indices, each of which
is an exception handler function. A safe SEH-enabled executable
contains a list of exception handler RVAs. So, what the linker has
to do to support Safe SEH is basically to read the .sxdata section,
interpret the contents as a list of symbol indices, unique-fy and
sort their RVAs, and then emit that list to .rdata. This patch
implements that feature.
llvm-svn: 243182
__ImageBase is a special symbol whose value is the image base address.
Previously, we handled __ImageBase symbol as an absolute symbol.
Absolute symbols point to specific locations in memory and the locations
never change even if an image is base-relocated. That means that we
don't have base relocation entries for absolute symbols.
This is not a case for __ImageBase. If an image is base-relocated, its
base address changes, and __ImageBase needs to be shifted as well.
So we have to have base relocations for __ImageBase. That means that
__ImageBase is not really an absolute symbol but a different kind of
symbol.
In this patch, I introduced a new type of symbol -- DefinedRelative.
DefinedRelative is similar to DefinedAbsolute, but it has not a VA but RVA
and is a subject of base relocation. Currently only __ImageBase is of
the new symbol type.
llvm-svn: 243176
Load Configuration field points to a structure containing information
for SEH. That data strucutre is not created by the linker but provided
by an external file. What we have to do is just to set __load_config_used
address to the header.
llvm-svn: 242427
If /delayload option is given, we have to resolve __delayLoadHelper2
since the function is the dynamic loader to delay-load DLLs.
The function name is mangled in x86 as ___delayLoadHelper2@8.
llvm-svn: 242078
Previously, we infer machine type at the very end of linking after
all symbols are resolved. That's actually too late because machine
type affects how we mangle symbols (whether or not we need to
add "_").
For example, /entry:foo adds "_foo" to the symbol table if x86 but
"foo" if x64.
This patch moves the code to infer machine type, so that machine
type is inferred based on input files given via the command line
(but not based on .directives files).
llvm-svn: 241843
Providing a symbol table in the executable is quite useful when
debugging a fully-linked executable without having to reconstruct one
from DWARF.
Differential Revision: http://reviews.llvm.org/D11023
llvm-svn: 241689
TLS table header field is supposed to have address and size of TLS table.
The linker doesn't have to understand what TLS table is. TLS table's name
is always "_tls_used", so if there's that symbol, the linker simply sets
that symbol's RVA to the header. The size of the TLS table is always 40 bytes.
llvm-svn: 241426
In the new design, mutation of Symbol pointers is the name resolution
operation. This patch makes them atomic pointers so that they can
be mutated by multiple threads safely. I'm going to use atomic
compare-exchange on these pointers.
dyn_cast<> doesn't recognize atomic pointers as pointers,
so we need to call load(). This is unfortunate, but in other places
automatic type conversion works fine.
llvm-svn: 241416
I think Undefined symbols are a bit more convenient than StringRefs
since SymbolBodies are handles for symbols. You can get resolved
symbols for undefined symbols just by calling getReplacmenet without
looking up the symbol table.
llvm-svn: 241214
Occasionally we have to resolve an undefined symbol to its
mangled symbol. Previously, we did that on calling side of
findMangle by explicitly updating SymbolBody.
In this patch, mangled symbols are handled as weak aliases
for undefined symbols.
llvm-svn: 241213
This flattens the entire liveness walk from a recursive mark approach to
a worklist approach. It also sinks the worklist management completely
out of the SectionChunk and into the Writer by exposing the ability to
iterato over children of a chunk and over the symbol bodies of relocated
symbols. I'm not 100% happy with the API names, so suggestions welcome
there.
This allows us to use a single worklist for the entire recursive walk
and would also be a natural place to take advantage of parallelism at
some future point.
With this, we completely inline away the GC walk into the
Writer::markLive function and it makes it very easy to profile what is
slow. Currently, time is being wasted checking whether a Chunk isa
SectionChunk (it essentially always is), finding (or skipping)
a replacement for a symbol, and chasing pointers between symbols and
their chunks. There are a bunch of things we can do to fix this, and its
easier to do them after this change IMO.
This change alone saves 1-2% of the time for my self-link of lld.exe
(which I'm running and benchmarking on Linux ironically).
Perhaps more notably, we'll no longer blow out the stack for large
links. =]
Just as an FYI, at this point, I/O is starting to really dominate the
profile. Well over 10% of the time appears to be inside the kernel doing
page table silliness. I think a decent chunk of this can be nuked as
well, but it's a little odd as cross-linking in this way isn't really
the primary goal here.
Differential Revision: http://reviews.llvm.org/D10790
llvm-svn: 240995
There were a few issues with the previous delay-import tables.
- "Attribute" field should have been 1 instead of 0.
(I don't know the meaning of this field, though.)
- LEA and CALL operands had wrong addresses.
- Address tables are in .didat (which is read-only).
They should have been in .data.
llvm-svn: 240837
I split them in r240319 because I thought they are different enough
that we should treat them as different types. It turned out that
that was not a good idea. They are so similar that we ended up having
many duplicate code.
llvm-svn: 240706
Only SectionChunk can be dead-stripped. Previously,
all types of chunks implemented these functions,
but their functions were blank.
Likewise, only DefinedRegular and DefinedCOMDAT symbols
can be dead-stripped. markLive() function was implemented
for other symbol types, but they were blank.
I started thinking that the change I made in r240319 was
a mistake. I separated DefinedCOMDAT from DefinedRegular
because I thought that would make the code cleaner, but now
we want to handle them as the same type here. Maybe we
should roll it back.
This change should improve readability a bit as this removes
some dubious uses of reinterpret_cast. Previously, we
assumed that all COMDAT chunks are actually SectionChunks,
which was not very obvious.
llvm-svn: 240675
The change I made in r240620 was not correct. If a symbol foo is
defined, and if you use __imp_foo, __imp_foo symbol is automatically
defined as a pointer (not just an alias) to foo.
Now that we need to create a chunk for automatically-created symbols.
I defined LocalImportChunk class for them.
llvm-svn: 240622
Identical COMDAT Folding (ICF) is an optimization to reduce binary
size by merging COMDAT sections that contain the same metadata,
actual data and relocations. MSVC link.exe and many other linkers
have this feature. LLD achieves on per with MSVC in terms produced
binary size with this patch.
This technique is pretty effective. For example, LLD's size is
reduced from 64MB to 54MB by enaling this optimization.
The algorithm implemented in this patch is extremely inefficient.
It puts all COMDAT sections into a set to identify duplicates.
Time to self-link with/without ICF are 3.3 and 320 seconds,
respectively. So this option roughly makes LLD 100x slower.
But it's okay as I wanted to achieve correctness first.
LLD is still able to link itself with this optimization.
I'm going to make it more efficient in followup patches.
Note that this optimization is *not* entirely safe. C/C++ require
different functions have different addresses. If your program
relies on that property, your program wouldn't work with ICF.
However, it's not going to be an issue on Windows because MSVC
link.exe turns ICF on by default. As long as your program works
with default settings (or not passing /opt:noicf), your program
would work with LLD too.
llvm-svn: 240519
Previously, we added files in directive sections to the symbol
table as we read the sections, so the link order was depth-first.
That's not compatible with MSVC link.exe nor the old LLD.
This patch is to queue files so that new files are added to the
end of the queue and processed last. Now addFile() doesn't parse
files nor resolve symbols. You need to call run() to process
queued files.
llvm-svn: 240483
DLLs are usually resolved at process startup, but you can
delay-load them by passing /delayload option to the linker.
If a /delayload is specified, the linker has to create data
which is similar to regular import table.
One notable difference is that the pointers in a delay-load
import table are originally pointing to thunks that resolves
themselves. Each thunk loads a DLL, resolve its name, and then
overwrites the pointer with the result so that subsequent
function calls directly call a desired function. The linker
has to emit thunks.
llvm-svn: 240250
.pdata section contains a list of triplets of function start address,
function end address and its unwind information. Linkers have to
sort section contents by function start address and set the section
address to the file header (so that runtime is able to find it and
do binary search.)
This change seems to resolve all but one remaining test failures in
check{,-clang,-lld} when building the entire stuff with clang-cl and
lld-link.
llvm-svn: 240231
DLL files are in the same format as executables but they have export tables.
The format of the export table is described in PE/COFF spec section 5.3.
A new class, EdataContents, takes care of creating chunks for export tables.
What we need to do is to parse command line flags for dllexports, and then
instantiate the class to create chunks. For the writer, export table chunks
are opaque data -- it just add chunks to .edata section.
llvm-svn: 239869
PE/COFF executables/DLLs usually contain data which is called
base relocations. Base relocations are a list of addresses that
need to be fixed by the loader if load-time relocation is needed.
Base relocations are in .reloc section.
We emit one base relocation entry for each IMAGE_REL_AMD64_ADDR64
relocation.
In order to save disk space, base relocations are grouped by page.
Each group is called a block. A block starts with a 32-bit page
address followed by 16-bit offsets in the page. That is more
efficient representation of addresses than just an array of 32-bit
addresses.
llvm-svn: 239710
When we add a chunk to an OutputSection, we always want to create
a backreference from an OutputSection to a Chunk. To make sure
we always do, do that in addChunk(). NFC.
llvm-svn: 239706
Resource files are data files containing i18n messages, icon images, etc.
MSVC has a tool to convert a resource file to a regular COFF file so that
you can just link that file to embed resources to an executable.
However, you can directly pass resource files to the linker. If you do that,
the linker invokes the tool automatically. This patch implements that feature.
llvm-svn: 239704
MSVC profiler reported that this stable_sort takes 7% time
when self-linking. As a result, createSection was taking 10% time.
Now createSection takes 3%. This small change actually makes
the linker a bit but perceptibly faster.
llvm-svn: 239292
Chunk has writeTo function which takes uint8_t *Buf.
writeHeaderTo feels more consistent with that because this member
function also takes uint8_t *Buf.
llvm-svn: 239236
Previously, half of the constructor for .idata contents was in Chunks.cpp
and the rest was in Writer.cpp. This patch moves the latter to Chunks.cpp.
Now IdataContents class manages everything for .idata section.
llvm-svn: 239230
In this design, Chunk is the only thing that knows how to write
its contents to output file as well as how to apply relocations
there. The writer shouldn't know about the details.
llvm-svn: 239216
Not only entry point symbol but also symbols specified by /include
option must be preserved, as they will never be dead-stripped.
http://reviews.llvm.org/D10220
llvm-svn: 239005
I'm adding ordinal-only (nameless) imports to the import table.
The chunk for that type is going to be different from LookupChunk.
Without this change, we cannot add objects of the new type to the
vectors.
llvm-svn: 238779
Instead of returning non-categorized errors, return categorized errors.
All uses of make_dynamic_error_code are removed.
Because we don't have error reporting mechanism, I just chose to print out
error messages to stderr, and then return an error object. Not sure if
that's the right thing to do, but at least it seems practical.
http://reviews.llvm.org/D10129
llvm-svn: 238714
Section names were truncated to 8 bytes because the section table's
name field is 8 byte long. This patch creates the string table to
store long names.
llvm-svn: 238661
The new mechanism is less code, and fixes the case where all inputs
are archives.
Differential Revision: http://reviews.llvm.org/D10136
llvm-svn: 238618
Previously Writer directly handles writes to a file.
Chunks needed to give Writer a continuous chunk of memory.
That was inefficent if you construct data in chunks because
it would require two memory copies (one to construct a chunk
and the other is to write that to a file).
This patch teaches chunk to write directly to a file.
From readability point of view, this is also good because
you no longer have to call hasData() before calling getData().
llvm-svn: 238464
This is an initial patch for a section-based COFF linker.
The patch has 2300 lines of code including comments and blank lines.
Before diving into details, you want to start from reading README
because it should give you an overview of the design.
All important things are written in the README file, so I write
summary here.
- The linker is already able to self-link on Windows.
- It's significantly faster than the existing implementation.
The existing one takes 5 seconds to link LLD on my machine,
while the new one only takes 1.2 seconds, even though the new
one is not multi-threaded yet. (And a proof-of-concept multi-
threaded version was able to link it in 0.5 seconds.)
- It uses much less memory (250MB vs. 2GB virtual memory space
to self-host).
- IMHO the new code is much simpler and easier to read than
the existing PE/COFF port.
http://reviews.llvm.org/D10036
llvm-svn: 238458