In many places we assumed that is64() means AMD64 and i386 otherwise.
This assumption is not sound because Windows also supports ARM.
The linker doesn't support ARM yet, but this is a first step.
llvm-svn: 243188
An object file compatible with Safe SEH contains a .sxdata section.
The section contains a list of symbol table indices, each of which
is an exception handler function. A safe SEH-enabled executable
contains a list of exception handler RVAs. So, what the linker has
to do to support Safe SEH is basically to read the .sxdata section,
interpret the contents as a list of symbol indices, unique-fy and
sort their RVAs, and then emit that list to .rdata. This patch
implements that feature.
llvm-svn: 243182
__ImageBase is a special symbol whose value is the image base address.
Previously, we handled __ImageBase symbol as an absolute symbol.
Absolute symbols point to specific locations in memory and the locations
never change even if an image is base-relocated. That means that we
don't have base relocation entries for absolute symbols.
This is not a case for __ImageBase. If an image is base-relocated, its
base address changes, and __ImageBase needs to be shifted as well.
So we have to have base relocations for __ImageBase. That means that
__ImageBase is not really an absolute symbol but a different kind of
symbol.
In this patch, I introduced a new type of symbol -- DefinedRelative.
DefinedRelative is similar to DefinedAbsolute, but it has not a VA but RVA
and is a subject of base relocation. Currently only __ImageBase is of
the new symbol type.
llvm-svn: 243176
If a symbol is exported as /export:foo, and foo is resolved as a
mangled name (_foo@<number> or ?foo@@Y...), that mangled name should
be written to the export table. Previously, we wrote the original
name to the export table.
llvm-svn: 242342
Entry name selection rule is already complicated on x64, but it's more
complicated on x86 because of the underscore name mangling scheme.
If one of _main, _main@<number> (a C function) or ?main@@... (a C++ function)
is defined, entry name is _mainCRTStartup. If _wmain, _wmain@<number or
?wmain@@... is defined, entry name is _wmainCRTStartup. And so on.
llvm-svn: 242110
If /delayload option is given, we have to resolve __delayLoadHelper2
since the function is the dynamic loader to delay-load DLLs.
The function name is mangled in x86 as ___delayLoadHelper2@8.
llvm-svn: 242078
Symbol names are usually mangled by appending "_" prefix on x86.
But the mangled name is not used in DLL export table. The export
table contains unmangled names.
llvm-svn: 241872
Previously, we infer machine type at the very end of linking after
all symbols are resolved. That's actually too late because machine
type affects how we mangle symbols (whether or not we need to
add "_").
For example, /entry:foo adds "_foo" to the symbol table if x86 but
"foo" if x64.
This patch moves the code to infer machine type, so that machine
type is inferred based on input files given via the command line
(but not based on .directives files).
llvm-svn: 241843
In the new design, mutation of Symbol pointers is the name resolution
operation. This patch makes them atomic pointers so that they can
be mutated by multiple threads safely. I'm going to use atomic
compare-exchange on these pointers.
dyn_cast<> doesn't recognize atomic pointers as pointers,
so we need to call load(). This is unfortunate, but in other places
automatic type conversion works fine.
llvm-svn: 241416
We were previously hitting assertion failures in the writer in cases where
a regular object file defined a weak external symbol that was defined by
a bitcode file. Because /export and /entry name mangling were implemented
using weak externals, the same problem affected mangled symbol names in
bitcode files.
The underlying cause of the problem was that weak external symbols were
being resolved before doing LTO, so the symbol table may have contained stale
references to bitcode symbols. The fix here is to defer weak external symbol
resolution until after LTO.
Also implement support for weak external symbols in bitcode files
by modelling them as replaceable DefinedBitcode symbols.
Differential Revision: http://reviews.llvm.org/D10940
llvm-svn: 241391
This change cut the link time of chrome.dll from 24 seconds
to 22 seconds (5% gain). When the control reaches end of link(),
all output files have already been written. All in-memory
objects can just vanish. There is no use to call their dtors.
llvm-svn: 241320
Previously, __ImageBase symbol got a different value than the one
specified by /base:<number> because the symbol was created in the
SymbolTable's constructor. When the constructor is called,
no command line options are processed yet, so the symbol was
created always with the initial value. This caused wrong relocations
and thus caused mysterious crashes of some executables linked by LLD.
llvm-svn: 241313
On Windows, we have four different main functions, {w,}{main,WinMain}.
The linker has to choose a corresponding entry point function among
{w,}{main,WinMain}CRTStartup. These entry point functions are defined
in the standard library. The linker resolves one of them by looking at
which main function is defined and adding a corresponding undefined
symbol to the symbol table.
Object files containing entry point functions conflicts each other.
For example, we cannot resolve both mainCRTStartup and WinMainCRTStartup
because other symbols defined in the files conflict.
Previously, we inferred CRT function name at the very end of name
resolution. I found that that is sometimes too late. If the linker
already linked one of these four archive member objects, it's too late
to change the decision.
The right thing to do here is to infer entry point name after adding
all symbols from command line files and before adding any other files
(which are specified by directive sections). This patch does that.
llvm-svn: 241236
Previously, we use SymbolTable::rename to resolve AlternateName symbols.
This patch is to merge that mechanism with weak aliases, so that we
remove that function.
llvm-svn: 241230
I think Undefined symbols are a bit more convenient than StringRefs
since SymbolBodies are handles for symbols. You can get resolved
symbols for undefined symbols just by calling getReplacmenet without
looking up the symbol table.
llvm-svn: 241214
Occasionally we have to resolve an undefined symbol to its
mangled symbol. Previously, we did that on calling side of
findMangle by explicitly updating SymbolBody.
In this patch, mangled symbols are handled as weak aliases
for undefined symbols.
llvm-svn: 241213
Previously, the order of adding symbols to the symbol table was simple.
We have a list of all input files. We read each file from beginning of
the list and add all symbols in it to the symbol table.
This patch changes that order. Now all archive files are added to the
symbol table first, and then all the other object files are added.
This shouldn't change the behavior in single-threading, and make room
to parallelize in multi-threading.
In the first step, only lazy symbols are added to the symbol table
because archives contain only Lazy symbols. Member object files
found to be necessary are queued. In the second step, defined and
undefined symbols are added from object files. Adding an undefined
symbol to the symbol table may cause more member files to be added
to the queue. We simply continue reading all object files until the
queue is empty.
Finally, new archive or object files may be added to the queues by
object files' directive sections (which contain new command line
options).
The above process is repeated until we get no new files.
Symbols defined both in object files and in archives can make results
undeterministic. If an archive is read before an object, a new member
file gets linked, while in the other way, no new file would be added.
That is the most popular cause of an undeterministic result or linking
failure as I observed. Separating phases of adding lazy symbols and
undefined symbols makes that deterministic. Adding symbols in each
phase should be parallelizable.
llvm-svn: 241107
Compilers recognize "main" function and don't mangle its name.
But if you use a different function as a user-defined entry name,
and if you didn't define that function with extern C, your entry
point function name is mangled. And the linker has to be able to
find that. This is relatively rare but can happen.
llvm-svn: 240953
Most build system depends on existence or time stamp of a file.
This patch is to create an empty file for /pdb:<filename> option
just to satisfy some build rules.
llvm-svn: 240948
The previous logic to find default entry name or subsystem does not
seem correct (i.e. was not compatible with MSVC linker). Previously,
default entry name was inferred from CRT functions and user-defined
entry functions. Subsystem was inferred from CRT functions.
Default entry name and subsystem are now inferred based on the
following table. Note that we no longer use CRT functions to infer
them.
Entry name Subsystem
main mainCRTStartup console
wmain wmainCRTStartup console
WinMain WinMainCRTStartup windows
wWinMain wWinMainCRTStartup windows
llvm-svn: 240922
Usually dllexported symbols are defined with 'extern "C"',
so identifying them is easy. We can just do hash table lookup
to look up exported symbols.
However, C++ non-member functions are also allowed to be exported,
and they can be specified with unmangled name. So, if /export:foo
is given, we need to look up not only "foo" but also its all
mangled names. In MSVC mangling scheme, that means that we need to
look up any symbol which starts with "?foo@@Y".
In this patch, we scan the entire symbol table to search for
a mangled symbol. The symbol table is a DenseMap, and that doesn't
support table lookup by string prefix. This is of course very
inefficient. But that should be probably OK because the user
should always add 'extern "C"' to dllexported symbols.
llvm-svn: 240919
This option is to ignore remaining undefined symbols and force
the linker to create an output file anyways.
The existing code assumes that there's no undefined symbol after
reportRemainingUndefines(). That assumption is legitimate.
I also don't want to mess up the existing code for this minor feature.
In order to keep it as is, remaining undefined symbols are replaced
with dummy defined symbols.
llvm-svn: 240913
This flag can be used to produce a map file, which is essentially a list
of objects linked into the final output file together with the RVAs of
their symbols. Because our format differs from MSVC's we expose it as a
separate flag.
Differential Revision: http://reviews.llvm.org/D10773
llvm-svn: 240812
We were resolving entry symbols and /include'd symbols after all other
symbols are resolved. But looks like it's too late. I found that it
causes some program to fail to link.
Let's say we have an object file A which defines symbols X and Y in an
archive. We also have another file B after A which defines X, Y and
_DLLMainCRTStartup in another archive. They conflict each other, so
either A or B can be linked.
If we have _DLLMainCRTStartup as an undefined symbol, file B is always
chosen. If not, there's a chance that A is chosen. If the linker
find it needs _DllMainCRTStartup after that, it's too late.
This patch adds undefined symbols to the symbol table as soon as
possible to fix the issue.
llvm-svn: 240757
ICF implemented in LLD is so experimental that we don't want to
enable that even if /opt:icf option is passed. I'll rename it back
once the feature is complete.
llvm-svn: 240721
Identical COMDAT Folding (ICF) is an optimization to reduce binary
size by merging COMDAT sections that contain the same metadata,
actual data and relocations. MSVC link.exe and many other linkers
have this feature. LLD achieves on per with MSVC in terms produced
binary size with this patch.
This technique is pretty effective. For example, LLD's size is
reduced from 64MB to 54MB by enaling this optimization.
The algorithm implemented in this patch is extremely inefficient.
It puts all COMDAT sections into a set to identify duplicates.
Time to self-link with/without ICF are 3.3 and 320 seconds,
respectively. So this option roughly makes LLD 100x slower.
But it's okay as I wanted to achieve correctness first.
LLD is still able to link itself with this optimization.
I'm going to make it more efficient in followup patches.
Note that this optimization is *not* entirely safe. C/C++ require
different functions have different addresses. If your program
relies on that property, your program wouldn't work with ICF.
However, it's not going to be an issue on Windows because MSVC
link.exe turns ICF on by default. As long as your program works
with default settings (or not passing /opt:noicf), your program
would work with LLD too.
llvm-svn: 240519
Previously, we added files in directive sections to the symbol
table as we read the sections, so the link order was depth-first.
That's not compatible with MSVC link.exe nor the old LLD.
This patch is to queue files so that new files are added to the
end of the queue and processed last. Now addFile() doesn't parse
files nor resolve symbols. You need to call run() to process
queued files.
llvm-svn: 240483
DLLs are usually resolved at process startup, but you can
delay-load them by passing /delayload option to the linker.
If a /delayload is specified, the linker has to create data
which is similar to regular import table.
One notable difference is that the pointers in a delay-load
import table are originally pointing to thunks that resolves
themselves. Each thunk loads a DLL, resolve its name, and then
overwrites the pointer with the result so that subsequent
function calls directly call a desired function. The linker
has to emit thunks.
llvm-svn: 240250
In this linker model, adding an undefined symbol may trigger chain
reactions. It may trigger a Lazy symbol to read a new file.
A new file may contain a directive section, which may contain various
command line options.
Previously, we didn't handle chain reactions well. We visited /include'd
symbols only once, so newly-added /include symbols were ignored.
This patch fixes that bug.
Now, the symbol table is versioned; every time the symbol table is
updated, the version number is incremented. We repeat adding undefined
symbols until the version number does not change. It is guaranteed to
converge -- the number of undefined symbol in the system is finite,
and adding the same undefined symbol more than once is basically no-op.
llvm-svn: 240177
Alternatename option is in the form of /alternatename:<from>=<to>.
It's effect is to resolve <from> as <to> if <from> is still undefined
at end of name resolution.
If <from> is not undefined but completely a new symbol, alternatename
shouldn't do anything. Previously, it introduced a new undefined
symbol for <from>, which resulted in undefined symbol error.
llvm-svn: 240161
We don't want to insert a new symbol to the symbol table while reading
a .drectve section because it's going to be too complicated.
That we are reading a directive section means that we are currently
reading some object file. Adding a new undefined symbol to the symbol
table can trigger a library file to read a new file, so it would make
the call stack too deep.
In this patch, I add new symbol names to a list to resolve them later.
llvm-svn: 240076
Alternatename option is in the form of /alternatename:<from>=<to>.
It is an error if there are two options having the same <from> but
different <to>. It is *not* an error if both are the same.
llvm-svn: 240075
We skip unknown options in the command line with a warning message
being printed out, but we shouldn't do that for .drectve section.
The section is not visible to the user. We should handle unknown
options as an error.
llvm-svn: 240067
The linker has to create an XML file for each executable.
This patch supports that feature.
You can optionally embed an XML file to an executable as .rsrc
section. If you choose to do that (by passing /manifest:embed
option), the linker has to create a textual resource file
containing an XML file, compile that using rc.exe to a binary
resource file, conver that resource file to a COFF file using
cvtres.exe, and then link that COFF file. This patch implements
that feature too.
llvm-svn: 239978
On Windows, we have to create a .lib file for each .dll.
When linking against DLLs, the linker doesn't use the DLL files,
but instead read a list of dllexported symbols from corresponding
lib files.
A library file containing descriptors of a DLL is called an
import library file.
lib.exe has a feature to create an import library file from a
module-definition file. In this patch, we create a module-definition
file and pass that to lib.exe.
We eventually want to create an import library file by ourselves
to eliminate dependency to lib.exe. For now, we just use the MSVC
tool.
llvm-svn: 239937
Module-definition files (.def files) are yet another way to
specify parameters to the linker. You can write a list of dllexported
symbols in module-definition files instead of using /export command
line option. It also supports a few more directives.
The parser code is taken from lib/Driver/WinLinkModuleDef.cpp
with the following modifications.
- variable names are updated to comply with the LLVM coding style.
- Instead of returning parsing results as "directive" objects,
it updates Config object directly.
llvm-svn: 239929
DLL files are in the same format as executables but they have export tables.
The format of the export table is described in PE/COFF spec section 5.3.
A new class, EdataContents, takes care of creating chunks for export tables.
What we need to do is to parse command line flags for dllexports, and then
instantiate the class to create chunks. For the writer, export table chunks
are opaque data -- it just add chunks to .edata section.
llvm-svn: 239869
PE/COFF executables/DLLs usually contain data which is called
base relocations. Base relocations are a list of addresses that
need to be fixed by the loader if load-time relocation is needed.
Base relocations are in .reloc section.
We emit one base relocation entry for each IMAGE_REL_AMD64_ADDR64
relocation.
In order to save disk space, base relocations are grouped by page.
Each group is called a block. A block starts with a 32-bit page
address followed by 16-bit offsets in the page. That is more
efficient representation of addresses than just an array of 32-bit
addresses.
llvm-svn: 239710
Resource files are data files containing i18n messages, icon images, etc.
MSVC has a tool to convert a resource file to a regular COFF file so that
you can just link that file to embed resources to an executable.
However, you can directly pass resource files to the linker. If you do that,
the linker invokes the tool automatically. This patch implements that feature.
llvm-svn: 239704
Not only entry point symbol but also symbols specified by /include
option must be preserved, as they will never be dead-stripped.
http://reviews.llvm.org/D10220
llvm-svn: 239005
In r238690, I made all files have only MemoryBufferRefs. This change
is to do the same thing for the bitcode file reader. Also updated
a few variable names to match with other code.
llvm-svn: 238782
Instead of returning non-categorized errors, return categorized errors.
All uses of make_dynamic_error_code are removed.
Because we don't have error reporting mechanism, I just chose to print out
error messages to stderr, and then return an error object. Not sure if
that's the right thing to do, but at least it seems practical.
http://reviews.llvm.org/D10129
llvm-svn: 238714
Previously, this feature was implemented using a special type of
undefined symbol, in addition to an intricate way to make the resolver
read a virtual file containing that renaming symbols.
Now the feature is directly handled by the symbol table.
The symbol table has a function, rename(), to rename symbols, whose
definition is 4 lines long. Symbol renaming is naturally modeled using
Symbol and SymbolBody.
llvm-svn: 238696
Previously, a MemoryBuffer of a file was owned by each InputFile object.
This patch makes the Driver own all of them. InputFiles now have only
MemoryBufferRefs. This change simplifies ownership managment
(particularly for ObjectFile -- the object owned a MemoryBuffer only when
it's not created from an archive file, because in that case a parent
archive file owned the entire buffer. Now it owns nothing unconditionally.)
llvm-svn: 238690
It does not involve notions of virtual archives or virtual files,
nor store a list of undefined symbols somewhere else to consume them later.
We did that before. In this patch, undefined symbols are just added to
the symbol table, which now can be done in very few lines of code.
llvm-svn: 238681
Previously the main linker routine is just a non-member function.
We store some context information to the Config object.
This patch makes it belong to Driver.
llvm-svn: 238677
`main` is not the only main function in Windows. You can choose one
from these four -- {w,}{WinMain,main}. There are four different entry
point functions for them, {w,}{WinMain,main}CRTStartup, respectively.
The linker needs to choose the right one depending on which `main`
function is defined.
llvm-svn: 238667
The previous implementation's driver file is cluttered by lots of
small functions, and it was hard to find important functions.
Make a separate file to prevent that issue.
llvm-svn: 238482
This is an initial patch for a section-based COFF linker.
The patch has 2300 lines of code including comments and blank lines.
Before diving into details, you want to start from reading README
because it should give you an overview of the design.
All important things are written in the README file, so I write
summary here.
- The linker is already able to self-link on Windows.
- It's significantly faster than the existing implementation.
The existing one takes 5 seconds to link LLD on my machine,
while the new one only takes 1.2 seconds, even though the new
one is not multi-threaded yet. (And a proof-of-concept multi-
threaded version was able to link it in 0.5 seconds.)
- It uses much less memory (250MB vs. 2GB virtual memory space
to self-host).
- IMHO the new code is much simpler and easier to read than
the existing PE/COFF port.
http://reviews.llvm.org/D10036
llvm-svn: 238458