When comparing two COMDAT sections, we need to take section values
and associative sections into account. This patch fixes that bug.
It fixes a crash bug of llvm-tblgen when linked with /opt:lldicf.
One thing I don't understand yet is that this logic seems to be
too strict. MSVC linker is able to create more compact executables
(which of course work correctly). With this ICF algorithm, LLD is
able to make executable smaller, but the outputs are larger than
MSVC's. There must be something I'm missing here.
llvm-svn: 240897
This function is actually *very* hot. It is hard to see currently
because the call graph is very recursive, but I'm working to remove that
and when I do this function becomes significantly higher on the profile
(up to 5%!) and so worth avoiding the call overhead.
No specific perf gain I can measure yet (below the noise), but likely to
have more impact as we stop cluttering the call graph.
Differential Revision: http://reviews.llvm.org/D10788
llvm-svn: 240873
StringRefs. This uses the LLVM hashing rather than the standard library
and a closed addressed hash table rather than chaining.
This improves the Windows self-link of LLD by 4.4% (averaged over 10
runs, with well under 1% of variance on each).
There is still some room to improve here. Two things I clearly see in
the profile:
1) This is one of the biggest stress tests for the LLVM hashing code. It
actually consumes something like 3-4% of the link time after the
change.
2) The way that StringRef keys are handled in the DenseMap interface is
pretty suboptimal. We pay the price of checking for empty and
tombstone keys when we could only possibly be looking for a normal
key. But fixing this requires invasive API changes.
So there is still some headroom here.
Differential Revision: http://reviews.llvm.org/D10684
llvm-svn: 240871
There were a few issues with the previous delay-import tables.
- "Attribute" field should have been 1 instead of 0.
(I don't know the meaning of this field, though.)
- LEA and CALL operands had wrong addresses.
- Address tables are in .didat (which is read-only).
They should have been in .data.
llvm-svn: 240837
This flag can be used to produce a map file, which is essentially a list
of objects linked into the final output file together with the RVAs of
their symbols. Because our format differs from MSVC's we expose it as a
separate flag.
Differential Revision: http://reviews.llvm.org/D10773
llvm-svn: 240812
We were resolving entry symbols and /include'd symbols after all other
symbols are resolved. But looks like it's too late. I found that it
causes some program to fail to link.
Let's say we have an object file A which defines symbols X and Y in an
archive. We also have another file B after A which defines X, Y and
_DLLMainCRTStartup in another archive. They conflict each other, so
either A or B can be linked.
If we have _DLLMainCRTStartup as an undefined symbol, file B is always
chosen. If not, there's a chance that A is chosen. If the linker
find it needs _DllMainCRTStartup after that, it's too late.
This patch adds undefined symbols to the symbol table as soon as
possible to fix the issue.
llvm-svn: 240757
Absolute symbols were always handled as external symbols, so if two
or more object files define the same absolute symbol, they would
conflict even if the symbol is private to each file.
This patch fixes that bug.
llvm-svn: 240756
ICF implemented in LLD is so experimental that we don't want to
enable that even if /opt:icf option is passed. I'll rename it back
once the feature is complete.
llvm-svn: 240721
I split them in r240319 because I thought they are different enough
that we should treat them as different types. It turned out that
that was not a good idea. They are so similar that we ended up having
many duplicate code.
llvm-svn: 240706
Only SectionChunk can be dead-stripped. Previously,
all types of chunks implemented these functions,
but their functions were blank.
Likewise, only DefinedRegular and DefinedCOMDAT symbols
can be dead-stripped. markLive() function was implemented
for other symbol types, but they were blank.
I started thinking that the change I made in r240319 was
a mistake. I separated DefinedCOMDAT from DefinedRegular
because I thought that would make the code cleaner, but now
we want to handle them as the same type here. Maybe we
should roll it back.
This change should improve readability a bit as this removes
some dubious uses of reinterpret_cast. Previously, we
assumed that all COMDAT chunks are actually SectionChunks,
which was not very obvious.
llvm-svn: 240675
The change I made in r240620 was not correct. If a symbol foo is
defined, and if you use __imp_foo, __imp_foo symbol is automatically
defined as a pointer (not just an alias) to foo.
Now that we need to create a chunk for automatically-created symbols.
I defined LocalImportChunk class for them.
llvm-svn: 240622
MSVC linker is able to link an object file created from the following code.
Note that __imp_hello is not defined anywhere.
void hello() { printf("Hello\n"); }
extern void (*__imp_hello)();
int main() { __imp_hello(); }
Function symbols exported from DLLs are automatically mangled by appending
__imp_ prefix, so they have two names (original one and with the prefix).
This "feature" seems to simulate that behavior even for non-DLL symbols.
This is in my opnion very odd feature. Even MSVC linker warns if you use this.
I'm adding that anyway for the sake of compatibiltiy.
llvm-svn: 240620
Getting an iterator to the relocation table is very hot operation
in the linker. We do that not only to apply relocations but also
to mark live sections and to do ICF.
libObject's interface is slow. By caching pointers to the first
relocation table entries makes the linker 6% faster to self-link.
We probably need to fix libObject as well.
llvm-svn: 240603
Some compilers may not add the section symbol in '.symtab' for the
.init_array and 'ldd' just ignore it. It results in global constructor
not being called in final executable.
This patch add both '.init_array' and '.fini_array' to be added in
Atom graph generation even when the section contains no symbol. An
already existing testcase is modified to check for such scenario.
The issue fixes the llvm test-suite regressions for both Single
and MultiSource files.
llvm-svn: 240570
Identical COMDAT Folding (ICF) is an optimization to reduce binary
size by merging COMDAT sections that contain the same metadata,
actual data and relocations. MSVC link.exe and many other linkers
have this feature. LLD achieves on per with MSVC in terms produced
binary size with this patch.
This technique is pretty effective. For example, LLD's size is
reduced from 64MB to 54MB by enaling this optimization.
The algorithm implemented in this patch is extremely inefficient.
It puts all COMDAT sections into a set to identify duplicates.
Time to self-link with/without ICF are 3.3 and 320 seconds,
respectively. So this option roughly makes LLD 100x slower.
But it's okay as I wanted to achieve correctness first.
LLD is still able to link itself with this optimization.
I'm going to make it more efficient in followup patches.
Note that this optimization is *not* entirely safe. C/C++ require
different functions have different addresses. If your program
relies on that property, your program wouldn't work with ICF.
However, it's not going to be an issue on Windows because MSVC
link.exe turns ICF on by default. As long as your program works
with default settings (or not passing /opt:noicf), your program
would work with LLD too.
llvm-svn: 240519
Chunks are basically unnamed chunks of bytes, and we don't like
to give them names. However, for logging or debugging, we want to
know symbols names of functions for COMDAT chunks. (For example,
we want to print out "we have removed unreferenced COMDAT section
which contains a function FOOBAR.")
This patch is to do that.
llvm-svn: 240484
Previously, we added files in directive sections to the symbol
table as we read the sections, so the link order was depth-first.
That's not compatible with MSVC link.exe nor the old LLD.
This patch is to queue files so that new files are added to the
end of the queue and processed last. Now addFile() doesn't parse
files nor resolve symbols. You need to call run() to process
queued files.
llvm-svn: 240483
The ObjectFileYAML.roundTrip serializes a default-constructed
NormalizedFile to YAML, triggering uninitialized memory reads.
While there use in-class member initializers.
llvm-svn: 240446
Before this change, you got to cast a symbol to DefinedRegular and then
call isCOMDAT() to determine if a given symbol is a COMDAT symbol.
Now you can just use isa<DefinedCOMDAT>().
As to the class definition of DefinedCOMDAT, I could remove duplicate
code from DefinedRegular and DefinedCOMDAT by introducing another base
class for them, but I chose to not do that to keep the class hierarchy
shallow. This amount of code duplication doesn't worth to define a new
class.
llvm-svn: 240319
DLLs are usually resolved at process startup, but you can
delay-load them by passing /delayload option to the linker.
If a /delayload is specified, the linker has to create data
which is similar to regular import table.
One notable difference is that the pointers in a delay-load
import table are originally pointing to thunks that resolves
themselves. Each thunk loads a DLL, resolve its name, and then
overwrites the pointer with the result so that subsequent
function calls directly call a desired function. The linker
has to emit thunks.
llvm-svn: 240250
.pdata section contains a list of triplets of function start address,
function end address and its unwind information. Linkers have to
sort section contents by function start address and set the section
address to the file header (so that runtime is able to find it and
do binary search.)
This change seems to resolve all but one remaining test failures in
check{,-clang,-lld} when building the entire stuff with clang-cl and
lld-link.
llvm-svn: 240231
This is a case that one mistake caused a very mysterious bug.
I made a mistake to calculate addresses of common symbols, so
each common symbol pointed not to the beginning of its location
but to the end of its location. (Ouch!)
Common symbols are aligned on 16 byte boundaries. If a common
symbol is small enough to fit between the end of its real
location and whatever comes next, this bug didn't cause any harm.
However, if a common symbol is larger than that, its memory
naturally overlapped with other symbols. That means some
uninitialized variables accidentally shared memory. Because
totally unrelated memory writes mutated other varaibles, it was
hard to debug.
It's surprising that LLD was able to link itself and all LLD
tests except gunit tests passed with this nasty bug.
With this fix, the new COFF linker is able to pass all tests
for LLVM, Clang and LLD if I use MSVC cl.exe as a compiler.
Only three tests are failing when used with clang-cl.
llvm-svn: 240216
This avoids undefined behaviour caused by an out-of-range access if the
vector is empty, which can happen if an object file's directive section
contains only whitespace.
llvm-svn: 240183
getName() does strlen() on the symbol table, so it's not very fast.
It's not as bad as r239332 because the number of symbols exported
from archive files are fewer than object files, and they are usually
shorter, though.
llvm-svn: 240178
In this linker model, adding an undefined symbol may trigger chain
reactions. It may trigger a Lazy symbol to read a new file.
A new file may contain a directive section, which may contain various
command line options.
Previously, we didn't handle chain reactions well. We visited /include'd
symbols only once, so newly-added /include symbols were ignored.
This patch fixes that bug.
Now, the symbol table is versioned; every time the symbol table is
updated, the version number is incremented. We repeat adding undefined
symbols until the version number does not change. It is guaranteed to
converge -- the number of undefined symbol in the system is finite,
and adding the same undefined symbol more than once is basically no-op.
llvm-svn: 240177
None of the implementations replace the SimpleFile with some other file,
they just modify the SimpleFile in-place, so a direct reference to the
file is sufficient.
llvm-svn: 240167
Alternatename option is in the form of /alternatename:<from>=<to>.
It's effect is to resolve <from> as <to> if <from> is still undefined
at end of name resolution.
If <from> is not undefined but completely a new symbol, alternatename
shouldn't do anything. Previously, it introduced a new undefined
symbol for <from>, which resulted in undefined symbol error.
llvm-svn: 240161
We don't want to insert a new symbol to the symbol table while reading
a .drectve section because it's going to be too complicated.
That we are reading a directive section means that we are currently
reading some object file. Adding a new undefined symbol to the symbol
table can trigger a library file to read a new file, so it would make
the call stack too deep.
In this patch, I add new symbol names to a list to resolve them later.
llvm-svn: 240076
Alternatename option is in the form of /alternatename:<from>=<to>.
It is an error if there are two options having the same <from> but
different <to>. It is *not* an error if both are the same.
llvm-svn: 240075
We skip unknown options in the command line with a warning message
being printed out, but we shouldn't do that for .drectve section.
The section is not visible to the user. We should handle unknown
options as an error.
llvm-svn: 240067
The linker has to create an XML file for each executable.
This patch supports that feature.
You can optionally embed an XML file to an executable as .rsrc
section. If you choose to do that (by passing /manifest:embed
option), the linker has to create a textual resource file
containing an XML file, compile that using rc.exe to a binary
resource file, conver that resource file to a COFF file using
cvtres.exe, and then link that COFF file. This patch implements
that feature too.
llvm-svn: 239978
Common symbols will be handled in a separate patch because it seems
Hexagon redefines the notion of common symbol, which I'm not (yet)
very familiar with.
llvm-svn: 239951