Commit Graph

29 Commits

Author SHA1 Message Date
Rui Ueyama df985afa14 COFF: De-parallelize ICF for now.
There was a threading issue in the ICF code for COFF. That seems like
a venign bug in the sense that it doesn't produce an incorrect output,
but it oftentimes misses reducible sections. As a result, mergeable
sections could remain in outputs, which makes the output nondeterministic.

Basically the algorithm we are using for ICF is this: We group sections
so that identical sections will eventually be in the same group. Initially,
all sections are in one group. We split the group by relocation targets
until we get a convergence (if relocation targets are in different gruops,
the sections are different). Once a group is split, they will never be
merged.

Each section has a group ID. That variable itself is atomic, so there's
no threading issue at the level that we can use thread sanitizer.
The point is, when we split a group, we re-assign new group IDs to group
of sections. That are multiple separate writes to atomic varaibles.
Thus, splitting a group is not an atomic operation, and there's a small
chance that the other thread observes inconsistent group IDs.

Over-splitting is always "safe", so it will never create incorrect output.

I suspect that the nondeterminism stems from that point. However, I
cannot prove or fix that at this moment, so I'm going to avoid using
threads here.

llvm-svn: 251300
2015-10-26 16:20:00 +00:00
Rui Ueyama 548d22c073 COFF: ICF should not merge sectinos if their alignments are not the same.
There's actually a room to improve this patch. Instead of not merging
sections that have different alignements, we can choose the section that
has the largest alignment requirement among all sections that are otherwise
considered the same. Then all section alignments are satisfied, so we can
merge them.

I don't know if that improvement could make any difference for real-world
input, so I'll leave it alone. Would be interesting to revisit later.

llvm-svn: 248581
2015-09-25 16:50:12 +00:00
Rui Ueyama c9e746b9e6 COFF: Fix local varaible type.
This is intended to be 64-bit integer, but size_t is not guranteed
to be the same or larger type than uint64_t.

llvm-svn: 248580
2015-09-25 16:38:13 +00:00
Rui Ueyama c28a08b8d2 COFF: Remove duplicate parameter from hash value calculation.
llvm-svn: 248526
2015-09-24 19:00:42 +00:00
Rui Ueyama 97d92736f5 COFF: Improve section hash value.
std::distance(C->Relocs.end(), C->Relocs.begin()) is the same as NumRelocs
which is already added to the hash value. What we are missing here is the
section size.

llvm-svn: 248202
2015-09-21 19:41:38 +00:00
Rui Ueyama 3cb1f5c860 COFF: Rename A.replaceWith(B) -> B.replace(A). NFC.
llvm-svn: 248197
2015-09-21 19:36:51 +00:00
Rui Ueyama 5f38915624 COFF: Fix ICF regression.
This patch fixes a regression introduced by r247964. Relocations that
are referring the same symbol should be considered equal, but they
were not if they were pointing to non-section chunks.

llvm-svn: 248132
2015-09-20 20:19:12 +00:00
Rui Ueyama f49712a853 COFF: Fix race condition.
NextID is updated inside parallel_for_each, so it needs mutual exclusion.

llvm-svn: 248106
2015-09-20 01:44:44 +00:00
Rui Ueyama 49e72e69e5 Fix build error that std::atomic is not copy-constructible.
llvm-svn: 248061
2015-09-18 22:58:12 +00:00
Rui Ueyama e629a45531 COFF: Address review comments.
- Fix race condition of `Redo`
- Avoid std::distance

llvm-svn: 248058
2015-09-18 22:31:15 +00:00
Rui Ueyama e8d1c59756 Style fix to make it look consistent. NFC.
llvm-svn: 248044
2015-09-18 21:17:44 +00:00
Rui Ueyama aa95e5a4cc COFF: Parallelize ICF.
The LLD's ICF algorithm is highly parallelizable. This patch does that
using parallel_for_each.

ICF accounted for about one third of total execution time. Previously,
it took 324 ms when self-hosting. Now it takes only 62 ms.

Of course your mileage may vary. My machine is a beefy 24-core Xeon machine,
so you may not see this much speedup. But this optimization should be
effective even for 2-core machine, since I saw speedup (324 ms -> 189 ms)
when setting parallelism parameter to 2.

llvm-svn: 248038
2015-09-18 21:06:34 +00:00
Rui Ueyama 603d51104b COFF: Reorder comparisons.
This change makes equalsConstant a bit faster (193ms -> 163ms).

llvm-svn: 247965
2015-09-18 02:40:54 +00:00
Rui Ueyama 8c73dfb6bf COFF: Remove useless micro-optimization.
This patch simplifies code by removing micro-optimization that doesn't
contribute to speed.

llvm-svn: 247964
2015-09-18 02:15:34 +00:00
Rui Ueyama c9a6e827bd COFF: Optimize ICF by not creating temporary vectors.
Previously, ICF created a vector for each SectionChunk. The vector
contained pointers to successors, which are namely associative sections
and COMDAT relocation targets. The reason I created vectors is because
I thought that that would make section comparison faster.

It did make the comparison faster. When self-linking, for example, it
saved about 10 ms on each iteration. The time we spent on constructing
the vectors was 124 ms. If we iterate more than 12 times, return from
the investment exceeds the initial cost.

In reality, it usually needs 5 iterations. So we shouldn't construct
the vectors.

llvm-svn: 247963
2015-09-18 01:51:37 +00:00
Rui Ueyama 7d8263bf1d COFF: Optimize ICF by comparing relocations before section contents.
equalsConstants() is the heaviest function in ICF, and that consumes
more than half of total ICF execution time. Of which, section content
comparison accounts for roughly one third.

Previously, we compared section contents at the beginning of the
function after comparing their checksums. The comparison is very
likely to succeed because when the control reaches that comparison,
their checksums are always equal. And because checksums are 64-bit
CRC, they are unlikely to collide.

We compared relocations and associative sections after that.
If they are different, the time we spent on byte-by-byte comparison
of section contents were wasted.

This patch moves the comparison at the end of function. If the
comparison fails, the time we spent on relocation comparison are
wasted, but as I wrote it's very unlikely to happen.

LLD took 1198 ms to link itself to produce a 27.11 MB executable.
Of which, ICF accounted for 536 ms. This patch cuts it by 90 ms,
which is 17% speedup of ICF and 7.5% speedup overall. All numbers
are median of ten runs.

llvm-svn: 247961
2015-09-18 01:30:56 +00:00
Rui Ueyama 66c06ceaca COFF: ICF: Print out the number of iterations. NFC.
llvm-svn: 247868
2015-09-16 23:55:39 +00:00
Rui Ueyama 92298d5418 COFF: Create ICF class to move code from SectionChunk to ICF. NFC.
This patch defines ICF class and defines ICF-related functions as
members of the class. By doing this we can move code that are
related only to ICF from SectionChunk to the newly-defined class.
This also eliminates a global variable "NextID".

llvm-svn: 247802
2015-09-16 14:19:10 +00:00
Rui Ueyama 9cb2870ce0 ICF: Improve ICF to reduce more sections than before.
This is a patch to make LLD to be on par with MSVC in terms of ICF
effectiveness. MSVC produces a 27.14MB executable when linking LLD.
LLD previously produced a 27.61MB when self-linking. Now the size
is reduced to 27.11MB. Note that without ICF the size is 29.63MB.

In r247387, I implemented an algorithm that handles section graphs
as cyclic graphs and merge them using SCC. The algorithm did not
always work as intended as I demonstrated in r247721. The new
algortihm implemented in this patch is different from the previous
one. If you are interested the details, you want to read the file
comment of ICF.cpp.

llvm-svn: 247770
2015-09-16 03:26:31 +00:00
Rui Ueyama c48f78ca5a Fix typo.
llvm-svn: 247645
2015-09-15 00:35:41 +00:00
Rui Ueyama 5b93aa51de COFF: Teach ICF to merge cyclic graphs.
Previously, LLD's ICF couldn't merge cyclic graphs. That was unfortunate
because, in COFF, cyclic graphs are not exceptional at all. That is
pretty common.

In this patch, sections are grouped by Tarjan's strongly connected
component algorithm to get acyclic graphs. And then we try to merge
SCCs whose outdegree is zero, and remove them from the graph. This
makes other SCCs to have outdegree zero, so we can repeat the
process until all SCCs are removed. When comparing two SCCs, we handle
cycles properly.

This algorithm works better than previous one. Previously, self-linking
produced a 29.0MB executable. It now produces a 27.7MB. There's still some
gap compared to MSVC linker which produces a 27.1MB executable for the
same input. So the gap is narrowed, but still LLD is not on par with MSVC.
I'll investigate that later.

llvm-svn: 247387
2015-09-11 04:29:03 +00:00
Rui Ueyama 3c28ba38de COFF: Split doICF(). No functionality change.
llvm-svn: 246934
2015-09-05 23:06:32 +00:00
Rui Ueyama ef907ec82d COFF: Implement a better algorithm for ICF.
Identical COMDAT Folding is a feature to merge COMDAT sections
by contents. Two sections are considered the same if their contents,
relocations, attributes, etc, are all the same.

An interesting fact is that MSVC linker takes "iterations" parameter
for ICF because the algorithm they are using is iterative. Merging
two sections could make more sections to be mergeable because
different relocations could now point to the same section. ICF is
repeated until we get a convergence (until no section can be merged).
This algorithm is not fast. Usually it needs three iterations until a
convergence is obtained.

In the new algorithm implemented in this patch, we consider sections
and relocations as a directed acyclic graph, and we try to merge
sections whose outdegree is zero. Sections with outdegree zero are then
removed from the graph, which makes  other sections to have outdegree
zero. We repeat that until all sections are processed. In this
algorithm, we don't iterate over the same sections many times.

There's an apparent issue in the algorithm -- the section graph is
not guaranteed to be acyclic. It's actually pretty often cyclic.
So this algorithm cannot eliminate all possible duplicates.
That's OK for now because the previous algorithm was not able to
eliminate cycles too. I'll address the issue in a follow-up patch.

llvm-svn: 246878
2015-09-04 21:35:54 +00:00
Rui Ueyama 434de7a33f Remove unused variable.
llvm-svn: 246874
2015-09-04 21:05:30 +00:00
Rui Ueyama 2dcc23580e COFF: Use section content checksum for ICF.
Previously, we calculated our own hash values for section contents.
Of coruse that's slow because we had to access all bytes in sections.
Fortunately, COFF objects usually contain hash values for COMDAT
sections. We can use that to speed up Identical COMDAT Folding.

llvm-svn: 246869
2015-09-04 20:45:50 +00:00
Rui Ueyama 7b276e2fb4 COFF: Move code for Identical COMDAT Folding to ICF.cpp.
llvm-svn: 243701
2015-07-30 22:57:21 +00:00
Rui Ueyama 50be6edfa6 COFF: Make doICF non-recursive. NFC.
llvm-svn: 240898
2015-06-28 01:35:59 +00:00
Rui Ueyama fc510f4cf8 COFF: Devirtualize mark(), markLive() and isCOMDAT().
Only SectionChunk can be dead-stripped. Previously,
all types of chunks implemented these functions,
but their functions were blank.

Likewise, only DefinedRegular and DefinedCOMDAT symbols
can be dead-stripped. markLive() function was implemented
for other symbol types, but they were blank.

I started thinking that the change I made in r240319 was
a mistake. I separated DefinedCOMDAT from DefinedRegular
because I thought that would make the code cleaner, but now
we want to handle them as the same type here. Maybe we
should roll it back.

This change should improve readability a bit as this removes
some dubious uses of reinterpret_cast. Previously, we
assumed that all COMDAT chunks are actually SectionChunks,
which was not very obvious.

llvm-svn: 240675
2015-06-25 19:10:58 +00:00
Rui Ueyama 49560c7a10 COFF: Move code for ICF from Writer.cpp to ICF.cpp.
llvm-svn: 240590
2015-06-24 20:40:03 +00:00