This is how we use TarWriter in LLD. Now LLD does not append
a file extension, so you need to pass `--reproduce foo.tar`
instead of `--reproduce foo`.
Differential Revision: https://reviews.llvm.org/D28103
llvm-svn: 291210
After Mark's patch I was wondering what was the rationale for the ELF
spec requiring us to merge only sections with matching flags and
types. I tried emailing
https://groups.google.com/forum/#!forum/generic-abi, but looks like my
emails are not being posted (the list is probably moderated). I
emailed Cary Coutant instead.
Cary pointed out that the section was a late addition and didn't got
the scrutiny it deserved. Given that and the problems found by
implementing the letter of the standard, I propose changing lld to
merge all sections with the same name and issue errors if the types or
some critical flags are different.
This should allow an unmodified firefox linked with lld to run.
This also merges some code with the linkerscript path.
llvm-svn: 291107
The glibc dynamic loader rounds the size down, so without this the loader
will fail to change the memory protection for the last page.
Differential Revision: https://reviews.llvm.org/D28267
llvm-svn: 290986
The GUID should match between the RSDS and the PDB. This should repair
the build bots, though we should be ensuring that the GUIDs match.
Unfortunately, different build bots seem to be getting different GUIDs.
llvm-svn: 290981
The PDB GUID, Age, and version are tied together by the RSDS record in
the binary. Pass along the BuildId information into the createPDB to
allow us to tie the binary and the PDB together.
llvm-svn: 290975
In a shared library an undefined symbol is implicitly imported. If the
symbol is called as a function a PLT entry is generated for it. When the
caller is a Thumb b.w a thunk to the PLT entry is needed as all PLT
entries are in ARM state.
This change allows undefined symbols to have thunks in the same way that
shared symbols may have thunks.
llvm-svn: 290951
Assert that the size of the MD5 result is the same size as the signature
field being populated. Use the sizeof operator to determine the size of
the field being written rather than hardcoding it to the magic number
16. NFC.
llvm-svn: 290764
This is necessary for the distribution targets which assume that
each component has an install target. This also moves the CMake
macros into a separate file akin to other LLVM projects.
Differential Revision: https://reviews.llvm.org/D27876
llvm-svn: 290391
This is last known noticable fatal() in target.cpp.
We also have other ones for unknown relocations or
creating unknown targets, but that one can be just error I think.
Used yaml2obj to generate test.
Differential revision: https://reviews.llvm.org/D28049
llvm-svn: 290335
Vectors returned form that function contained nullptrs or Undefined symbols.
This patch filter them out. This makes use of the function a bit easier.
llvm-svn: 290334
Previously, that was an alias to -color-diagnostics=auto. However,
Clang's -fcolor-diagnostics is an alias to -fcolor-diagnostics=always,
so that was confusing. This patch fixes that issue.
llvm-svn: 290332
Previously, you had to call initDemangledSyms() before accessing DemangledSyms.
Now getDemangledSyms() initializes it and then returns it. So it is now less easy
to use it in a wrong way.
llvm-svn: 290323
OpenBSD's cpio does not accept the -t option without -i.
Apparently some systems implement cpio -t as a shortcut
for cpio -it, the latter is the only thing that's documented.
This change avoids test failures on OpenBSD.
Patch by Mark Kettenis!
Differential Revision: https://reviews.llvm.org/D28002
llvm-svn: 290252
DefinedSynthetic is not created for a real ELF object, so it doesn't
have to be a template function. It has a virtual st_value, which is
either 32 bit or 64 bit, but we can simply use 64 bit.
llvm-svn: 290241
We probably would want to avoid fatal() if we can in context of librarification,
but for me reason of that patch is to help D27900 go.
D27900 changes errors reporting to something like
error: text1
note: text2
note: text3
where hint used to provide additional information about location. In that case
I can't just call fatal() because user will not see notes after that what adds additional complication to handle.
So It is good to switch fatal() to error() where it is possible.
Also it adds testcase with broken relocation number.
Previously we did not have any, It checks that error() instead of fatal() works fine.
Differential revision: https://reviews.llvm.org/D27973
llvm-svn: 290239
It was revealed by D27831.
If we have linkerscript that includes another one that sets OUTPUT for example:
RUN: echo "INCLUDE \"foo.script\"" > %t.script
RUN: echo "OUTPUT(\"%t.out\")" > %T/foo.script
then we do:
void ScriptParser::readInclude() {
...
std::unique_ptr<MemoryBuffer> &MB = *MBOrErr;
tokenize(MB->getMemBufferRef());
OwningMBs.push_back(std::move(MB));
}
void ScriptParser::readOutput() {
...
Config->OutputFile = unquote(Tok);
...
}
Problem is that OwningMBs are destroyed after script parser do its job.
So all Toks are dead and Config->OutputFile points to destroyed data.
Patch suggests to save all included scripts into using string Saver.
Differential revision: https://reviews.llvm.org/D27987
llvm-svn: 290238
Older versions of BFD generate libraries with .MIPS.abiflags that only
concatenate the individual .MIPS.abiflags sections instead of merging.
Patch by Alexander Richardson.
Differential revision: https://reviews.llvm.org/D27770
llvm-svn: 290237
GlobPattern is a class to handle glob pattern matching. Currently
only LLD is using that, but technically that feature is not specific
to linkers, so in this patch I move that file to LLVM.
Differential Revision: https://reviews.llvm.org/D27969
llvm-svn: 290212
That was requested by Mark Kettenis in llvm-dev:
"It is the intention that .openbsd.randomdata sections are made
read-only after initialization. The native (ld.bfd based) OpenBSD
toolchain accomplishes this by including .openbsd.randomdata into the
PT_GNU_RELRO segment."
He suggested code change, I added testcase.
Differential revision: https://reviews.llvm.org/D27974
llvm-svn: 290174
Previously, some errors that were checked before we set to
Config->ColorDiagnostics weren't colored. This patch moves the code
to set the variable so that such error messages are colored just like
other error messages.
llvm-svn: 290157
That variable was of type DenseMap<StringRef, unsigned>, but the
unsigned numbers needed to be monotonicly increasing numbers because
the implementation that used the variable depended on that fact.
That was an implementation detail and shouldn't have leaked into Config.
This patch simplifies its type to std::vector<StringRef>.
llvm-svn: 290151
This handles all the corner cases if setting a section address:
- If the address is too low, we cannot allocate the program headers.
- If the load address is lowered, we have to do that before finalize
This also shares some code with the linker script since it was already
hitting similar cases.
This is used by the freebsd boot loader. It is not clear if we need to
support this with a non binary output, but it is not as bad as I was
expecting.
llvm-svn: 290136
--retain-symbols-file=filename
Retain only the symbols listed in the file filename, discarding all others.
filename is simply a flat file, with one symbol name per line. This option
is especially useful in environments (such as VxWorks) where a large global
symbol table is accumulated gradually, to conserve run-time memory.
Note: though documentation says "--retain-symbols-file does not discard
undefined symbols, or symbols needed for relocations.", both bfd and gold
do that, and this patch too, like testcase show.
Differential revision: https://reviews.llvm.org/D27716
llvm-svn: 290122
AArch64 TLSDESC for local symbol in shared objects are implemented in a
arch specific manner where the TLSDESC dynamic relocation addend is the
symbol VM inside the TLS block. For instance, with a shared library
created from the code:
--
static __thread int32_t x1;
static __thread int64_t x2;
int32_t foo1 (int32_t x)
{
x1 += x;
return x;
}
int64_t foo2 (int64_t x)
{
x2 += x;
return x;
}
--
The dynamic relocation should be create as:
Relocations [
Section (N) .rela.dyn {
<Address1> R_AARCH64_TLSDESC - 0x0
<Address2> R_AARCH64_TLSDESC - 0x8
}
]
Where 0x0 addend in first dynamic relocation is the address of 'x1'
in TLS block and '0x8' is the address of 'x2'.
Checked against test-suite on aarch64-linux-gnu.
llvm-svn: 290099
Use of CachedHashStringRef makes sense only when we reuse hash values.
Sprinkling it to all DenseMap has no benefits and just complicates data types.
Basically we shouldn't use CachedHashStringRef unless there is a strong
reason to to do so.
llvm-svn: 290076
I thought for a while about how to remove it, but it looks like we
can just copy the file for now. Of course I'm not happy about that,
but it's just less than 50 lines of code, and we already have
duplicate code in Error.h and some other places. I want to solve
them all at once later.
Differential Revision: https://reviews.llvm.org/D27819
llvm-svn: 290062
File system operations were still dominating the profile on Windows. In this
case we were spending a significant amount of our time repeatedly searching
for libraries as a result of processing linker directives. Address this
by caching whether we have already found a library with a given name. For
chrome_child.dll:
Before: 10.53s
After: 6.88s
Differential Revision: https://reviews.llvm.org/D27840
llvm-svn: 289915
It os used in work/emulators/qemu-user-static port.
Which tries to use -Ttext-segment and then:
# In case ld does not support -Ttext-segment, edit the default linker
# script via sed to set the .text start addr. This is needed on FreeBSD
# at least.
<here it calls -verbose to extract and edit default bfd linker script.>
Actually now we are do not fully support -Ttext properly (see D27613),
but we also seems never will provide anything close to default script, like bfd do,
so at least this patch introduces proper alias handling.
llvm-svn: 289827
Patch continues work started in D24706 and D25821.
in this patch symbol table and constant pool areas were
added to .gdb_index section output.
This one finishes the implementation of --gdb-index functionality in LLD.
Differential revision: https://reviews.llvm.org/D26283
llvm-svn: 289810
Patch continues work started in D24706,
in this patch address area was added to .gdb_index section output.
Differential revision: https://reviews.llvm.org/D25821
llvm-svn: 289790
PR31335 shows that we do that in next case:
SECTIONS { .text 0x2000 : {. = 0x100 ; *(.text) } }
though documentations says that "If . is used inside a section
description however, it refers to the byte offset from the start
of that section, not an absolute address. " looks does not work
as documented in bfd (as mentioned in comments for PR31335).
Until we find out the expected behavior was suggested at least not
to 'crash', what we do after trying to generate huge file.
Differential revision: https://reviews.llvm.org/D27712
llvm-svn: 289782
Profiling revealed that the majority of lld's execution time on Windows was
spent opening and mapping input files. We can reduce this cost significantly
by performing these operations asynchronously.
This change introduces a queue for all operations on input file data. When
we discover that we need to load a file (for example, when we find a lazy
archive for an undefined symbol, or when we read a linker directive to
load a file from disk), the file operation is launched using a future and
the symbol resolution operation is enqueued. This implies another change
to symbol resolution semantics, but it seems to be harmless ("ninja All"
in Chromium still succeeds).
To measure the perf impact of this change I linked Chromium's chrome_child.dll
with both thin and fat archives.
Thin archives:
Before (median of 5 runs): 19.50s
After: 10.93s
Fat archives:
Before: 12.00s
After: 9.90s
On Linux I found that doing this asynchronously had a negative effect on
performance, probably because the cost of mapping a file is small enough that
it becomes outweighed by the cost of managing the futures. So on non-Windows
platforms I use the deferred execution strategy.
Differential Revision: https://reviews.llvm.org/D27768
llvm-svn: 289760
`SC` didn't make much sense. We don't seem to have a clear convention,
but `IS` sounds good here because it emphasizes that it is an input
section (this is one place in the code where we are dealing with both
input sections and output sections at the same time so that extra
emphasis makes it a bit clearer).
llvm-svn: 289748
Remove the includes of <llvm/Config/config.h> private LLVM header.
The relevant files seem not to use any definitions from that file,
and it is not available when building against installed LLVM.
The use in lib/ReaderWriter/MachO/MachOLinkingContext.cpp originates
from rL218718, and the use in ELF/Strings.cpp from rL274804 (where it
was moved from Symbols.cpp). In both cases, they were added as a part of
demangling support, and they provided HAVE_CXXABI_H.
Since we are now using the LLVM demangler library instead, the code was
removed and the includes and no longer necessary.
Differential Revision: https://reviews.llvm.org/D27757
llvm-svn: 289707
The eglibc library, as used by Ubuntu 14.04 requires the presence of an
SHT_ARM_ATTRIBUTES section in for the purposes of checking hard/soft float
compatibility when dlopen() is used. Unfortunately when the section is not
present dlopen() fails with a generic could not find file message.
This change makes lld keep the first .ARM.attributes section that it
encounters and propagates it to the output. This is not a complete
SHT_ARM_ATTRIBUTES implementation, that would involve reading the contents
of the section and joining each individual attribute. It should suffice
for a homogenous build all libraries and executables on the same system
with a compatible set of command line options.
Differential revision: https://reviews.llvm.org/D27718
llvm-svn: 289642
When compiling -fpie and linking with the --pie option the R_ARM_GOTBREL
relocation to D is resolved by writing the value of D into the .got slot
and emitting an R_ARM_RELATIVE relocation for it.
This changes adds the R_ARM_RELATIVE relocation to the switch in
relocateOne() so we can process the GotSection relocation to write the
value of the variable as well as emitting the dynamic relocation.
Differential revision: https://reviews.llvm.org/D27678
llvm-svn: 289527
The VA of _gp was being truncated to 32 bits when calling getVa(), but
for 64bit MIPS we need to write a 64 bit value to .MIPS.options.
Patch by Alexander Richardson.
Differential revision: https://reviews.llvm.org/D27672
llvm-svn: 289432
Enable building lld as a standalone project. This is motivated by the desire to
package lld for inclusion in a linux distribution. This allows building lld
against an existing paired llvm installation. Now that lld is usable on x86_64,
it makes sense to revive this configuration to allow distributions to package
it.
llvm-svn: 289421
This patch replaces the symbol table's object and archive queues, as well as
the convergent loop in the linker driver, with a design more similar to the
ELF linker where symbol resolution directly causes input files to be added to
the link, including input files arising from linker directives. Effectively
this removes the last vestiges of the old parallel input file loader.
Differential Revision: https://reviews.llvm.org/D27660
llvm-svn: 289409
Using a set here caused us to take about 1 second longer to write the symbol
table when linking chrome_child.dll. With this I consistently get better
performance on Windows with the new symbol table.
Before r289280 and with r289183 reverted (median of 5 runs): 17.65s
After this change: 17.33s
On Linux things look even better:
Before: 10.700480444s
After: 5.735681610s
Differential Revision: https://reviews.llvm.org/D27648
llvm-svn: 289408
We first decide that the symbol is global, than that it should have
version foo. Since it was already not the default version, we were
producing a bogus warning.
llvm-svn: 289284
This ports the ELF linker's symbol table design, introduced in r268178,
to the COFF linker.
Differential Revision: http://reviews.llvm.org/D21166
llvm-svn: 289280
The former option bases the filename on the output name, e.g. if the
link output is a.exe, the map will be written to a.map. This matches the
behaviour of link.exe's /MAP option and is useful for creating a map
file of each executable when building a large project.
Differential Revision: https://reviews.llvm.org/D27595
llvm-svn: 289271
Profiling revealed that we were spending 5% of our time linking
chrome_child.dll just in this call to toString().
Differential Revision: https://reviews.llvm.org/D27628
llvm-svn: 289270
The i386 glibc ld.so expects the .got.slot entry that is relocated by a
R_386_IRELATIVE relocation to point directly at the ifunc resolver and
not the address of the PLT entry + 6 (thus entering the lazy resolver).
This is also the case for ARM and I suspect it is because these use REL
relocations and can't use the addend field to store the address of the
ifunc resolver. If the lazy resolver is used we get an error message
stating that only R_386_JUMP_SLOT is supported.
As ARM and i386 share the same code, I've removed the ARM specific test
and added a writeIgotPlt() function that by default calls writeGotPlt().
ARM and i386 override this to write the address of the ifunc resolver.
Differential Revision: https://reviews.llvm.org/D27581
llvm-svn: 289198
I don't think the data I add to a TPI stream in this patch is correct,
but at least it can be displayed using llvm-pdbdump. Until I add more
streams to a PDB file, I'm not able to know whether the data will be
accepted by MSVC tools or not.
llvm-svn: 289183
linkerscript.s is the first test file for linker script, and at the moment
it contains all tests for linker scripts. Now that test file doesn't make
sense.
linkerscript2.s was just badly named. Renamed searchdir.s.
llvm-svn: 289148
The feature is documented as
-----------------------------
The format of the dynamic list is the same as the version node
without scope and node name. See *note VERSION:: for more
information.
--------------------------------
And indeed qt uses a dynamic list with an 'extern "C++"' in it. With
this patch we support that
The change to gc-sections-shared makes us match bfd. Just because we
kept bar doesn't mean it has to be in the dynamic symbol table.
The changes to invalid-dynamic-list.test and reproduce.s are because
of the new parser.
The changes to version-script.s are the only case where we change
behavior with regards to bfd, but I would like to see a mix of
--version-script and --dynamic-list used in the wild before
complicating the code.
llvm-svn: 289082
This is the last peculiar semantics left in the linker. If you want to
always set an entry point to 0, you can pass `-e 0` to the linker.
Differential Revision: https://reviews.llvm.org/D27532
llvm-svn: 289077
This change introduces new synthetic sections IpltSection, IgotPltSection
that represent the ifunc entries that would previously have been put in
the PltSection and the GotPltSection. The separation makes sure that
the R_*_IRELATIVE relocations are placed after the non R_*_IRELATIVE
relocations, which permits ifunc resolvers to know that the .got.plt
slots will be initialized prior to the resolver being called.
A secondary benefit is that for ARM we can move the IgotPltSection and its
dynamic relocations to the .got and .rel.dyn as the ARM glibc expects all
the R_*_IRELATIVE relocations to be in the .rel.dyn
Differential revision: https://reviews.llvm.org/D27406
llvm-svn: 289045
These MIPS specific symbols should be global because in general they can
have an arbitrary value. By default this value is a fixed offset from .got
section.
This patch adds more checks to the mips-gp-local.s test case but marks
it as XFAIL because LLD does not allow redefinition of absolute symbols
value by a linker script. This should be fixed by D27276.
Differential revision: https://reviews.llvm.org/D27524
llvm-svn: 289025
Previously, we had different way to stringize SymbolBody and InputFile
to construct error messages. This patch defines overloaded function
toString() so that we don't need to memorize all these different
function names.
With that change, it is now easy to include demangled names in error
messages. Now, if there is a symbol name conflict, we'll print out
both mangled and demangled names.
llvm-svn: 288992
Currently LLD prints basename of source file name in error messages,
for example:
$ mkdir foo
$ echo 'void _start(void) { foobar(); }' > foo/bar.c
$ gcc -g -c foo/bar.c
$ bin/ld.lld -o out bar.o
bin/ld.lld: error: bar.c:1: undefined symbol 'foobar'
$
This should say:
bin/ld.lld: error: foo/bar.c:1: undefined symbol 'foobar'
This is PR31299
Differential revision: https://reviews.llvm.org/D27506
llvm-svn: 288966
This patch refactor how to apply the R_AARCH64_LDST{8,16,32,64,128}_ABS_NC
relocations by adding a new function to correct extract the bits expected
by each relocation. This make is explicit which are the bits range expected
and simplify the code to mask and shift the deriable values.
It also fixes the R_AARCH64_LDST128_ABS_LO12_NC mask, although in pratice
the mask/shift always returns a 16 bytes aligned value.
Checked on AArch64 and with test-suite.
llvm-svn: 288921
Shared libraries should have entry set following the same rules as for
regular binaries. The only difference is that in case the default entry
point (_start or __start) isn't found (unless it was set explicitly), we
shouldn't give a warning as in case of regular binaries.
Differential Revision: https://reviews.llvm.org/D27497
llvm-svn: 288878
If we do, the freebsd dynamic linker tries to call mmap with a size 0,
which fails.
It is hard to avoid creating them when linker scripts are used, so we
just delete empty PT_LOADs at the end.
llvm-svn: 288808
For preemptable symbols the dynamic linker does all the work. Trying
to compute the addend is at best wasteful and can also lead to crashes
in cases of programs that uses tls but doesn't define any tls
variables.
llvm-svn: 288803
This change seems to make LLD 0.6% faster when linking Clang with
debug info. I don't want us to have lots of local optimizations,
but this function is very hot, and the improvement is small but
not negligible, so I think it's worth doing.
llvm-svn: 288757
On Linux (and probably on other Unix-like systems), unlink(2) is
noticeably slow. It takes 250 milliseconds to remove a 1 GB file
on ext4 filesystem on my machine, whether the file is on SSD or
on a spinning disk.
To create a new result file, we remove existing file first. So, if
you repeatedly link a 1 GB program in a regular compile-link-debug
cycle, every cycle wastes 250 milliseconds only to remove a file.
Since LLD can link a 1 GB in about 5 seconds, that waste actually
matters.
This patch defines `unlinkAsync` function. The function spawns a
background thread to call unlink. The calling thread returns
almost immediately.
Differential Revision: https://reviews.llvm.org/D27295
llvm-svn: 288680
The relocation R_AARCH64_LDST16_ABS_LO12_NC should set a ld/st
immediate value to bits [11:1] not [11:2]. This patches fixes it
and adds a testcase for regression.
With this fix all the faulty tests on test-suite (clavm, lencod,
and trimaran) pass.
llvm-svn: 288670
I removed a wrong optimization for ICF in r288527. Sean Silva suggested
in a post commit review that the correct algorithm can be implemented
easily. So is this patch.
llvm-svn: 288620
Some elf producers (dtrace) put this flag in relocation sections and
some (MC) don't. If we don't ignore the flag we end up with multiple
relocation sections poiting to the same section, which we don't
support.
llvm-svn: 288585
A program or object file using R_386_8, R_386_16, R_386_PC16 or R_386_PC8
relocations is not conformant to latest ABI. The R_386_16, and R_386_8
relocations truncate the computed value to 16 - bits and 8 - bits
respectively. R_386_PC16 and R_386_16 are used by some
applications, for example by FreeBSD loaders.
Previously we did not take addend in account for these relocation,
counting it as 0, what is wrong and was a reason of hangs.
This patch needed for example for FreeBSD pmbr (protective mbr).
Differential revision: https://reviews.llvm.org/D27303
llvm-svn: 288581
Binary output feature is a bit confuzing. bfd and gold output differs a lot sometimes,
though it is important for FreeBSD mbr loaders.
Patch change the way how we compute file offsets for binary output.
This fixes PR31196.
Previously offsets were calculated basing on offsets and addresses of sections
from the same loads:
if (Sec == First)
return alignTo(Off, Target->MaxPageSize, Sec->Addr);
return First->Offset + Sec->Addr - First->Addr;
bfd assigns offsets for each section to VA - MinVA:
https://github.com/redox-os/binutils-gdb/blob/master/bfd/binary.c#L27https://github.com/redox-os/binutils-gdb/blob/master/bfd/binary.c#L255
(LMA == VA usually)
This patch for now just stops creating phdrs for binary output.
An effect from this that no any additional calculation for offset is performed:
uintX_t getFileAlignment(uintX_t Off, OutputSectionBase *Sec) {
OutputSectionBase *First = Sec->FirstInPtLoad;
// If the section is not in a PT_LOAD, we have no other constraint.
if (!First)
return Off; //**First is always null, condition always happens**
That is enough now with combination of another patch to generate output
that is similar to what bfd produce for mbr loader.
Differential revision: https://reviews.llvm.org/D27341
llvm-svn: 288580
--omagic is an option to create old-fashioned executables in which
.text segments are writable. Today, the option is still in use to
create special-purpose programs such as boot loaders. It doesn't
make sense to create PT_GNU_RELRO for such executables.
DIfferential revision: https://reviews.llvm.org/D27297
llvm-svn: 288579
This is a hack for single thread execution. We are using Color[0] and
Color[1] alternately on each iteration. This optimization is to look
at the next slot as opposted to the current slot to get recent results
early. Turns out that the assumption is wrong, because the other slots
are not always have the most recent values, but instead it may have
stale values of the previous iteration. This patch removes that
performance hack.
llvm-svn: 288527
The assertion asserted that colorable sections can never have
a reference to non-colorable sections, but that was simply wrong.
They can have references to non-colorable sections. If that's the
case, referenced sections must be the same in terms of pointer
comparison.
llvm-svn: 288511
LLD used to take 11.73 seconds to link Clang. Now it is 6.94 seconds.
MSVC link takes 83.02 seconds. Note that ICF is enabled by default on
Windows, so a low latency ICF is more important than in ELF.
llvm-svn: 288487
It looks like the way dtrace works is
* The user creates .o files that reference magical symbol names.
* dtrace reads those files, collecs the info it needs and changes the
relocation to R_X86_64_NONE expecting the linker to ignore them.
llvm-svn: 288485
Associative sections are sections that need to be linked if their associated
sections are linked. Associative sections are used to append auxiliary data
such as debug info.
Previously, we compared all associative sections when comparing two comdat
sections. Because usually assocative sections are not mergeable sections,
we missed a lot of mergeable sections. MSVC linker doesn't seem to check
the identity of associative sections.
This patch makes LLD to ignore associative sections when doing ICF.
llvm-svn: 288483
r288228 seems to have regressed ICF performance in some cases in which
a lot of sections are actually mergeable. In r288228, I made a change
to create a Range object for each new color group. So every time we
split a group, we allocated and added a new group to a list of groups.
This patch essentially reverted r288228 with an improvement to
parallelize the original algorithm.
Now the ICF main loop is entirely allocation-free and lock-free.
Just like pre-r288228, we search for group boundaries by linear scan
instead of managing the information using Range class. r288228 was
neutral in performance-wise, and so is this patch.
I confirmed that this produces the exact same result as before
using chromium and clang as tests.
llvm-svn: 288480
This is a fairly reasonable bfd extension since there is one obvious value.
dtrace depends on this feature as it creates multiple absolute
symbols with the same value.
llvm-svn: 288461
ICF is short for Identical Code Folding. It is a size optimization to
identify two or more functions that happened to have the same contents
to merges them. It usually reduces output size by a few percent.
ICF is slow because it is computationally intensive process. I tried
to paralellize it before but failed because I couldn't make a
parallelized version produce consistent outputs. Although it didn't
create broken executables, every invocation of the linker generated
slightly different output, and I couldn't figure out why.
I think I now understand what was going on, and also came up with a
simple algorithm to fix it. So is this patch.
The result is very exciting. Chromium for example has 780,662 input
sections in which 20,774 are reducible by ICF. LLD previously took
7.980 seconds for ICF. Now it finishes in 1.065 seconds.
As a result, LLD can now link a Chromium binary (output size 1.59 GB)
in 10.28 seconds on my machine with ICF enabled. Compared to gold
which takes 40.94 seconds to do the same thing, this is an amazing
number.
From here, I'll describe what we are doing for ICF, what was the
previous problem, and what I did in this patch.
In ICF, two sections are considered identical if they have the same
section flags, section data, and relocations. Relocations are tricky,
becuase two relocations are considered the same if they have the same
relocation type, values, and if they point to the same section _in
terms of ICF_.
Here is an example. If foo and bar defined below are compiled to the
same machine instructions, ICF can (and should) merge the two,
although their relocations point to each other.
void foo() { bar(); }
void bar() { foo(); }
This is not an easy problem to solve.
What we are doing in LLD is some sort of coloring algorithm. We color
non-identical sections using different colors repeatedly, and sections
in the same color when the algorithm terminates are considered
identical. Here is the details:
1. First, we color all sections using their hash values of section
types, section contents, and numbers of relocations. At this moment,
relocation targets are not taken into account. We just color
sections that apparently differ in different colors.
2. Next, for each color C, we visit sections having color C to see
if their relocations are the same. Relocations are considered equal
if their targets have the same color. We then recolor sections that
have different relocation targets in new colors.
3. If we recolor some section in step 2, relocations that were
previously pointing to the same color targets may now be pointing to
different colors. Therefore, repeat 2 until a convergence is
obtained.
Step 2 is a heavy operation. For Chromium, the first iteration of step
2 takes 2.882 seconds, and the second iteration takes 1.038 seconds,
and in total it needs 23 iterations.
Parallelizing step 1 is easy because we can color each section
independently. This patch does that.
Parallelizing step 2 is tricky. We could work on each color
independently, but we cannot recolor sections in place, because it
will break the invariance that two possibly-identical sections must
have the same color at any moment.
Consider sections S1, S2, S3, S4 in the same color C, where S1 and S2
are identical, S3 and S4 are identical, but S2 and S3 are not. Thread
A is about to recolor S1 and S2 in C'. After thread A recolor S1 in
C', but before recolor S2 in C', other thread B might observe S1 and
S2. Then thread B will conclude that S1 and S2 are different, and it
will split thread B's sections into smaller groups wrongly. Over-
splitting doesn't produce broken results, but it loses a chance to
merge some identical sections. That was the cause of indeterminism.
To fix the problem, I made sections have two colors, namely current
color and next color. At the beginning of each iteration, both colors
are the same. Each thread reads from current color and writes to next
color. In this way, we can avoid threads from reading partial
results. After each iteration, we flip current and next.
This is a very simple solution and is implemented in less than 50
lines of code.
I tested this patch with Chromium and confirmed that this parallelized
ICF produces the identical output as the non-parallelized one.
Differential Revision: https://reviews.llvm.org/D27247
llvm-svn: 288373
In various places in LLD's hot loops, we have expressions of the form
"E == R_FOO || E == R_BAR || ..." (E is a RelExpr).
Some of these expressions are quite long, and even though they usually go just
a very small number of ways and so should be well predicted, they can still
occupy branch predictor resources harming other parts of the code, or they
won't be predicted well if they overflow branch predictor resources or if the
branches are too dense and the branch predictor can't track them all (the
compiler can in theory avoid this, at a cost in text size). And some of these
expressions are so large and executed so frequently that even when
well-predicted they probably still have a nontrivial cost.
This speedup should be pretty portable. The cost of these simple bit tests is
independent of:
- the target we are linking for
- the distribution of RelExpr's for a given link (which can depend on how the
input files were compiled)
- what compiler was used to compile LLD (it is just a simple bit test;
hopefully the compiler gets it right!)
- adding new target-dependent relocations (e.g. needsPlt doesn't pay any extra
cost checking R_PPC_PLT_OPD on x86-64 builds)
I did some rough measurements on clang-fsds and this patch gives over about 4%
speedup for a regular -O1 link, about 2.5% for -O3 --gc-sections and over 5%
for -O0. Sorry, I don't have my current machine set up for doing really
accurate measurements right now.
This also is just a bit cleaner. Thanks for Joerg for suggesting for
this approach.
Differential Revision: https://reviews.llvm.org/D27156
llvm-svn: 288314
- Rename currentBuffer -> getCurrentMB to start it with verb.
- Simplify containsString.
- Add llvm_unreachable at end of getCurrentMB.
llvm-svn: 288310