When parsing CU ranges for gdb-index, handle the error (now propagated
up though the API lld is calling here - previously the error was
printed within the libDebugInfo API, not allowing lld to format or
handle the message at all) - including information about the object and
archive name, as well as failing the link.
llvm-svn: 349979
Summary:
For the 2-bit bloom filter, we currently pick the bits Hash%64 and Hash>>6%64 (Shift2=6), but bits [6:...] are also used to select a word, causing a loss of precision.
In this patch, we choose Shift2=26, with is suggested by Ambrose Feinstein.
Note, Shift2 is computed as maskbitslog2 in bfd/elflink.c and gold/dynobj.cc
It is varying with the number of dynamic symbols but we don't
necessarily copy its rule.
Reviewers: ruiu, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D55971
llvm-svn: 349966
Summary:
This reinstates what I originally intended to do in D54361.
It removes the assumption that .debug_gnu_pubnames has increasing CuOffset.
Now we do better than gold here: when .debug_gnu_pubnames contains
multiple sets, gold would think every set has the same CU index as the
first set (incorrect).
Reviewed By: ruiu
Reviewers: ruiu, dblaikie, espindola
Subscribers: emaste, arichardson, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54483
llvm-svn: 347820
The DT_PLTRELSZ dynamic tag is calculated using the size of the
OutputSection containing the In.RelaPlt InputSection. This will work for the
default no linker script case and the majority of linker scripts.
Unfortunately it doesn't work for some 'almost' sensible linker scripts. It
is permitted by ELF to have a single OutputSection containing both
In.RelaDyn, In.RelaPlt and In.RelaIPlt. It is also permissible for the range
of memory [DT_RELA, DT_RELA + DT_RELASZ) and the range
[DT_JMPREL, DT_JMPREL + DT_JMPRELSZ) to overlap as long as the the latter
range is at the end.
To support this type of linker script use the specific InputSection sizes.
Fixes pr39678
Differential Revision: https://reviews.llvm.org/D54759
llvm-svn: 347736
This is https://bugs.llvm.org//show_bug.cgi?id=38978
Spec says that:
"Objects may be built with the -z nodefaultlib option to
suppress any search of the default locations at runtime.
Use of this option implies that all the dependencies of an
object can be located using its runpaths.
Without this option, which is the most common case, no
matter how you augment the runtime linker's library
search path, its last element is always /usr/lib for 32-bit
objects and /usr/lib/64 for 64-bit objects."
The patch implements this option.
Differential revision: https://reviews.llvm.org/D54577
llvm-svn: 347647
We explicitly call finalizeContents() only once for
DynamicSection. The code testing we do not do it twice is
just excessive.
It could be an assert, but we don't do
that for other sections, so does not seem we
should do it here too.
llvm-svn: 347543
Summary:
This fixes PR39711: -static -z retpolineplt does not produce retpoline PLT header.
-z now is not relevant.
Statically linked executable does not have PLT, but may have IPLT with no header. When -z retpolineplt is specified, however, the repoline PLT header should still be emitted.
I've checked that this fixes the FreeBSD reproduce in PR39711 and a Linux program statically linked against glibc. The programm print "Hi" rather than SIGILL/SIGSEGV.
getPltEntryOffset may look dirty after this patch, but it can be cleaned up later.
Another possible improvement is that when there are non-preemptible IFUNC symbols (rare case, e.g. -Bsymbolic), both In.Plt and In.Iplt can be non-empty and we'll emit the retpoline PLT header twice.
Reviewers: espindola, emaste, chandlerc, ruiu
Reviewed By: emaste
Subscribers: emaste, arichardson, krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D54782
llvm-svn: 347404
On PowerPC64, when a function call offset is too large to encode in a call
instruction the address is stored in a table in the data segment. A thunk is
used to load the branch target address from the table relative to the
TOC-pointer and indirectly branch to the callee. When linking position-dependent
code the addresses are stored directly in the table, for position-independent
code the table is allocated and filled in at load time by the dynamic linker.
For position-independent code the branch targets could have gone in the .got.plt
but using the .branch_lt section for both position dependent and position
independent binaries keeps it consitent and helps keep this PPC64 specific logic
seperated from the target-independent code handling the .got.plt.
Differential Revision: https://reviews.llvm.org/D53408
llvm-svn: 346877
Summary:
NameTypeEntry::Type is a bit-packed value of CU index+attributes (https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html), which is named cu_index_and_attrs in a local variable in gdb/dwarf2read.c:dw2_symtab_iter_next
The new name CuIndexAndAttrs is more meaningful.
Reviewers: ruiu, dblaikie, espindola
Reviewed By: dblaikie
Subscribers: emaste, aprantl, arichardson, JDevlieghere, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54481
llvm-svn: 346794
Summary:
Idx passed to readPubNamesAndTypes was an index into Chunks, not an
index into the CU list. This would be incorrect if some .debug_info
section contained more than 1 DW_TAG_compile_unit.
In real world, glibc Scrt1.o is a partial link of start.os abi-note.o init.o and contains 2 CUs in debug builds.
Without this patch, any application linking such Scrt1.o would have invalid .gdb_index
The issue could be demonstrated by:
(gdb) py print(gdb.lookup_global_symbol('main'))
None
Reviewers: espindola, ruiu
Reviewed By: ruiu
Subscribers: Higuoxing, grimar, dblaikie, emaste, aprantl, arichardson, JDevlieghere, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54361
llvm-svn: 346747
Summary:
The debug_info_offset value may be relocated.
This is lld side change of D54375.
Reviewers: ruiu, dblaikie, grimar, espindola
Subscribers: emaste, arichardson, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54376
llvm-svn: 346616
Summary:
D52830 sets sh_link to .symtab in static link, which breaks executable stripped by GNU strip.
It may also be odd that .rela.plt (SHF_ALLOC) points to .symtab (non-SHF_ALLOC).
Change the logic on pcc's suggestion.
Before:
% clang -fuse-ld=lld -static -xc =(printf 'int main(){}') # or gcc
% strip a.out; ./a.out
unexpected reloc type in static binary[1] 61634 segmentation fault ./a.out
Reviewers: ruiu, grimar, emaste, espindola
Reviewed By: ruiu
Subscribers: pcc, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D53993
llvm-svn: 345899
Summary: .rela.plt may only contain R_*_{,I}RELATIVE relocations and not need a symbol table link. bfd/gold fallbacks to sh_link=0 in this case. Without this patch, ld.lld --strip-all caused lld to dereference a null pointer.
Reviewers: ruiu, grimar, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D53881
llvm-svn: 345648
Out::DebugInfo was used only by GdbIndex class to determine if
we need to create a .gdb_index section, but we can do the same
check without it.
Added a test that this patch doesn't change the existing behavior.
llvm-svn: 345058
Android uses a compressed relocation format, which means the size of the
relocation section isn't predictable based on the number of relocations,
and can vary if the layout changes in any way. To deal with this, the
linker normally runs multiple passes until the layout converges.
The layout should converge if the size of the compressed
relocation section increases monotonically: if the size of an encoded
offset increases by one byte, the larget value which can be encoded is
multiplied by 128, so the representable offsets grow much faster than
the size of the section itself.
The problem here is that there is no code to ensure the size of the
section doesn't decrease. If the size of the relocation section
decreases, the relative offsets can increase due to alignment
restrictions, so that can force the size of the relocation section to
increase again. The end result is an infinite loop; the loop gets cut
off after 10 iterations with the message "thunk creation not
converged".
To avoid this issue, this patch adds padding to the end of the
relocation section if its size would decrease. The extra
padding is harmless because of the way the format is defined:
decoding stops after it reaches the number of relocations specified
in the section's header.
Differential Revision: https://reviews.llvm.org/D53003
llvm-svn: 344300
This is https://bugs.llvm.org/show_bug.cgi?id=37538,
Currently, LLD may set both sh_link and sh_info for
.rela.plt section to zero when we have only .rela.iplt section part used.
ELF spec (https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-94076/index.html)
says that for SHT_REL and SHT_RELA, sh_link references the associated symbol table
and sh_info the "section to which the relocation applies."
When we set the sh_link field, for the regular case we use the .dynsym index.
For .rela.iplt sections, it is unclear what is the associated symbol table,
because R_*_RELATIVE relocations do not use symbol names and we might have no
.dynsym section at all so this patch uses .symtab section index.
Differential revision: https://reviews.llvm.org/D52830
llvm-svn: 344226
Previously, we uncompress all compressed sections before doing anything.
That works, and that is conceptually simple, but that could results in
a waste of CPU time and memory if uncompressed sections are then
discarded or just copied to the output buffer.
In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompress them into newly allocated bufers and then memcpy the buffers
to the output buffer. That temporary buffer was redundant.
This patch changes how to uncompress sections. Now, compressed sections
are uncompressed lazily. To do that, `Data` member of `InputSectionBase`
is now hidden from outside, and `data()` accessor automatically expands
an compressed buffer if necessary.
If no one calls `data()`, then `writeTo()` directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.
This patch significantly reduces memory consumption (20 GiB max RSS to
15 Gib) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.
Differential Revision: https://reviews.llvm.org/D52917
llvm-svn: 343979
Summary: The convenience wrapper in STLExtras is available since rL342102.
Reviewers: ruiu, espindola
Subscribers: emaste, arichardson, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D52569
llvm-svn: 343146
When we write a struct to a mmap'ed buffer, we usually use
write16/32/64, but we didn't for VersionDefinitionSection, so
we needed to template that class.
llvm-svn: 343024
Previously, if you invoke lld's `main` more than once in the same process,
the second invocation could fail or produce a wrong result due to a stale
pointer values of the previous run.
Differential Revision: https://reviews.llvm.org/D52506
llvm-svn: 343009
tolower() has some overhead because current locale is considered (though in lld the default "C" locale is used which does not matter too much). llvm::toLower is more efficient as it compiles to a compare and a conditional jump, as opposed to a libc call if tolower is used.
Disregarding locale also matches gdb's behavior (gdb/minsyms.h):
#define SYMBOL_HASH_NEXT(hash, c) \
((hash) * 67 + TOLOWER ((unsigned char) (c)) - 113)
where TOLOWER (include/safe-ctype.h) is a macro that uses a lookup table under the hood which is similar to llvm::toLower.
Reviewers: ruiu, espindola
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D52128
llvm-svn: 342342
Once we create .gdb_index contents, .zdebug_gnu_pub{names,types}
are useless, so there's no need to keep their uncompressed data
in memory.
I observed that for a test case in which lld creates a 3GB .gdb_index
section, the maximum resident set size reduced from 43GB to 29GB after
this patch.
Differential Revision: https://reviews.llvm.org/D52126
llvm-svn: 342297
-z interpose sets the DF_1_INTERPOSE flag, marking the object as an
interposer.
Via FreeBSD PR 230604, linking Valgrind with lld failed.
Differential Revision: https://reviews.llvm.org/D52094
llvm-svn: 342239
This fixes the following warning when compiling with gcc version 8.0.1 20180319 (experimental) (GCC):
/home/umb/LLVM/llvm/tools/lld/ELF/SyntheticSections.cpp:1951:46: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
return OS->SectionIndex >= SHN_LORESERVE ? SHN_XINDEX : OS->SectionIndex;
llvm-svn: 340164
Summary: This makes it conform to what the comment says. Otherwise when getErrPlace() is called afterwards, cast<InputSection>(D) will cause incompatible cast as MergeInputSection is not a subclass of InputSection.
Reviewers: ruiu, grimar, espindola, pcc
Reviewed By: grimar
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D50742
llvm-svn: 339904
It turns out that postThunkContents() is only used for
sorting symbols in .symtab.
Though we can instead move the logic to SymbolTableBaseSection::finalizeContents(),
postpone calling it and then get rid of postThunkContents completely.
Differential revision: https://reviews.llvm.org/D49547
llvm-svn: 339413
If we fail to merge a secondary GOT with the primary GOT but so far only
one merged GOT has been created (the primary one), the final element in
MergedGots is the primary GOT. Thus we should not try to merge with this
final element passing IsPrimary=false, since this will ignore the fact
that the destination GOT does in fact need a header, and those extra two
entries can be enough to allow the merge to incorrectly occur. Instead
we should check for this case before attempting the second merge.
Patch by James Clarke.
Differential revision: https://reviews.llvm.org/D49422
llvm-svn: 337810
Signed values for the FDE PC addr were not correctly handled in
readFdeAddr(). If the value is negative and the type of the value is
smaller than 64 bits, the FDE PC addr overflow error would be
incorrectly triggered.
Fixed readFdeAddr() to properly handle signed values by sign extending
where appropriate.
Differential Revision: https://reviews.llvm.org/D49557
llvm-svn: 337683
Currently, getFdePC() returns uint64_t. Its because the following
encodings might use 8 bytes: DW_EH_PE_absptr and DW_EH_PE_udata8.
But caller assigns returned value to uint32_t field:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L508
Value is used for building .eh_frame_hdr section.
We use DW_EH_PE_sdata4 encoding for building it at this moment:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L2545
And that means that an overflow issue might happen if
DW_EH_PE_absptr/DW_EH_PE_udata8 address encodings are present
in .eh_frame. In that case, before this patch, we silently would
truncate the address and produced broken .eh_frame_hdr section.
It would be not hard to support real 64-bit values for
DW_EH_PE_absptr/DW_EH_PE_udata8 encodings, but it is
unclear if it is usefull and if we should do it.
Since nobody faced/reported it, int this patch I only implement
a check to stop producing broken output silently for now.
llvm-svn: 337382
Code was dead because at the moment of BssSection creation
it can never have a parent. Also, code simply does not
make sence as alignment adjastment happens when
BssSection is added to its parent later.
llvm-svn: 337276
Since .gdb_index sections contain all known symbols, they can be very large.
One of my executables has a .gdb_index section of 1350 GiB. Uniquifying
symbols by name takes 3.77 seconds on my machine. This patch parallelize it.
Time to call createSymbols() with 8.4 million unique symbols:
Without this patch: 3773 ms
Parallelism = 1: 4374 ms
Parallelism = 2: 2628 ms
Parallelism = 16: 837 ms
As you can see above, this algorithm is a bit more inefficient
than the non-parallelized version, but even with dual-core, it is
faster than that, so I think it is overall a win.
Differential Revision: https://reviews.llvm.org/D49164
llvm-svn: 336790
This workaround is for GCC 5.4.1. Without this workaround, lld will
produce larger .gdb_index sections for object files compiled with the
buggy version of the compiler.
Since it is not for correctness, and it affects only debug builds (since
you are generating .gdb_index sections), perhaps the hack shouldn't have been
added in the first place. At least, I think it is time to remove this hack.
Differential Revision: https://reviews.llvm.org/D49149
llvm-svn: 336788
This patch merges createGdbIndex function and GdbIndexSection's
constructor into a single static member function of the class.
This patch also change how we keep CU vectors. Previously, CuVector
and GdbSymbols were parallel arrays, but there's no reason to choose that
design. Now, CuVector is a member of GdbSymbol class.
A lot of members are removed from GdbIndexSection. Previously, it has
members that need to be kept in sync over several phases. I belive the new
design is less error-prone, and the new code is much easier to read
than before.
llvm-svn: 336743
.gdb_index sections can be very large. When you are compiling
multi-gibibyte executables, they can be larger than 1 GiB. The previous
implementation of .gdb_index seems to consume too much memory.
This patch reduces memory consumption by eliminating temporary objects.
In one experiment, memory consumption of GdbIndexSection class is
reduced from 962 MiB to 228 MiB when creating a .gdb_index of 1350 GiB.
Differential Revision: https://reviews.llvm.org/D49094
llvm-svn: 336672
I believe the only way to test this functionality is to create extremely
large object files and attempt to create a .gdb_index that is greater
than 4 GiB. But I think that's too much for most environments and buildbots,
so I'm commiting this without a test that actually triggers the new
error condition.
llvm-svn: 336631
Previously, we didn't create multiple consecutive bitmaps.
Added a test to catch this bug too.
Differential Revision: https://reviews.llvm.org/D49107
llvm-svn: 336620
This patch also speeds it up by making some constants compile-time
constants. Other than that, NFC.
Differential Revision: https://reviews.llvm.org/D49101
llvm-svn: 336614
Patch by Rahul Chaudhry!
This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg
Pass '--pack-dyn-relocs=relr' to enable generation of SHT_RELR section
and DT_RELR, DT_RELRSZ, and DT_RELRENT dynamic tags.
Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!
Pass '--use-android-relr-tags' with '--pack-dyn-relocs=relr' to use
SHT_ANDROID_RELR section type instead of SHT_RELR, as well as
DT_ANDROID_RELR* dynamic tags instead of DT_RELR*. The generated
section contents are identical.
'--pack-dyn-relocs=android+relr --use-android-relr-tags' enables both
'--pack-dyn-relocs=android' and '--pack-dyn-relocs=relr': lld will
encode the relative relocations in a SHT_ANDROID_RELR section, and pack
the rest of the dynamic relocations in a SHT_ANDROID_REL(A) section.
Differential Revision: https://reviews.llvm.org/D48247
llvm-svn: 336594
The local dynamic TLS access on PPC64 ELF v2 ABI uses R_PPC64_GOT_DTPREL16*
relocations when a TLS variables falls outside 2 GB of the thread storage
block. This patch adds support for these relocations by adding a new RelExpr
called R_TLSLD_GOT_OFF which emits a got entry for the TLS variable relative
to the dynamic thread pointer using the relocation R_PPC64_DTPREL64. It then
evaluates the R_PPC64_GOT_DTPREL16* relocations as the got offset for the
R_PPC64_DTPREL64 got entries.
Differential Revision: https://reviews.llvm.org/D48484
llvm-svn: 335732
rLLD329787 added the stable sorting to SymbolTableBaseSection::postThunkContents.
I profiled the Mozilla (response-O0.txt) from lld-speed-test package and found
std::stable_sort is showing up in profile results and consuming the 3.1% of the total
CPU time in the RelWithDebug build. Total time of postThunkContents is 3.54%, 238ms.
This change reduces postTimeContents time to 50ms, making it to take 0.73% of Total CPU time.
So, instead of sorting the local part I suggest to just rebuild it.
That is what this patch does.
Differential revision: https://reviews.llvm.org/D45519
llvm-svn: 335583
While building a Global Offset Table try to fill the primary GOT as much
as possible because the primary GOT can be accessed in the most
effective way. If it is not possible, try to fill the last GOT in the
multi-GOT list, and finally create a new GOT if both attempts failed.
llvm-svn: 335140
Summary:
Previously LLD would not add any dynamic relocations and write a module
index of 1 which is not correct for the shared library case.
This can happen when a thread-local global variable is marked as local with
a version script. With this change I am now able to link all of the FreeBSD
base system for MIPS64 with LLD.
Differential Revision: https://reviews.llvm.org/D48002
llvm-svn: 334483
Almost all entries inside MIPS GOT are referenced by signed 16-bit
index. Zero entry lies approximately in the middle of the GOT. So the
total number of GOT entries cannot exceed ~16384 for 32-bit architecture
and ~8192 for 64-bit architecture. This limitation makes impossible to
link rather large application like for example LLVM+Clang. There are two
workaround for this problem. The first one is using the -mxgot
compiler's flag. It enables using a 32-bit index to access GOT entries.
But each access requires two assembly instructions two load GOT entry
index to a register. Another workaround is multi-GOT. This patch
implements it.
Here is a brief description of multi-GOT for detailed one see the
following link https://dmz-portal.mips.com/wiki/MIPS_Multi_GOT.
If the sum of local, global and tls entries is less than 64K only single
got is enough. Otherwise, multi-got is created. Series of primary and
multiple secondary GOTs have the following layout:
```
- Primary GOT
Header
Local entries
Global entries
Relocation only entries
TLS entries
- Secondary GOT
Local entries
Global entries
TLS entries
...
```
All GOT entries required by relocations from a single input file
entirely belong to either primary or one of secondary GOTs. To reference
GOT entries each GOT has its own _gp value points to the "middle" of the
GOT. In the code this value loaded to the register which is used for GOT
access.
MIPS 32 function's prologue:
```
lui v0,0x0
0: R_MIPS_HI16 _gp_disp
addiu v0,v0,0
4: R_MIPS_LO16 _gp_disp
```
MIPS 64 function's prologue:
```
lui at,0x0
14: R_MIPS_GPREL16 main
```
Dynamic linker does not know anything about secondary GOTs and cannot
use a regular MIPS mechanism for GOT entries initialization. So we have
to use an approach accepted by other architectures and create dynamic
relocations R_MIPS_REL32 to initialize global entries (and local in case
of PIC code) in secondary GOTs. But ironically MIPS dynamic linker
requires GOT entries and correspondingly ordered dynamic symbol table
entries to deal with dynamic relocations. To handle this problem
relocation-only section in the primary GOT contains entries for all
symbols referenced in global parts of secondary GOTs. Although the sum
of local and normal global entries of the primary got should be less
than 64K, the size of the primary got (including relocation-only entries
can be greater than 64K, because parts of the primary got that overflow
the 64K limit are used only by the dynamic linker at dynamic link-time
and not by 16-bit gp-relative addressing at run-time.
The patch affects common LLD code in the following places:
- Added new hidden -mips-got-size flag. This flag required to set low
maximum size of a single GOT to be able to test the implementation using
small test cases.
- Added InputFile argument to the getRelocTargetVA function. The same
symbol referenced by GOT relocation from different input file might be
allocated in different GOT. So result of relocation depends on the file.
- Added new ctor to the DynamicReloc class. This constructor records
settings of dynamic relocation which used to adjust address of 64kb page
lies inside a specific output section.
With the patch LLD is able to link all LLVM+Clang+LLD applications and
libraries for MIPS 32/64 targets.
Differential revision: https://reviews.llvm.org/D31528
llvm-svn: 334390
Adds support for .glink resolver stubs from the example implementation in the V2
ABI (Section 4.2.5.3. Procedure Linkage Table). The stubs are written to the
PltSection, and the sections are renamed to match the PPC64 ABI:
.got.plt --> .plt Type = SHT_NOBITS
.plt --> .glink
And adds the DT_PPC64_GLINK dynamic tag to the dynamic section when the plt is
not empty.
Differential Revision: https://reviews.llvm.org/D45642
llvm-svn: 331840
Some MIPS relocations depend on "gp" value. By default, this value has
0x7ff0 offset from a .got section. But relocatable files produced by a
compiler or a linker might redefine this default value and we have to
use it for a calculation of the relocation result. When we generate EXE
or DSO it's trivial. Generating a relocatable output is more difficult
case because the linker does calculate relocations in this case and
cannot store individual "gp" values used by each input object file.
As a workaround we add the "gp" value to the relocation addend.
This fixes https://llvm.org/pr31149
Differential revision: https://reviews.llvm.org/D45972
llvm-svn: 331772
Summary: This is not technically required, but glibc unwind-dw2-fde.c classify_object_over_fdes expects there is a CIE record length 0 as a terminator.
Reviewers: ruiu, espindola
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D46566
llvm-svn: 331708
These maps are small, but we are creating an destroying one for each
input .eh_frame.
This patch reduces the total memory allocation from 765.54MB to
749.19MB. The peak is still the same: 563.7MB.
llvm-svn: 331075
Now that getSectionPiece is fast (uses a hash) it is probably OK to
split merge sections early.
The reason I want to do this is to split eh_frame sections in the same
place.
This does mean that we have to decompress early. Given that the only
compressed sections are debug info, I don't think we are missing much.
It is a small improvement: 0.5% on the geometric mean.
llvm-svn: 331058
This is slightly simpler to read IMHO. Now if a symbol has a position
in the file, it is Defined.
The main motivation is that with this a SharedSymbol doesn't need a
section, which reduces the size of SymbolUnion.
With this the peak allocation when linking chromium goes from 568.1 to
564.2 MB.
llvm-svn: 330966
It was always an offset of PltIndex.
This doesn't reduce the size of the structures, but makes it easier to
do so in a followup patch.
llvm-svn: 330953
It turns out we should not use the std::sort anymore.
r327219 added a new wrapper llvm::sort (D39245).
When EXPENSIVE_CHECKS is defined, it shuffles the
input container and that helps to find non-deterministic
ordering.
Patch changes code to use llvm::sort and std::stable_sort
instead of std::sort
Differential revision: https://reviews.llvm.org/D45969
llvm-svn: 330702
MIPS ABI requires creation of the MIPS_RLD_MAP dynamic tag for non-PIE
executables only and MIPS_RLD_MAP_REL tag for both PIE and non-PIE
executables. The patch skips definition of the MIPS_RLD_MAP for PIE
files and defines MIPS_RLD_MAP_REL.
The MIPS_RLD_MAP_REL tag stores the offset to the .rld_map section
relative to the address of the tag itself.
Differential Revision: https://reviews.llvm.org/D43347
llvm-svn: 329996
This is similar to r329219, but for the entire section. Like r329219 I
don't expect this to have any real impact, it is just more consistent
and simpler.
llvm-svn: 329367
In the lld perf builder r328686 had a negative impact in
stalled-cycles-frontend. Somehow that stat is not showing on my
machine, but the attached patch shows an improvement on cache-misses,
which is probably a reasonable proxy.
My working theory is that given a large input the pieces vector is out
of cache by the time initOffsetMap runs.
Both finalizeContents implementation have a convenient location for
initializing the OffsetMap, so this seems the best solution.
llvm-svn: 329117
Also make certain Thunk methods non-const as this will be required for
an upcoming change.
Differential Revision: https://reviews.llvm.org/D44961
llvm-svn: 328732
When the target saves ElfSym::GlobalOffsetTable in the .got rather than
.got.plt, Target->GotHeaderEntriesNum states the number of extra entries
required in the .got. Rather than having to add Target->GotHeaderEntriesNum to
NumEntries in every function which refers to NumEntries, this patch changes the
initial value of NumEntries in the constructor.
Differential Revision: https://reviews.llvm.org/D44744
llvm-svn: 328559
Previously, we used 0 as an alias for VER_NDX_GLOBAL and had a dummy
entry in SharedFile::Verdefs so that the access to the array is within
its boundary. But that's not straightforwad. We can just stop doing both.
llvm-svn: 328401
Choosing a Shift2 value based on wordsize is cargo-culted from gold.
Assuming that djb hash is a good hash function, choosing bits [4,9]
shouldn't be any worse or better than choosing bits [5,10]. We shouldn't
have copied that behavior that we can't justify in the first place.
Differential Revision: https://reviews.llvm.org/D44547
llvm-svn: 327921
This patch adds changes to start supporting the Power 64-Bit ELF V2 ABI.
This includes:
- Changing the ElfSym::GlobalOffsetTable to be named .TOC.
- Creating a GotHeader so the first entry in the .got is .TOC.
- Setting the e_flags to be 1 for ELF V1 and 2 for ELF V2
Differential Revision: https://reviews.llvm.org/D44483
llvm-svn: 327871
This is the same as 327248 except Arm defining _GLOBAL_OFFSET_TABLE_ to
be the base of the .got section as some existing code is relying upon it.
For most Targets the _GLOBAL_OFFSET_TABLE_ symbol is expected to be at
the start of the .got.plt section so that _GLOBAL_OFFSET_TABLE_[0] =
reserved value that is by convention the address of the dynamic section.
Previously we had defined _GLOBAL_OFFSET_TABLE_ as either the start or end
of the .got section with the intention that the .got.plt section would
follow the .got. However this does not always hold with the current
default section ordering so _GLOBAL_OFFSET_TABLE_[0] may not be consistent
with the reserved first entry of the .got.plt.
X86, X86_64 and AArch64 will use the .got.plt. Arm, Mips and Power use .got
Fixes PR36555
Differential Revision: https://reviews.llvm.org/D44259
llvm-svn: 327823
This change broke ARM code that expects to be able to add
_GLOBAL_OFFSET_TABLE_ to the result of an R_ARM_REL32.
I will provide a reproducer on llvm-commits.
llvm-svn: 327688
Currently, we can end up with NBuckets==0 and android loader
does not like it (PR36537).
Seems we can go with a minimal amount of changes here for simplicity
and be consistent with gold and so just always use >= 1 value for NBuckets.
Differential revision: https://reviews.llvm.org/D44422
llvm-svn: 327481
the start of the .got.plt section so that _GLOBAL_OFFSET_TABLE_[0] =
reserved value that is by convention the address of the dynamic section.
Previously we had defined _GLOBAL_OFFSET_TABLE_ as either the start or end
of the .got section with the intention that the .got.plt section would
follow the .got. However this does not always hold with the current
default section ordering so _GLOBAL_OFFSET_TABLE_[0] may not be consistent
with the reserved first entry of the .got.plt.
X86, X86_64, Arm and AArch64 will use the .got.plt. Mips and Power use .got
Fixes PR36555
Differential Revision: https://reviews.llvm.org/D44259
llvm-svn: 327248
Summary:
If an executable needs text relocations, it should be marked as such so
that the loader can prepare for text relocations. We currently create a
dummy segment with DT_TEXTREL for that purpose.
Generic ABI as of 2000 [1] mentioned that "Its [DT_TEXTREL's] use
has been superseded by the DF_TEXTREL flag". However, it's actually not
superseded even after 18 years. OpenBSD and musl recognize only DT_TEXTREL.
So we still need to set both.
[1] http://www.sco.com/developers/gabi/2000-07-17/ch5.dynamic.html
Reviewers: rafael
Subscribers: emaste, llvm-commits, arichardson
Differential Revision: https://reviews.llvm.org/D43920
llvm-svn: 326503
This should resolve the issue that lld build fails in some hosts
that uses case-insensitive file system.
Differential Revision: https://reviews.llvm.org/D43788
llvm-svn: 326339
We sometimes need to iterate over input sections for a given
output section. It is not very convinent because we have to iterate
over section descriptions.
Patch introduces getInputSections helper, it simplifies things.
Differential revision: https://reviews.llvm.org/D43574
llvm-svn: 325763
Summary:
Before the name of the function sounded like it was just a getter for the
private class member Addend. However, it actually calculates the final
value for the r_addend field in Elf_Rela that is used when writing the
.rela.dyn section. I also added a comment to the UseSymVA member to
explain how it interacts with computeAddend().
Differential Revision: https://reviews.llvm.org/D43161
llvm-svn: 325485
Now that we have R_ADDEND, UseSymVA was redundant. We only want to
write the symbol virtual address when using an expression other than
R_ADDEND.
llvm-svn: 325360
Summary:
This follows up on r321889 where writing of Elf_Rel addends was partially
moved to RelocationBaseSection. This patch ensures that the addends are
always written to the output section when a input section uses RELA but the
output is REL.
Differential Revision: https://reviews.llvm.org/D42843
llvm-svn: 325328
When resolving dynamic RELA relocations the addend is taken from the
relocation and not the place being relocated. Accordingly lld does not
write the addend field to the place like it would for a REL relocation.
Unfortunately there is some system software, in particlar dynamic loaders
such as Bionic's linker64 that use the value of the place prior to
relocation to find the offset that they have been loaded at. Both gold
and bfd control this behavior with the --[no-]apply-dynamic-relocs option.
This change implements the option and defaults it to true for compatibility
with gold and bfd.
Differential Revision: https://reviews.llvm.org/D42797
llvm-svn: 324221
Compiler doesn't know the fact that Config->WordSize * 8 is always a
power of two, so it had to use the div instruction to divide some
number with C.
llvm-svn: 323014
I created https://reviews.llvm.org/D42202 to see how large the bloom
filter should be. With that patch, I tested various bloom filter sizes
with the following commands:
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_LLD=true \
-DLLVM_ENABLE_PROJECTS='clang;lld' -DBUILD_SHARED_LIBS=ON \
-DCMAKE_SHARED_LINKER_FLAGS=-Wl,-bloom-filter-bits=<some integer> \
../llvm-project/llvm
$ rm -f $(find . -name \*.so.7.0.0svn)
$ ninja lld
$ LD_BIND_NOW=1 perf stat bin/ld.lld
Here is the result:
-bloom-filter-bits=8 0.220351609 seconds
-bloom-filter-bits=10 0.217146597 seconds
-bloom-filter-bits=12 0.206870826 seconds
-bloom-filter-bits=16 0.209456312 seconds
-bloom-filter-bits=32 0.195092075 seconds
Currently we allocate 8 bits for a symbol, but according to the above
result, that number is not optimal. Even though the numbers follow the
diminishing return rule, the point where a marginal improvement becomes
too small is not -bloom-filter-bits=8 but 12. So this patch sets it to 12.
Differential Revision: https://reviews.llvm.org/D42204
llvm-svn: 323010
Previously we checked (HeaderSize == 0) to find out if
PltSection section is IPLT or PLT. Some targets does not set
HeaderSize though. For example PPC64 has no lazy binding implemented
and does not set PltHeaderSize constant.
Because of that using of both IPLT and PLT relocations worked
incorrectly there (testcase is provided).
Patch fixes the issue.
Differential revision: https://reviews.llvm.org/D41613
llvm-svn: 322362
Summary:
As reported in bug 35788, rL316280 reintroduces a race between two
members of SectionPiece, which share the same 64 bit memory location.
To fix the race, check the hash before checking the Live member, as
suggested by Rafael.
Reviewers: ruiu, rafael
Reviewed By: ruiu
Subscribers: smeenai, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D41884
llvm-svn: 322264
When setting up the chain, we copy over the bucket's previous symbol
index, assuming that this index will be 0 (STN_UNDEF) for an unused
bucket (marking the end of the chain). When linking with --no-rosegment,
however, unused buckets will in fact contain the padding value, and so
the hash table will end up containing invalid chains. Zero out the hash
table section explicitly to avoid this, similar to what's already done
for GNU hash sections.
Differential Revision: https://reviews.llvm.org/D41928
llvm-svn: 322259
This is "Bug 35751 - .dynamic relocation entries omitted if output
contains only IFUNC relocations"
We have InX::RelaPlt and InX::RelaIPlt synthetic sections for PLT relocations.
They are usually live in rela.plt section. Problem appears when InX::RelaPlt
section is empty. In that case we did not produce normal set of dynamic tags
required, because logic was written in the way assuming we always have
non-IRelative relocations in rela.plt.
Patch fixes the issue.
Differential revision: https://reviews.llvm.org/D41592
llvm-svn: 321600
This was raised in comments for D41592.
With current code we always assign parent
section for Rel[a] sections like
InX::RelaPlt or InX::RelaDyn, so checking
their parent for null is excessive.
llvm-svn: 321581
We add dynamic section entries both in the ctor of the class and
DynamicSection::finalizeContents(). Some entries need to be added early
in the ctor because they add strings to .dynstr. Other entries were
intended to be added in finalizeContents(). However, some entries are
added in the ctor even though they don't add strings. This patch
fix the issue.
llvm-svn: 320851
We might crash in 'ARMExidxSentinelSection::writeTo()' because it expected
the sentinel entry to be put in the same 'InputSectionDescription' as
the last real entry. This assumption fails if the last output section command
for .ARM.exidx is anything but an input section description, because in this
case 'OutputSection::addSection()' creates a new 'InputSectionDescription'.
Differential Revision: https://reviews.llvm.org/D41105
llvm-svn: 320668
By using an index instead of a pointer for verdef we can put the index
next to the alignment field. This uses the otherwise wasted area and
reduces the shared symbol size.
By itself the performance change of this is in the noise, but I have a
followup patch to remove another 8 bytes that improves performance
when combined with this.
llvm-svn: 320449
We fill executable sections with trap instructions (0xcc or equivalent).
If a .gnu.hash section was put into an executable segment, we created
corrupted .gnu.hash section. This patch fixes the issue.
llvm-svn: 319863
This change actually makes the linker slightly faster. My observation
is that, with this patch, link time of clang without debug is about 1%
faster.
Differential Revision: https://reviews.llvm.org/D40697
llvm-svn: 319600
This is for "Bug 35474 - --emit-relocs produces wrongly-named reloc sections".
LLD currently for scripts like:
.text.boot : { *(.text.boot) }
emits relocation section with name .rela.text because does not take
redefined name of output section into account and builds section name
using rules for non-scripted case. Patch fixes this oddness.
Differential revision: https://reviews.llvm.org/D40652
llvm-svn: 319526
The MIPS GOT section has a number of local entries based on the number of pages
needed for output sections referenced by GOT page relocations. The number is
recorded in the DT_MIPS_LOCAL_GOTNO dynamic section tag. However, the dynamic tag
is added before assignAddresses has been called, meaning that any section size used
to calculate the value will not include size modifications caused by, for example,
linker scripts and thunks.
This change moves the calculation of DT_MIPS_LOCAL_GOTNO until writeTo, by which
time the output section sizes have been finalized.
Reviewers: ruiu, rafael
Differential Revision: https://reviews.llvm.org/D39493
llvm-svn: 318828
No difference in practice other than having sh_entsize in the output.
This should simplify the patch for handling SHF_MERGE in -r.
Based on a patch by George Rimar.
llvm-svn: 318306
microMIPS symbols including microMIPS PLT records created for regular
symbols needs to be marked by STO_MIPS_MICROMIPS flag in a symbol table.
Additionally microMIPS entries in a dynamic symbol table should have
configured less-significant bit. That allows to escape teaching a
dynamic linker about microMIPS symbols.
llvm-svn: 318097