Branch Target Identification (BTI) and Pointer Authentication (PAC) are
architecture features introduced in v8.5a and 8.3a respectively. The new
instructions have been added in the hint space so that binaries take
advantage of support where it exists yet still run on older hardware. The
impact of each feature is:
BTI: For executable pages that have been guarded, all indirect branches
must have a destination that is a BTI instruction of the appropriate type.
For the static linker, this means that PLT entries must have a "BTI c" as
the first instruction in the sequence. BTI is an all or nothing
property for a link unit, any indirect branch not landing on a valid
destination will cause a Branch Target Exception.
PAC: The dynamic loader encodes with PACIA the address of the destination
that the PLT entry will load from the .plt.got, placing the result in a
subset of the top-bits that are not valid virtual addresses. The PLT entry
may authenticate these top-bits using the AUTIA instruction before
branching to the destination. Use of PAC in PLT sequences is a contract
between the dynamic loader and the static linker, it is independent of
whether the relocatable objects use PAC.
BTI and PAC are independent features that can be combined. So we can have
several combinations of PLT:
- Standard with no BTI or PAC
- BTI PLT with "BTI c" as first instruction.
- PAC PLT with "AUTIA1716" before the indirect branch to X17.
- BTIPAC PLT with "BTI c" as first instruction and "AUTIA1716" before the
first indirect branch to X17.
The use of BTI and PAC in relocatable object files are encoded by feature
bits in the .note.gnu.property section in a similar way to Intel CET. There
is one AArch64 specific program property GNU_PROPERTY_AARCH64_FEATURE_1_AND
and two target feature bits defined:
- GNU_PROPERTY_AARCH64_FEATURE_1_BTI
-- All executable sections are compatible with BTI.
- GNU_PROPERTY_AARCH64_FEATURE_1_PAC
-- All executable sections have return address signing enabled.
Due to the properties of FEATURE_1_AND the static linker can tell when all
input relocatable objects have the BTI and PAC feature bits set. The static
linker uses this to enable the appropriate PLT sequence.
Neither -> standard PLT
GNU_PROPERTY_AARCH64_FEATURE_1_BTI -> BTI PLT
GNU_PROPERTY_AARCH64_FEATURE_1_PAC -> PAC PLT
Both properties -> BTIPAC PLT
In addition to the .note.gnu.properties there are two new command line
options:
--force-bti : Act as if all relocatable inputs had
GNU_PROPERTY_AARCH64_FEATURE_1_BTI and warn for every relocatable object
that does not.
--pac-plt : Act as if all relocatable inputs had
GNU_PROPERTY_AARCH64_FEATURE_1_PAC. As PAC is a contract between the loader
and static linker no warning is given if it is not present in an input.
Two processor specific dynamic tags are used to communicate that a non
standard PLT sequence is being used.
DTI_AARCH64_BTI_PLT and DTI_AARCH64_BTI_PAC.
Differential Revision: https://reviews.llvm.org/D62609
llvm-svn: 362793
Many -static/-no-pie/-shared/-pie applications linked against glibc or musl
should work with this patch. This also helps FreeBSD PowerPC64 to migrate
their lib32 (PR40888).
* Fix default image base and max page size.
* Support new-style Secure PLT (see below). Old-style BSS PLT is not
implemented, so it is not suitable for FreeBSD rtld now because it doesn't
support Secure PLT yet.
* Support more initial relocation types:
R_PPC_ADDR32, R_PPC_REL16*, R_PPC_LOCAL24PC, R_PPC_PLTREL24, and R_PPC_GOT16.
The addend of R_PPC_PLTREL24 is special: it decides the call stub PLT type
but it should be ignored for the computation of target symbol VA.
* Support GNU ifunc
* Support .glink used for lazy PLT resolution in glibc
* Add a new thunk type: PPC32PltCallStub that is similar to PPC64PltCallStub.
It is used by R_PPC_REL24 and R_PPC_PLTREL24.
A PLT stub used in -fPIE/-fPIC usually loads an address relative to
.got2+0x8000 (-fpie/-fpic code uses _GLOBAL_OFFSET_TABLE_ relative
addresses).
Two .got2 sections in two object files have different addresses, thus a PLT stub
can't be shared by two object files. To handle this incompatibility,
change the parameters of Thunk::isCompatibleWith to
`const InputSection &, const Relocation &`.
PowerPC psABI specified an old-style .plt (BSS PLT) that is both
writable and executable. Linkers don't make separate RW- and RWE segments,
which causes all initially writable memory (think .data) executable.
This is a big security concern so a new PLT scheme (secure PLT) was developed to
address the security issue.
TLS will be implemented in D62940.
glibc older than ~2012 requires .rela.dyn to include .rela.plt, it can
not handle the DT_RELA+DT_RELASZ == DT_JMPREL case correctly. A hack
(not included in this patch) in LinkerScript.cpp addOrphanSections() to
work around the issue:
if (Config->EMachine == EM_PPC) {
// Older glibc assumes .rela.dyn includes .rela.plt
Add(In.RelaDyn);
if (In.RelaPlt->isLive() && !In.RelaPlt->Parent)
In.RelaDyn->getParent()->addSection(In.RelaPlt);
}
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D62464
llvm-svn: 362721
GotEntrySize and GotPltEntrySize were added in D22288. Later, with
the introduction of wordsize() (then Config->Wordsize), they become
redundant, because there is no target that sets GotEntrySize or
GotPltEntrySize to a number different from Config->Wordsize.
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D62727
llvm-svn: 362220
This change causes us to read partition specifications from partition
specification sections and split output sections into partitions according
to their reachability from partition entry points.
This is only the first step towards a full implementation of partitions. Later
changes will add additional synthetic sections to each partition so that
they can be loaded independently.
Differential Revision: https://reviews.llvm.org/D60353
llvm-svn: 361925
We currently sort dynamic relocations by (!is_relative,symbol_index).
Add r_offset as the third key. This makes `readelf -r` debugging easier
(relocations to the same symbol are ordered by r_offset).
Refactor the test combreloc.s (renamed from combrelocs.s) to check
R_X86_64_RELATIVE, and delete --expand-relocs.
The difference from the reverted D61477 is that we keep !is_relative as
the first key. In local dynamic TLS model, DTPMOD (e.g.
R_ARM_TLS_DTPMOD32 R_X86_64_DTPMOD and R_PPC{,64}_DTPMOD) may use 0 as
the symbol index.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D62141
llvm-svn: 361164
This reverts commit r361125. This linker change breaks shared libraries
in some subtle way on x86_64. (Specifically, gold segfaults when
loading the LLVMgold.so plugin linked with lldb with this patch.)
llvm-svn: 361150
Fixes PR41692.
We currently sort dynamic relocations by (!is_relative,symbol_index).
Change it to (symbol_index,r_offset). We still place relative
relocations first because R_*_RELATIVE are the only dynamic relocations
with 0 symbol index (except on MIPS, which doesn't use DT_REL[A]COUNT
anyway).
This makes `readelf -r` debugging easier (relocations to the same symbol
are ordered by r_offset).
Refactor the test combreloc.s (renamed from combrelocs.s) to check
R_X86_64_RELATIVE, and delete --expand-relocs.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D61477
llvm-svn: 361125
See D61891: llvm had a bug that might create invalid (DW_AT_low_pc,DW_AT_high_pc) pairs or range list entries due to missing DW_AT_addr_base.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D61889
llvm-svn: 360679
Make some small adjustment while touching the code: make parameters
const, use less_first(), etc.
Differential Revision: https://reviews.llvm.org/D60989
llvm-svn: 358943
Summary:
We access Live and OutputOff (which may share the same memory location)
concurrently in 2 parallelForEachN loops. Separating them avoids subtle
data races like D41884/PR35788. This patch places Live and Hash
together.
2 reasons this is appealing:
1) Hash is immutable. Live is almost read-only - only written once in MarkLive.cpp where
Hash is not accessed
2) we already discard low bits of Hash to decide ShardID. It doesn't
matter much if we make 32-bit Hash to 31-bit.
For a huge internal clang -O3 executable (1.6GiB),
`Strings` in StringTableBuilder::finalizeStringTable contains at most 310253 elements.
The expected number of pair-wise collisions 2^(-31) * C(310253,2) ~= 22.41 is too small to have a negative impact on performance.
Actually, my benchmark shows there is actually a minor performance improvement.
Differential Revision: https://reviews.llvm.org/D60765
llvm-svn: 358645
With partitions, each partition should have the same build id. This means
that the build id needs to be only computed once, otherwise we will end up
with different build ids in each partition as a result of the file contents
changing. This change moves the computation of the build id into Writer so
that it only happens once.
Differential Revision: https://reviews.llvm.org/D60342
llvm-svn: 358536
The typo was introduced to llvm MC in rL204769 (fixed in rL358247) and then to lld.
Also, for relocatable-many-sections.s, the size of .symtab changed at some point and the formula needs update.
llvm-svn: 358248
For partitions I intend to use the same set of version indexes in
each partition for simplicity. Since each partition will need its own
VersionNeedSection this will require moving the verneed tracking out of
VersionNeedSection. The way I've done this is to move most of the tracking
into SharedFile. What will eventually become the per-partition tracking
still lives in VersionNeedSection.
As a bonus the code gets a little simpler and more consistent with how we
handle verdef.
Differential Revision: https://reviews.llvm.org/D60307
llvm-svn: 357926
And rename the function to combineEhSections(). This makes the processing
of .ARM.exidx even more similar to .eh_frame and means that we can avoid an
additional loop over InputSections.
Differential Revision: https://reviews.llvm.org/D60026
llvm-svn: 357417
Summary:
Some synthetic sections can be empty while still being needed, thus they
can't be removed by removeUnusedSyntheticSections(). Rename this member
function to more appropriate isNeeded() with the opposite meaning.
No functional change intended.
Reviewers: ruiu, espindola
Reviewed By: ruiu
Subscribers: jhenderson, grimar, emaste, arichardson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59982
llvm-svn: 357377
Recommit r356666 with fixes for buildbot failure, as well as handling for
--emit-relocs, which we decide not to emit any relocation sections as the
table is already position independent and an offline tool can deduce the
relocations.
Instead of creating extra Synthetic .ARM.exidx sections to account for
gaps in the table, create a single .ARM.exidx SyntheticSection that can
derive the contents of the gaps from a sorted list of the executable
InputSections. This has the benefit of moving the ARM specific code for
SyntheticSections in SHF_LINK_ORDER processing and the table merging code
into the ARM specific SyntheticSection. This also makes it easier to create
EXIDX_CANTUNWIND table entries for executable InputSections that don't
have an associated .ARM.exidx section.
Fixes pr40277
Differential Revision: https://reviews.llvm.org/D59216
llvm-svn: 357160
Summary:
This should address remaining issues discussed in PR36555.
Currently R_GOT*_FROM_END are exclusively used by x86 and x86_64 to
express relocations types relative to the GOT base. We have
_GLOBAL_OFFSET_TABLE_ (GOT base) = start(.got.plt) but end(.got) !=
start(.got.plt)
This can have problems when _GLOBAL_OFFSET_TABLE_ is used as a symbol, e.g.
glibc dl_machine_dynamic assumes _GLOBAL_OFFSET_TABLE_ is start(.got.plt),
which is not true.
extern const ElfW(Addr) _GLOBAL_OFFSET_TABLE_[] attribute_hidden;
return _GLOBAL_OFFSET_TABLE_[0]; // R_X86_64_GOTPC32
In this patch, we
* Change all GOT*_FROM_END to GOTPLT* to fix the problem.
* Add HasGotPltOffRel to denote whether .got.plt should be kept even if
the section is empty.
* Simplify GotSection::empty and GotPltSection::empty by setting
HasGotOffRel and HasGotPltOffRel according to GlobalOffsetTable early.
The change of R_386_GOTPC makes X86::writePltHeader simpler as we don't
have to compute the offset start(.got.plt) - Ebx (it is constant 0).
We still diverge from ld.bfd (at least in most cases) and gold in that
.got.plt and .got are not adjacent, but the advantage doing that is
unclear.
Reviewers: ruiu, sivachandra, espindola
Subscribers: emaste, mehdi_amini, arichardson, dexonsmith, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59594
llvm-svn: 356968
Previously, `Entries` contains pairs of symbols and their indices.
The indices are always 0, x, 2x, 3x, ..., where x is the size of
relocation entry. We didn't have to store that values because we can
compute them when we consume them.
llvm-svn: 356812
There is a reproducible buildbot failure (segfault) on the 2 stage
clang-cmake-armv8-lld bot. Reverting while I investigate.
Differential Revision: https://reviews.llvm.org/D59216
llvm-svn: 356684
Instead of creating extra Synthetic .ARM.exidx sections to account for
gaps in the table, create a single .ARM.exidx SyntheticSection that can
derive the contents of the gaps from a sorted list of the executable
InputSections. This has the benefit of moving the ARM specific code for
SyntheticSections in SHF_LINK_ORDER processing and the table merging code
into the ARM specific SyntheticSection. This also makes it easier to create
EXIDX_CANTUNWIND table entries for executable InputSections that don't
have an associated .ARM.exidx section.
Fixes pr40277
Differential Revision: https://reviews.llvm.org/D59216
llvm-svn: 356666
Summary:
This implements Rui Ueyama's idea in PR39044.
I've checked that ld.bfd and gold do not have the power-of-2 requirement
and do not require sh_entsize to be a multiple of sh_align.
Now on the updated test merge-entsize.s, all the 3 linkers happily
create .rodata that is not 3-byte aligned.
This has a use case in Linux arch/x86/crypto/sha512-avx2-asm.S
It uses sh_entsize of 640, which is not a power of 2.
See https://github.com/ClangBuiltLinux/linux/issues/417
Reviewers: ruiu, espindola
Reviewed By: ruiu
Subscribers: nickdesaulniers, E5ten, emaste, arichardson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59478
llvm-svn: 356428
This does not appear to be necessary because StringTableSection does not
need to be finalized, which also means that we can remove the call to
finalizeSynthetic on .dynstr.
Differential Revision: https://reviews.llvm.org/D59240
llvm-svn: 355977
We're going to need a separate VersionNeedSection for each partition, and
the partition data structure won't be templated.
With this the VersionTableSection class no longer needs ELFT, so detemplate it.
Differential Revision: https://reviews.llvm.org/D58808
llvm-svn: 355478
This lets us remove the special case from Writer::writeSections(), and also
fixes a bug where .eh_frame_hdr isn't necessarily written in the correct
order if a linker script moves .eh_frame and .eh_frame_hdr into the same
output section.
Differential Revision: https://reviews.llvm.org/D58795
llvm-svn: 355153
RelocationBaseSection is not used in -r links, so Config->Relocatable will
always be false.
Differential Revision: https://reviews.llvm.org/D58489
llvm-svn: 354607
The patch solves two tasks:
1. MIPS ABI allows to mix regular and microMIPS code and perform
cross-mode jumps. Linker needs to detect such cases and replace
jump/branch instructions by their cross-mode equivalents.
2. Other tools like dunamic linkers need to recognize cases when dynamic
table entries, e_entry field of an ELF header etc point to microMIPS
symbol. Linker should provide such information.
The first task is implemented in the `MIPS<ELFT>::relocateOne()` method.
New routine `fixupCrossModeJump` detects ISA mode change, checks and
replaces an instruction.
The main problem is how to recognize that relocation target is microMIPS
symbol. For absolute and section symbols compiler or assembler set the
less-significant bit of the symbol's value or sum of the symbol's value
and addend. And this bit signals to linker about microMIPS code. For
global symbols compiler cannot do the same trick because other tools like,
for example, disassembler wants to know an actual position of the symbol.
So compiler sets STO_MIPS_MICROMIPS flag in the `st_other` field.
In `MIPS<ELFT>::relocateOne()` method we have a symbol's value only and
cannot access any symbol's attributes. To pass type of the symbol
(regular/microMIPS) to that routine as well as other places where we
write a symbol value as-is (.dynamic section, `Elf_Ehdr::e_entry` field
etc) we set when necessary a less-significant bit in the `getSymVA`
function.
Differential revision: https://reviews.llvm.org/D40147
llvm-svn: 354311
On PowerPC64, it is necessary to keep the LocalEntry bits in st_other,
especially when -r is used. Otherwise, when the resulting object is used
in a posterior linking, LocalEntry info will be unavailable and
functions may be called through the wrong entrypoint.
Patch by Leandro Lupori.
Differential Revision: https://reviews.llvm.org/D56782
llvm-svn: 354184
Non-GOT non-PLT relocations to non-preemptible ifuncs result in the
creation of a canonical PLT, which now takes the identity of the IFUNC
in the symbol table. This (a) ensures address consistency inside and
outside the module, and (b) fixes a bug where some of these relocations
end up pointing to the resolver.
Fixes (at least) PR40474 and PR40501.
Differential Revision: https://reviews.llvm.org/D57371
llvm-svn: 353981
With the following changes:
1) Compilation fix:
std::atomic<bool> HasStaticTlsModel = false; ->
std::atomic<bool> HasStaticTlsModel{false};
2) Adjusted the comment in code.
Initial commit message:
DF_STATIC_TLS flag indicates that the shared object or executable
contains code using a static thread-local storage scheme.
Patch checks if IE/LE relocations were used to check if the code uses
a static model. If so it sets the DF_STATIC_TLS flag.
Differential revision: https://reviews.llvm.org/D57749
----
Modified : /lld/trunk/ELF/Arch/X86.cpp
Modified : /lld/trunk/ELF/Config.h
Modified : /lld/trunk/ELF/SyntheticSections.cpp
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model1.s
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model2.s
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model3.s
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model4.s
Added : /lld/trunk/test/ELF/i386-static-tls-model.s
Modified : /lld/trunk/test/ELF/i386-tls-ie-shared.s
Modified : /lld/trunk/test/ELF/tls-dynamic-i686.s
Modified : /lld/trunk/test/ELF/tls-opt-iele-i686-nopic.s
llvm-svn: 353299
DF_STATIC_TLS flag indicates that the shared object or executable
contains code using a static thread-local storage scheme.
Patch checks if IE/LE relocations were used to check if the code uses
a static model. If so it sets the DF_STATIC_TLS flag.
Differential revision: https://reviews.llvm.org/D57749
llvm-svn: 353293
Previously we were setting it to the GotPlt output section, which is
incorrect on ARM where this section is in .got. In static binaries
this can lead to sh_info being set to -1 (because there is no .got.plt)
which results in various tools rejecting the output file.
Differential Revision: https://reviews.llvm.org/D57274
llvm-svn: 352413
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
When parsing CU ranges for gdb-index, handle the error (now propagated
up though the API lld is calling here - previously the error was
printed within the libDebugInfo API, not allowing lld to format or
handle the message at all) - including information about the object and
archive name, as well as failing the link.
llvm-svn: 349979
Summary:
For the 2-bit bloom filter, we currently pick the bits Hash%64 and Hash>>6%64 (Shift2=6), but bits [6:...] are also used to select a word, causing a loss of precision.
In this patch, we choose Shift2=26, with is suggested by Ambrose Feinstein.
Note, Shift2 is computed as maskbitslog2 in bfd/elflink.c and gold/dynobj.cc
It is varying with the number of dynamic symbols but we don't
necessarily copy its rule.
Reviewers: ruiu, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D55971
llvm-svn: 349966
Summary:
This reinstates what I originally intended to do in D54361.
It removes the assumption that .debug_gnu_pubnames has increasing CuOffset.
Now we do better than gold here: when .debug_gnu_pubnames contains
multiple sets, gold would think every set has the same CU index as the
first set (incorrect).
Reviewed By: ruiu
Reviewers: ruiu, dblaikie, espindola
Subscribers: emaste, arichardson, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54483
llvm-svn: 347820
The DT_PLTRELSZ dynamic tag is calculated using the size of the
OutputSection containing the In.RelaPlt InputSection. This will work for the
default no linker script case and the majority of linker scripts.
Unfortunately it doesn't work for some 'almost' sensible linker scripts. It
is permitted by ELF to have a single OutputSection containing both
In.RelaDyn, In.RelaPlt and In.RelaIPlt. It is also permissible for the range
of memory [DT_RELA, DT_RELA + DT_RELASZ) and the range
[DT_JMPREL, DT_JMPREL + DT_JMPRELSZ) to overlap as long as the the latter
range is at the end.
To support this type of linker script use the specific InputSection sizes.
Fixes pr39678
Differential Revision: https://reviews.llvm.org/D54759
llvm-svn: 347736
This is https://bugs.llvm.org//show_bug.cgi?id=38978
Spec says that:
"Objects may be built with the -z nodefaultlib option to
suppress any search of the default locations at runtime.
Use of this option implies that all the dependencies of an
object can be located using its runpaths.
Without this option, which is the most common case, no
matter how you augment the runtime linker's library
search path, its last element is always /usr/lib for 32-bit
objects and /usr/lib/64 for 64-bit objects."
The patch implements this option.
Differential revision: https://reviews.llvm.org/D54577
llvm-svn: 347647
We explicitly call finalizeContents() only once for
DynamicSection. The code testing we do not do it twice is
just excessive.
It could be an assert, but we don't do
that for other sections, so does not seem we
should do it here too.
llvm-svn: 347543
Summary:
This fixes PR39711: -static -z retpolineplt does not produce retpoline PLT header.
-z now is not relevant.
Statically linked executable does not have PLT, but may have IPLT with no header. When -z retpolineplt is specified, however, the repoline PLT header should still be emitted.
I've checked that this fixes the FreeBSD reproduce in PR39711 and a Linux program statically linked against glibc. The programm print "Hi" rather than SIGILL/SIGSEGV.
getPltEntryOffset may look dirty after this patch, but it can be cleaned up later.
Another possible improvement is that when there are non-preemptible IFUNC symbols (rare case, e.g. -Bsymbolic), both In.Plt and In.Iplt can be non-empty and we'll emit the retpoline PLT header twice.
Reviewers: espindola, emaste, chandlerc, ruiu
Reviewed By: emaste
Subscribers: emaste, arichardson, krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D54782
llvm-svn: 347404
On PowerPC64, when a function call offset is too large to encode in a call
instruction the address is stored in a table in the data segment. A thunk is
used to load the branch target address from the table relative to the
TOC-pointer and indirectly branch to the callee. When linking position-dependent
code the addresses are stored directly in the table, for position-independent
code the table is allocated and filled in at load time by the dynamic linker.
For position-independent code the branch targets could have gone in the .got.plt
but using the .branch_lt section for both position dependent and position
independent binaries keeps it consitent and helps keep this PPC64 specific logic
seperated from the target-independent code handling the .got.plt.
Differential Revision: https://reviews.llvm.org/D53408
llvm-svn: 346877
Summary:
NameTypeEntry::Type is a bit-packed value of CU index+attributes (https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html), which is named cu_index_and_attrs in a local variable in gdb/dwarf2read.c:dw2_symtab_iter_next
The new name CuIndexAndAttrs is more meaningful.
Reviewers: ruiu, dblaikie, espindola
Reviewed By: dblaikie
Subscribers: emaste, aprantl, arichardson, JDevlieghere, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54481
llvm-svn: 346794
Summary:
Idx passed to readPubNamesAndTypes was an index into Chunks, not an
index into the CU list. This would be incorrect if some .debug_info
section contained more than 1 DW_TAG_compile_unit.
In real world, glibc Scrt1.o is a partial link of start.os abi-note.o init.o and contains 2 CUs in debug builds.
Without this patch, any application linking such Scrt1.o would have invalid .gdb_index
The issue could be demonstrated by:
(gdb) py print(gdb.lookup_global_symbol('main'))
None
Reviewers: espindola, ruiu
Reviewed By: ruiu
Subscribers: Higuoxing, grimar, dblaikie, emaste, aprantl, arichardson, JDevlieghere, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54361
llvm-svn: 346747
Summary:
The debug_info_offset value may be relocated.
This is lld side change of D54375.
Reviewers: ruiu, dblaikie, grimar, espindola
Subscribers: emaste, arichardson, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54376
llvm-svn: 346616
Summary:
D52830 sets sh_link to .symtab in static link, which breaks executable stripped by GNU strip.
It may also be odd that .rela.plt (SHF_ALLOC) points to .symtab (non-SHF_ALLOC).
Change the logic on pcc's suggestion.
Before:
% clang -fuse-ld=lld -static -xc =(printf 'int main(){}') # or gcc
% strip a.out; ./a.out
unexpected reloc type in static binary[1] 61634 segmentation fault ./a.out
Reviewers: ruiu, grimar, emaste, espindola
Reviewed By: ruiu
Subscribers: pcc, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D53993
llvm-svn: 345899
Summary: .rela.plt may only contain R_*_{,I}RELATIVE relocations and not need a symbol table link. bfd/gold fallbacks to sh_link=0 in this case. Without this patch, ld.lld --strip-all caused lld to dereference a null pointer.
Reviewers: ruiu, grimar, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D53881
llvm-svn: 345648
Out::DebugInfo was used only by GdbIndex class to determine if
we need to create a .gdb_index section, but we can do the same
check without it.
Added a test that this patch doesn't change the existing behavior.
llvm-svn: 345058
Android uses a compressed relocation format, which means the size of the
relocation section isn't predictable based on the number of relocations,
and can vary if the layout changes in any way. To deal with this, the
linker normally runs multiple passes until the layout converges.
The layout should converge if the size of the compressed
relocation section increases monotonically: if the size of an encoded
offset increases by one byte, the larget value which can be encoded is
multiplied by 128, so the representable offsets grow much faster than
the size of the section itself.
The problem here is that there is no code to ensure the size of the
section doesn't decrease. If the size of the relocation section
decreases, the relative offsets can increase due to alignment
restrictions, so that can force the size of the relocation section to
increase again. The end result is an infinite loop; the loop gets cut
off after 10 iterations with the message "thunk creation not
converged".
To avoid this issue, this patch adds padding to the end of the
relocation section if its size would decrease. The extra
padding is harmless because of the way the format is defined:
decoding stops after it reaches the number of relocations specified
in the section's header.
Differential Revision: https://reviews.llvm.org/D53003
llvm-svn: 344300
This is https://bugs.llvm.org/show_bug.cgi?id=37538,
Currently, LLD may set both sh_link and sh_info for
.rela.plt section to zero when we have only .rela.iplt section part used.
ELF spec (https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-94076/index.html)
says that for SHT_REL and SHT_RELA, sh_link references the associated symbol table
and sh_info the "section to which the relocation applies."
When we set the sh_link field, for the regular case we use the .dynsym index.
For .rela.iplt sections, it is unclear what is the associated symbol table,
because R_*_RELATIVE relocations do not use symbol names and we might have no
.dynsym section at all so this patch uses .symtab section index.
Differential revision: https://reviews.llvm.org/D52830
llvm-svn: 344226
Previously, we uncompress all compressed sections before doing anything.
That works, and that is conceptually simple, but that could results in
a waste of CPU time and memory if uncompressed sections are then
discarded or just copied to the output buffer.
In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompress them into newly allocated bufers and then memcpy the buffers
to the output buffer. That temporary buffer was redundant.
This patch changes how to uncompress sections. Now, compressed sections
are uncompressed lazily. To do that, `Data` member of `InputSectionBase`
is now hidden from outside, and `data()` accessor automatically expands
an compressed buffer if necessary.
If no one calls `data()`, then `writeTo()` directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.
This patch significantly reduces memory consumption (20 GiB max RSS to
15 Gib) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.
Differential Revision: https://reviews.llvm.org/D52917
llvm-svn: 343979
Summary: The convenience wrapper in STLExtras is available since rL342102.
Reviewers: ruiu, espindola
Subscribers: emaste, arichardson, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D52569
llvm-svn: 343146
When we write a struct to a mmap'ed buffer, we usually use
write16/32/64, but we didn't for VersionDefinitionSection, so
we needed to template that class.
llvm-svn: 343024
Previously, if you invoke lld's `main` more than once in the same process,
the second invocation could fail or produce a wrong result due to a stale
pointer values of the previous run.
Differential Revision: https://reviews.llvm.org/D52506
llvm-svn: 343009
tolower() has some overhead because current locale is considered (though in lld the default "C" locale is used which does not matter too much). llvm::toLower is more efficient as it compiles to a compare and a conditional jump, as opposed to a libc call if tolower is used.
Disregarding locale also matches gdb's behavior (gdb/minsyms.h):
#define SYMBOL_HASH_NEXT(hash, c) \
((hash) * 67 + TOLOWER ((unsigned char) (c)) - 113)
where TOLOWER (include/safe-ctype.h) is a macro that uses a lookup table under the hood which is similar to llvm::toLower.
Reviewers: ruiu, espindola
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D52128
llvm-svn: 342342
Once we create .gdb_index contents, .zdebug_gnu_pub{names,types}
are useless, so there's no need to keep their uncompressed data
in memory.
I observed that for a test case in which lld creates a 3GB .gdb_index
section, the maximum resident set size reduced from 43GB to 29GB after
this patch.
Differential Revision: https://reviews.llvm.org/D52126
llvm-svn: 342297
-z interpose sets the DF_1_INTERPOSE flag, marking the object as an
interposer.
Via FreeBSD PR 230604, linking Valgrind with lld failed.
Differential Revision: https://reviews.llvm.org/D52094
llvm-svn: 342239
This fixes the following warning when compiling with gcc version 8.0.1 20180319 (experimental) (GCC):
/home/umb/LLVM/llvm/tools/lld/ELF/SyntheticSections.cpp:1951:46: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
return OS->SectionIndex >= SHN_LORESERVE ? SHN_XINDEX : OS->SectionIndex;
llvm-svn: 340164
Summary: This makes it conform to what the comment says. Otherwise when getErrPlace() is called afterwards, cast<InputSection>(D) will cause incompatible cast as MergeInputSection is not a subclass of InputSection.
Reviewers: ruiu, grimar, espindola, pcc
Reviewed By: grimar
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D50742
llvm-svn: 339904
It turns out that postThunkContents() is only used for
sorting symbols in .symtab.
Though we can instead move the logic to SymbolTableBaseSection::finalizeContents(),
postpone calling it and then get rid of postThunkContents completely.
Differential revision: https://reviews.llvm.org/D49547
llvm-svn: 339413
If we fail to merge a secondary GOT with the primary GOT but so far only
one merged GOT has been created (the primary one), the final element in
MergedGots is the primary GOT. Thus we should not try to merge with this
final element passing IsPrimary=false, since this will ignore the fact
that the destination GOT does in fact need a header, and those extra two
entries can be enough to allow the merge to incorrectly occur. Instead
we should check for this case before attempting the second merge.
Patch by James Clarke.
Differential revision: https://reviews.llvm.org/D49422
llvm-svn: 337810
Signed values for the FDE PC addr were not correctly handled in
readFdeAddr(). If the value is negative and the type of the value is
smaller than 64 bits, the FDE PC addr overflow error would be
incorrectly triggered.
Fixed readFdeAddr() to properly handle signed values by sign extending
where appropriate.
Differential Revision: https://reviews.llvm.org/D49557
llvm-svn: 337683
Currently, getFdePC() returns uint64_t. Its because the following
encodings might use 8 bytes: DW_EH_PE_absptr and DW_EH_PE_udata8.
But caller assigns returned value to uint32_t field:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L508
Value is used for building .eh_frame_hdr section.
We use DW_EH_PE_sdata4 encoding for building it at this moment:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L2545
And that means that an overflow issue might happen if
DW_EH_PE_absptr/DW_EH_PE_udata8 address encodings are present
in .eh_frame. In that case, before this patch, we silently would
truncate the address and produced broken .eh_frame_hdr section.
It would be not hard to support real 64-bit values for
DW_EH_PE_absptr/DW_EH_PE_udata8 encodings, but it is
unclear if it is usefull and if we should do it.
Since nobody faced/reported it, int this patch I only implement
a check to stop producing broken output silently for now.
llvm-svn: 337382
Code was dead because at the moment of BssSection creation
it can never have a parent. Also, code simply does not
make sence as alignment adjastment happens when
BssSection is added to its parent later.
llvm-svn: 337276
Since .gdb_index sections contain all known symbols, they can be very large.
One of my executables has a .gdb_index section of 1350 GiB. Uniquifying
symbols by name takes 3.77 seconds on my machine. This patch parallelize it.
Time to call createSymbols() with 8.4 million unique symbols:
Without this patch: 3773 ms
Parallelism = 1: 4374 ms
Parallelism = 2: 2628 ms
Parallelism = 16: 837 ms
As you can see above, this algorithm is a bit more inefficient
than the non-parallelized version, but even with dual-core, it is
faster than that, so I think it is overall a win.
Differential Revision: https://reviews.llvm.org/D49164
llvm-svn: 336790
This workaround is for GCC 5.4.1. Without this workaround, lld will
produce larger .gdb_index sections for object files compiled with the
buggy version of the compiler.
Since it is not for correctness, and it affects only debug builds (since
you are generating .gdb_index sections), perhaps the hack shouldn't have been
added in the first place. At least, I think it is time to remove this hack.
Differential Revision: https://reviews.llvm.org/D49149
llvm-svn: 336788
This patch merges createGdbIndex function and GdbIndexSection's
constructor into a single static member function of the class.
This patch also change how we keep CU vectors. Previously, CuVector
and GdbSymbols were parallel arrays, but there's no reason to choose that
design. Now, CuVector is a member of GdbSymbol class.
A lot of members are removed from GdbIndexSection. Previously, it has
members that need to be kept in sync over several phases. I belive the new
design is less error-prone, and the new code is much easier to read
than before.
llvm-svn: 336743
.gdb_index sections can be very large. When you are compiling
multi-gibibyte executables, they can be larger than 1 GiB. The previous
implementation of .gdb_index seems to consume too much memory.
This patch reduces memory consumption by eliminating temporary objects.
In one experiment, memory consumption of GdbIndexSection class is
reduced from 962 MiB to 228 MiB when creating a .gdb_index of 1350 GiB.
Differential Revision: https://reviews.llvm.org/D49094
llvm-svn: 336672
I believe the only way to test this functionality is to create extremely
large object files and attempt to create a .gdb_index that is greater
than 4 GiB. But I think that's too much for most environments and buildbots,
so I'm commiting this without a test that actually triggers the new
error condition.
llvm-svn: 336631
Previously, we didn't create multiple consecutive bitmaps.
Added a test to catch this bug too.
Differential Revision: https://reviews.llvm.org/D49107
llvm-svn: 336620
This patch also speeds it up by making some constants compile-time
constants. Other than that, NFC.
Differential Revision: https://reviews.llvm.org/D49101
llvm-svn: 336614
Patch by Rahul Chaudhry!
This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg
Pass '--pack-dyn-relocs=relr' to enable generation of SHT_RELR section
and DT_RELR, DT_RELRSZ, and DT_RELRENT dynamic tags.
Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!
Pass '--use-android-relr-tags' with '--pack-dyn-relocs=relr' to use
SHT_ANDROID_RELR section type instead of SHT_RELR, as well as
DT_ANDROID_RELR* dynamic tags instead of DT_RELR*. The generated
section contents are identical.
'--pack-dyn-relocs=android+relr --use-android-relr-tags' enables both
'--pack-dyn-relocs=android' and '--pack-dyn-relocs=relr': lld will
encode the relative relocations in a SHT_ANDROID_RELR section, and pack
the rest of the dynamic relocations in a SHT_ANDROID_REL(A) section.
Differential Revision: https://reviews.llvm.org/D48247
llvm-svn: 336594
The local dynamic TLS access on PPC64 ELF v2 ABI uses R_PPC64_GOT_DTPREL16*
relocations when a TLS variables falls outside 2 GB of the thread storage
block. This patch adds support for these relocations by adding a new RelExpr
called R_TLSLD_GOT_OFF which emits a got entry for the TLS variable relative
to the dynamic thread pointer using the relocation R_PPC64_DTPREL64. It then
evaluates the R_PPC64_GOT_DTPREL16* relocations as the got offset for the
R_PPC64_DTPREL64 got entries.
Differential Revision: https://reviews.llvm.org/D48484
llvm-svn: 335732
rLLD329787 added the stable sorting to SymbolTableBaseSection::postThunkContents.
I profiled the Mozilla (response-O0.txt) from lld-speed-test package and found
std::stable_sort is showing up in profile results and consuming the 3.1% of the total
CPU time in the RelWithDebug build. Total time of postThunkContents is 3.54%, 238ms.
This change reduces postTimeContents time to 50ms, making it to take 0.73% of Total CPU time.
So, instead of sorting the local part I suggest to just rebuild it.
That is what this patch does.
Differential revision: https://reviews.llvm.org/D45519
llvm-svn: 335583
While building a Global Offset Table try to fill the primary GOT as much
as possible because the primary GOT can be accessed in the most
effective way. If it is not possible, try to fill the last GOT in the
multi-GOT list, and finally create a new GOT if both attempts failed.
llvm-svn: 335140