Commit Graph

691 Commits

Author SHA1 Message Date
Alexey Lapshin 0a2d415bd0 [LLD] Report errors occurred while parsing debug info as warnings.
Summary:
Extracted from D74773. Currently, errors happened while parsing
debug info are reported as errors. DebugInfoDWARF library treats such
errors as "Recoverable errors". This patch makes debug info errors
to be reported as warnings, to support DebugInfoDWARF approach.

Reviewers: ruiu, grimar, MaskRay, jhenderson, espindola

Reviewed By: MaskRay, jhenderson

Subscribers: emaste, aprantl, arichardson, arphaman, llvm-commits

Tags: #llvm, #debug-info, #lld

Differential Revision: https://reviews.llvm.org/D75234
2020-02-29 00:03:18 +03:00
Daniel Kiss b6162622c0 [LLD][ELF][AArch64] Change the semantics of -z pac-plt.
Summary:
Generate PAC protected plt only when "-z pac-plt" is passed to the
linker. GNU toolchain generates when it is explicitly requested[1].
When pac-plt is requested then set the GNU_PROPERTY_AARCH64_FEATURE_1_PAC
note even when not all function compiled with PAC but issue a warning.
Harmonizing the warning style for BTI/PAC/IBT.
Generate BTI protected PLT if case of "-z force-bti".

[1] https://www.sourceware.org/ml/binutils/2019-03/msg00021.html

Reviewers: peter.smith, espindola, MaskRay, grimar

Reviewed By: peter.smith, MaskRay

Subscribers: tatyana-krasnukha, emaste, arichardson, kristof.beyls, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74537
2020-02-18 09:56:57 +01:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Russell Gallop e7cb374433 [LLD][ELF] Add time-trace to ELF LLD
This adds some of LLD specific scopes and picks up optimisation scopes
via LTO/ThinLTO. Makes use of TimeProfiler multi-thread support added in
77e6bb3c.

Differential Revision: https://reviews.llvm.org/D71060
2020-02-06 12:14:13 +00:00
Fangrui Song 837e8a9c0c [ELF][PPC32] Support canonical PLT
-fno-pie produces a pair of non-GOT-non-PLT relocations R_PPC_ADDR16_{HA,LO} (R_ABS) referencing external
functions.

```
lis 3, func@ha
la 3, func@l(3)
```

In a -no-pie/-pie link, if func is not defined in the executable, a canonical PLT entry (st_value>0, st_shndx=0) will be needed.
References to func in shared objects will be resolved to this address.
-fno-pie -pie should fail with "can't create dynamic relocation ... against ...", so we just need to think about -no-pie.

On x86, the PLT entry passes the JMP_SLOT offset to the rtld PLT resolver.
On x86-64: the PLT entry passes the JUMP_SLOT index to the rtld PLT resolver.
On ARM/AArch64: the PLT entry passes &.got.plt[n]. The PLT header passes &.got.plt[fixed-index]. The rtld PLT resolver can compute the JUMP_SLOT index from the two addresses.

For these targets, the canonical PLT entry can just reuse the regular PLT entry (in PltSection).

On PPC32: PltSection (.glink) consists of `b PLTresolve` instructions and `PLTresolve`. The rtld PLT resolver depends on r11 having been set up to the .plt (GotPltSection) entry.
On PPC64 ELFv2: PltSection (.glink) consists of `__glink_PLTresolve` and `bl __glink_PLTresolve`. The rtld PLT resolver depends on r12 having been set up to the .plt (GotPltSection) entry.

We cannot reuse a `b PLTresolve`/`bl __glink_PLTresolve` in PltSection as a canonical PLT entry. PPC64 ELFv2 avoids the problem by using TOC for any external reference, even in non-pic code, so the canonical PLT entry scenario should not happen in the first place.
For PPC32, we have to create a PLT call stub as the canonical PLT entry. The code sequence sets up r11.

Reviewed By: Bdragon28

Differential Revision: https://reviews.llvm.org/D73399
2020-01-25 17:56:37 -08:00
Fangrui Song deb5819d62 [ELF] Rename relocateOne() to relocate() and pass `Relocation` to it
Symbol information can be used to improve out-of-range/misalignment diagnostics.
It also helps R_ARM_CALL/R_ARM_THM_CALL which has different behaviors with different symbol types.

There are many (67) relocateOne() call sites used in thunks, {Arm,AArch64}errata, PLT, etc.
Rename them to `relocateNoSym()` to be clearer that there is no symbol information.

Reviewed By: grimar, peter.smith

Differential Revision: https://reviews.llvm.org/D73254
2020-01-25 12:00:18 -08:00
Peter Smith 01ad4c8384 [LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS.
In D71281 a fix was put in to round up the size of a ThunkSection to the
nearest 4KiB when performing errata patching. This fixed a problem with a
very large instrumented program that had thunks and patches mutually
trigger each other. Unfortunately it triggers an assertion failure in an
AArch64 allyesconfig build of the kernel. There is a specific assertion
preventing an InputSectionDescription being larger than 4KiB. This will
always trigger if there is at least one Thunk needed in that
InputSectionDescription, which is possible for an allyesconfig build.

Abstractly the problem case is:
.text : {
          *(.text) ;
          ...
          . = ALIGN(SZ_4K);
          __idmap_text_start = .;
          *(.idmap.text)
          __idmap_text_end = .;
          ...
        }
The assertion checks that __idmap_text_end - __idmap_start is < 4 KiB.
Note that there is more than one InputSectionDescription in the
OutputSection so we can't just restrict the fix to OutputSections smaller
than 4 KiB.

The fix presented here limits the D71281 to InputSectionDescriptions that
meet the following conditions:
1.) The OutputSection is bigger than the thunkSectionSpacing so adding
thunks will affect the addresses of following code.
2.) The InputSectionDescription is larger than 4 KiB. This will prevent
any assertion failures that an InputSectionDescription is < 4 KiB
in size.

We do this at ThunkSection creation time as at this point we know that
the addresses are stable and up to date prior to adding the thunks as
assignAddresses() will have been called immediately prior to thunk
generation.

The fix reverts the two tests affected by D71281 to their original state
as they no longer need the 4KiB size roundup. I've added simpler tests to
check for D71281 when the OutputSection size is larger than the ThunkSection
spacing.

Fixes https://github.com/ClangBuiltLinux/linux/issues/812

Differential Revision: https://reviews.llvm.org/D72344
2020-01-17 10:47:21 +00:00
Fangrui Song 870094decf [ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4
ThunkSection contains 4-byte instructions on all targets that use
thunks. Thunks should not be used in any performance sensitive places,
and locality/cache line/instruction fetching arguments should not apply.

We use 16 bytes as preferred function alignments for modern PowerPC cores.
In any case, 8 is not optimal.

Differential Revision: https://reviews.llvm.org/D72819
2020-01-16 10:36:33 -08:00
Fangrui Song 7cd429f27d [ELF] Add -z force-ibt and -z shstk for Intel Control-flow Enforcement Technology
This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang.

It adds Intel CET (Control-flow Enforcement Technology) support to lld.
The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.

CET introduces a new restriction on indirect jump instructions so that
you can limit the places to which you can jump to using indirect jumps.

In order to use the feature, you need to compile source files with
-fcf-protection=full.

* IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt.
* SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified.

IBT-enabled executables/shared objects have two PLT sections, ".plt" and
".plt.sec".  For the details as to why we have two sections, please read
the comments.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D59780
2020-01-13 23:39:28 -08:00
Fangrui Song 681b1be774 [lld] Fix -Wrange-loop-analysis warnings
One instance looks like a false positive:

lld/ELF/Relocations.cpp:1622:14: note: use reference type 'const std::pair<ThunkSection *, uint32_t> &' (aka 'cons
t pair<lld:🧝:ThunkSection *, unsigned int> &') to prevent copying
        for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections)

It is not changed in this commit.
2020-01-01 15:41:20 -08:00
Fangrui Song 1edd965130 [ELF] Support input section description .gnu.version* in /DISCARD/
Linux powerpc discards `*(.gnu.version*)` (arch/powerpc/kernel/vmlinux.lds.S)
to suppress --orphan-handling=warn warnings in the -pie output `.tmp_vmlinux1`

The support is simple. Just add isLive() to:

1) Fix an assertion in SectionBase::getPartition() called by VersionTableSection::isNeeded().
2) Suppress DT_VERSYM, DT_VERDEF, DT_VERNEED and DT_VERNEEDNUM, if the relevant section is discarded.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D71819
2019-12-26 09:54:22 -08:00
Fangrui Song 37b2808059 [ELF] writePlt, writeIplt: replace parameters gotPltEntryAddr and index with `const Symbol &`. NFC
PPC::writeIplt (IPLT code sequence, D71621) needs to access `Symbol`.

Reviewed By: grimar, ruiu

Differential Revision: https://reviews.llvm.org/D71631
2019-12-18 00:14:03 -08:00
Fangrui Song 345f59667d [ELF] Rename .plt to .iplt and decrease EM_PPC{,64} alignment of .glink to 4
GNU ld creates the synthetic section .iplt, and has a built-in linker
script that assigns .iplt to the output section .plt . There is no
output section named .iplt .

Making .iplt an output section actually has a benefit that makes the
tricky toolchain feature stand out. Symbolizers don't have to deal with
mixed PLT entries (e.g. llvm-objdump -d incorrectly annotates such jump
targets).

On EM_PPC{,64}, .glink contains a PLT resolver and a series of jump
instructions. The 4-byte entry size makes it unnecessary to have an
alignment of 16.

Mark ppc32-gnu-ifunc.s and ppc32-gnu-ifunc-nonpreemptable.s as `XFAIL: *`.
They test IPLT on EM_PPC, which never works.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D71520
2019-12-17 00:15:59 -08:00
Fangrui Song 891a8655ab [ELF] Add IpltSection
PltSection is used by both PLT and IPLT. The PLT section may have a
header while the IPLT section does not. Split off IpltSection from
PltSection to be clearer.

Unlike other targets, PPC64 cannot use the same code sequence for PLT
and IPLT. This helps make a future PPC64 patch (D71509) more isolated.

On EM_386 and EM_X86_64, when PLT is empty while IPLT is not, currently
we are inconsistent whether the PLT header is conceptually attached to
in.plt or in.iplt .  Consistently attach the header to in.plt can make
the -z retpolineplt logic simpler. It also makes `jmp` point to an
aesthetically better place for non-retpolineplt cases.

Reviewed By: grimar, ruiu

Differential Revision: https://reviews.llvm.org/D71519
2019-12-17 00:06:04 -08:00
Fangrui Song 90d195d026 [ELF] Delete relOff from TargetInfo::writePLT
This change only affects EM_386. relOff can be computed from `index`
easily, so it is unnecessarily passed as a parameter.

Both in.plt and in.iplt entries are written by writePLT. For in.iplt,
the instruction `push reloc_offset` will change because `index` is now
different. Fortunately, this does not matter because `push; jmp` is only
used by PLT. IPLT does not need the code sequence.

Reviewed By: grimar, ruiu

Differential Revision: https://reviews.llvm.org/D71518
2019-12-16 11:10:02 -08:00
Fangrui Song 98afa2c1f1 [ELF] De-template PltSection::addEntry. NFC 2019-12-16 11:03:20 -08:00
Rui Ueyama 69da7e29de Revert an accidental commit af5ca40b47 2019-12-13 15:17:40 +09:00
Rui Ueyama af5ca40b47 temporary 2019-12-13 14:35:03 +09:00
Peter Smith 86d24193a9 [LLD][ELF][AArch64][ARM] When errata patching, round thunk size to 4KiB.
On some edge cases such as Chromium compiled with full instrumentation we
have a .text section over twice the size of the maximum branch range and
the instrumented code generation containing many examples of the erratum
sequence. The combination of Thunks and many erratum sequences causes
finalizeAddressDependentContent() to not converge. We end up with:
start
- Thunk Creation (disturbs addresses after thunks, creating more patches)
- Patch Creation (disturbs addresses after patches, creating more thunks)
- goto start

In most images with few thunks and patches the mutual disturbance does not
cause convergence problems. As the .text size and number of patches go up
the risk increases.

A way to prevent the thunk creation from interfering with patch creation is
to round up the size of the thunks to a 4KiB boundary when the
erratum patch is enabled. As the erratum sequence only triggers when an
instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the
thunks not affect addresses modulo (4 KiB) we prevent thunks from
interfering with the patch.

The patches themselves could be aggregated in the same way that Thunks are
within ThunkSections and we could round up the size in the same way. This
would reduce the number of patches created in a .text section size >
128 MiB but would not likely help convergence problems.

Differential Revision: https://reviews.llvm.org/D71281

fixes (remaining part of) pr44071, other part in D71242
2019-12-11 14:09:15 +00:00
Fangrui Song c8f0d3e130 [ELF][PPC64] Support long branch thunks with addends
Fixes PPC64 part of PR40438

  // clang -target ppc64le -c a.cc
  // .text.unlikely may be placed in a separate output section (via -z keep-text-section-prefix)
  // The distance between bar in .text.unlikely and foo in .text may be larger than 32MiB.
  static void foo() {}
  __attribute__((section(".text.unlikely"))) static int bar() { foo(); return 0; }
  __attribute__((used)) static int dummy = bar();

This patch makes such thunks with addends work for PPC64.

AArch64: .text -> `__AArch64ADRPThunk_ (adrp x16, ...; add x16, x16, ...; br x16)` -> target
PPC64: .text -> `__long_branch_ (addis 12, 2, ...; ld 12, ...(12); mtctr 12; bctr)` -> target

AArch64 can leverage ADRP to jump to the target directly, but PPC64
needs to load an address from .branch_lt . Before Power ISA v3.0, the
PC-relative ADDPCIS was not available. .branch_lt was invented to work
around the limitation.

Symbol::ppc64BranchltIndex is replaced by
PPC64LongBranchTargetSection::entry_index which take addends into
consideration.

The tests are rewritten: ppc64-long-branch.s tests -no-pie and
ppc64-long-branch-pi.s tests -pie and -shared.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D70937
2019-12-05 10:17:45 -08:00
Peter Smith 784f57584f [LLD][ELF][AArch64] .note.gnu.property sections should have alignment 8
The .note.gnu.property SHT_NOTE sections on AArch64 (a 64-bit target)
should have alignment 8 to more closely match the binutils implementation
where alignment is 4-bytes on 32-bit machines and 8-bytes on 64-bit
machines.

Previously LLD was using 4 for both 32-bit and 64-bit machines.

Differential Revision: https://reviews.llvm.org/D70962
2019-12-05 10:11:31 +00:00
Peter Collingbourne 2c6fae179e ELF: Discard .ARM.exidx sections for empty functions instead of misordering them.
The logic added in r372781 caused ARMExidxSyntheticSection::addSection()
to return false for exidx sections without a link order dep that passed
isValidExidxSectionDep(). This included exidx sections for empty functions. As
a result, such exidx sections would end up treated like ordinary sections and
would end up being laid out before the ARMExidxSyntheticSection, most likely in
the wrong order relative to the exidx entries in the ARMExidxSyntheticSection,
breaking the orderedness invariant relied upon by unwinders. Fix this by
simply discarding such sections.

Differential Revision: https://reviews.llvm.org/D69744
2019-11-04 09:11:14 -08:00
Nico Weber 5976a3f5aa Fix a few typos in lld/ELF to cycle bots 2019-10-28 21:41:47 -04:00
Fangrui Song bd8cfe65f5 [ELF] Wrap things in `namespace lld { namespace elf {`, NFC
This makes it clear `ELF/**/*.cpp` files define things in the `lld::elf`
namespace and simplifies `elf::foo` to `foo`.

Reviewed By: atanasyan, grimar, ruiu

Differential Revision: https://reviews.llvm.org/D68323

llvm-svn: 373885
2019-10-07 08:31:18 +00:00
Peter Collingbourne 0bb825d208 ELF: Add .interp synthetic sections first in createSyntheticSections().
Our .interp section is not a SyntheticSection. As a result, it terminates the
loop in removeUnusedSyntheticSections(). This has at least two consequences:

- The synthetic .bss and .bss.rel.ro sections are always present in
  dynamically linked executables, even when they are not needed.
- The synthetic .ARM.exidx (and possibly other) sections are always present
  in partitions other than the last one, even when not needed.
  .ARM.exidx in particular is problematic because it assumes that its
  list of code sections is non-empty in getLinkOrderDep(), which can
  lead to a crash if the partition does not have any code sections.

Fix these problems by moving the creation of the .interp sections to the
top of createSyntheticSections(). While here, make the code a little less
error-prone by changing the add() lambdas to take a SyntheticSection instead
of an InputSectionBase.

Differential Revision: https://reviews.llvm.org/D68256

llvm-svn: 373347
2019-10-01 16:10:13 +00:00
Peter Smith 06b3e3421a [ELF][ARM] Fix crash when discarding InputSections that have .ARM.exidx
When /DISCARD/ is used on an input section, that input section may have
a .ARM.exidx metadata section that depends on it. As the discard handling
comes after the .ARM.exidx synthetic section is created we need to make
sure that we account for the case where the .ARM.exidx output section
should be removed because there are no more live input sections.

Differential Revision: https://reviews.llvm.org/D67848

llvm-svn: 372781
2019-09-24 21:44:14 +00:00
Fangrui Song e47bbd28f8 [ELF] Make MergeInputSection merging aware of output sections
Fixes PR38748

mergeSections() calls getOutputSectionName() to get output section
names. Two MergeInputSections may be merged even if they are made
different by SECTIONS commands.

This patch moves mergeSections() after processSectionCommands() and
addOrphanSections() to fix the issue. The new pass is renamed to
OutputSection::finalizeInputSections().

processSectionCommands() and addorphanSections() are changed to add
sections to InputSectionDescription::sectionBases.

finalizeInputSections() merges MergeInputSections and migrates
`sectionBases` to `sections`.

For the -r case, we drop an optimization that tries keeping sh_entsize
non-zero. This is for the simplicity of addOrphanSections(). The
updated merge-entsize2.s reflects the change.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D67504

llvm-svn: 372734
2019-09-24 11:48:31 +00:00
Fangrui Song 7afffb54ea [ELF] Don't shrink RelrSection
Fixes PR43214.

The size of SHT_RELR may oscillate between 2 numbers (see D53003 for a
similar --pack-dyn-relocs=android issue). This can happen if the shrink
of SHT_RELR causes it to take more words to encode relocation offsets
(this can happen with thunks or segments with overlapping p_offset
ranges), and the expansion of SHT_RELR causes it to take fewer words to
encode relocation offsets.

To avoid the issue, add padding 1s to the end of the relocation section
if its size would decrease. Trailing 1s do not decode to more relocations.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D67164

llvm-svn: 370923
2019-09-04 16:27:35 +00:00
Fangrui Song 8fbe81fb29 [ELF][RISCV] Assign st_shndx of __global_pointer$ to 1 if .sdata does not exist
This essentially reverts the code change of D63132 and switches to a simpler approach.

In an executable/shared object, st_shndx of a symbol can be:

1) SHN_UNDEF: undefined symbol (or canonical PLT)
2) SHN_ABS: absolute symbol
3) any other value (usually a regular section index) represents a relative symbol.
  The actual value does not matter.

Many ld.so (musl, all archs except MIPS of FreeBSD rtld-elf) even treat 2) and 3)
the same. If .sdata does not exist, it does not matter what value/section
__global_pointer$ has, as long as it is relative (otherwise there will be a pedantic
lld error. See D63132). Just set the st_shndx arbitrarily to 1.

Dummy st_shndx=1 may be used by __rela_iplt_start, linker-script-defined symbols outside a section, __dso_handle, etc.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D66798

llvm-svn: 370172
2019-08-28 09:01:03 +00:00
Fangrui Song 1681ceb2c4 [ELF] EhFrameSection: postpone FDE liveness check to finalizeSections
EhFrameSection::addSection checks liveness of FDE early. This makes it
infeasible to move combineEhSections() before ICF.

Postpone the check to EhFrameSection::finalizeContents(). This is what
ARMExidxSyntheticSection does and it will make a subsequent patch D66717
simpler.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D66727

llvm-svn: 369890
2019-08-26 10:32:12 +00:00
Fangrui Song 2d337fdc95 Reland D65242 "[ELF] More dynamic relocation packing""
This fixed a bug in r369488. When config->isRela is false, i->r_addend
is not initialized (see encodeDynamicReloc). So we should check
config->isRela before accessing r_addend:

- if (j - i < 3 || i->r_addend)
+ if (j - i < 3 || (config->isRela && i->r_addend != 0))

Original description:

Currently, with Android dynamic relocation packing, only relative
relocations are grouped together. This patch implements similar
packing for non-relative relocations.

The implementation groups non-relative relocations with the same
r_info and r_addend, if using RELA. By requiring a minimum group
size of 3, this achieves smaller relocation sections. Building Android
for an ARM32 device, I see the total size of /system/lib decrease by
392 KB.

Grouping by r_info also allows the runtime dynamic linker to implement
an 1-entry cache to reduce the number of symbol lookup required. With
such 1-entry cache implemented on Android, I'm seeing 10% to 20%
reduction in total time spent in runtime linker for several executables
that I tested.

As a simple correctness check, I've also built x86_64 Android and booted
successfully.

Differential Revision: https://reviews.llvm.org/D65242
Patch by Vic Yang

llvm-svn: 369507
2019-08-21 09:21:37 +00:00
Fangrui Song b2895a8cdc Revert D65242 "[ELF] More dynamic relocation packing"
This reverts r369488 and r369489. The change broke build bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14511
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/34407

llvm-svn: 369497
2019-08-21 06:50:08 +00:00
Fangrui Song 35f9a84a15 [ELF] More dynamic relocation packing
Currently, with Android dynamic relocation packing, only relative
relocations are grouped together. This patch implements similar
packing for non-relative relocations.

The implementation groups non-relative relocations with the same
r_info and r_addend, if using RELA. By requiring a minimum group
size of 3, this achieves smaller relocation sections. Building Android
for an ARM32 device, I see the total size of /system/lib decrease by
392 KB.

Grouping by r_info also allows the runtime dynamic linker to implement
an 1-entry cache to reduce the number of symbol lookup required. With
such 1-entry cache implemented on Android, I'm seeing 10% to 20%
reduction in total time spent in runtime linker for several executables
that I tested.

As a simple correctness check, I've also built x86_64 Android and booted
successfully.

Differential Revision: https://reviews.llvm.org/D66491
Patch by Vic Yang!

llvm-svn: 369488
2019-08-21 03:02:08 +00:00
Jonas Devlieghere 6ba7992031 [LLD] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

Differential revision: https://reviews.llvm.org/D66259

llvm-svn: 368936
2019-08-14 22:28:17 +00:00
Fangrui Song e220c67b7a [ELF] --gdb-index: fix odd variable name cUs after r365730 and replace lower_bound with partition_point. NFC
llvm-svn: 368845
2019-08-14 12:56:30 +00:00
David Blaikie 84b55e61dd DebugInfo: Explicitly handle errors when parsing unit DIEs
This ensures these errors produce a non-zero exit and improves the
context (providing the name of the input object and section being
parsed).

llvm-svn: 368378
2019-08-09 01:14:36 +00:00
David Blaikie fa2f4395e9 API update for change to LLVM's lib/DebugInfo/DWARF
llvm-svn: 368190
2019-08-07 17:18:18 +00:00
Peter Smith 7f320d4bf0 [ELF][ARM] Fix /DISCARD/ of section with .ARM.exidx section
The combineEhSections runs, by design, before processSectionCommands so
that input exception sections like .ARM.exidx and .eh_frame are not assigned
to OutputSections. Unfortunately if /DISCARD/ removes InputSections that
have associated .ARM.exidx sections without discarding the .ARM.exidx
synthetic section then we will end up crashing when trying to sort the
InputSections in ascending address order.

We fix this by filtering out the sections that have been discarded prior
to processing the InputSections in finalizeContents().

fixes pr42890

Differential Revision: https://reviews.llvm.org/D65759

llvm-svn: 368041
2019-08-06 14:13:38 +00:00
Fangrui Song e28a70daf4 [ELF] Consistently prioritize non-* wildcards overs "*" in version scripts
We prioritize non-* wildcards overs VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
This patch generalizes the rule to "*" of other versions and thus fixes PR40176.
I don't feel strongly about this GNU linkers' behavior but the
generalization simplifies code.

Delete `config->defaultSymbolVersion` which was used to special case
VER_NDX_LOCAL/VER_NDX_GLOBAL "*".

In `SymbolTable::scanVersionScript`, custom versions are handled the same
way as VER_NDX_LOCAL/VER_NDX_GLOBAL. So merge
`config->versionScript{Locals,Globals}` into `config->versionDefinitions`.
Overall this seems to simplify the code.

In `SymbolTable::assign{Exact,Wildcard}Versions`,
`sym->verdefIndex == config->defaultSymbolVersion` is changed to
`verdefIndex == UINT32_C(-1)`.
This allows us to give duplicate assignment diagnostics for
`{ global: foo; };` `V1 { global: foo; };`

In test/linkerscript/version-script.s:
  vs_index of an undefined symbol changes from 0 to 1. This doesn't matter (arguably 1 is better because the binding is STB_GLOBAL) because vs_index of an undefined symbol is ignored.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D65716

llvm-svn: 367869
2019-08-05 14:31:39 +00:00
Fangrui Song 25ab1c6471 [ELF] Move R_*_IRELATIVE from .rel[a].plt to .rel[a].dyn unless --pack-dyn-relocs=android[+relr]
An R_*_IRELATIVE represents the address of a STT_GNU_IFUNC symbol
(redirected at runtime) which is non-preemptable and is not associated
with a canonical PLT (associated with a symbol with a section index of
SHN_UNDEF but a non-zero st_value).

.rel[a].plt [DT_JMPREL, DT_JMPREL+DT_JMPRELSZ) contains relocations that
can be lazily resolved. R_*_IRELATIVE are always eagerly resolved, so
conceptually they do not belong to .rela.plt. "iplt" is mostly a misnomer.

glibc powerpc and powerpc64 do not resolve R_*_IRELATIVE if they are in .rela.plt.

    // a.o - synthesized PLT call stub has an R_*_IRELATIVE
    void ifunc(); int main() { ifunc(); }
    // b.o
    static void real() {}
    asm (".type ifunc, %gnu_indirect_function");
    void *ifunc() { return &real; }

The lld-linked executable crashes. ld.bfd places R_*_IRELATIVE in
.rela.dyn and the executable works.

glibc i386, x86_64, and aarch64 have logic
(glibc/sysdeps/*/dl-machine.h:elf_machine_lazy_rel) to eagerly resolve
R_*_IRELATIVE in .rel[a].plt so the lld-linked executable works.

Move R_*_IRELATIVE from .rel[a].plt to .rel[a].dyn to fix the crashes on
glibc powerpc/powerpc64. This also helps simplifying ifunc
implementation in FreeBSD rtld-elf powerpc64.

If --pack-dyn-relocs=android[+relr] is specified, the Android packed
dynamic relocation format is used for .rela.dyn. We cannot name
in.relaIplt ".rela.dyn" because the output section will have mixed
formats. This can be improved in the future.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D65651

llvm-svn: 367745
2019-08-03 02:26:52 +00:00
Fangrui Song 47cfe8f321 [ELF] Fix variable names in comments after VariableName -> variableName change
Also fix some typos.

llvm-svn: 366181
2019-07-16 05:50:45 +00:00
Rui Ueyama 136d27ab4d [Coding style change][lld] Rename variables for non-ELF ports
This patch does the same thing as r365595 to other subdirectories,
which completes the naming style change for the entire lld directory.

With this, the naming style conversion is complete for lld.

Differential Revision: https://reviews.llvm.org/D64473

llvm-svn: 365730
2019-07-11 05:40:30 +00:00
Rui Ueyama 3837f4273f [Coding style change] Rename variables so that they start with a lowercase letter
This patch is mechanically generated by clang-llvm-rename tool that I wrote
using Clang Refactoring Engine just for creating this patch. You can see the
source code of the tool at https://reviews.llvm.org/D64123. There's no manual
post-processing; you can generate the same patch by re-running the tool against
lld's code base.

Here is the main discussion thread to change the LLVM coding style:
https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html
In the discussion thread, I proposed we use lld as a testbed for variable
naming scheme change, and this patch does that.

I chose to rename variables so that they are in camelCase, just because that
is a minimal change to make variables to start with a lowercase letter.

Note to downstream patch maintainers: if you are maintaining a downstream lld
repo, just rebasing ahead of this commit would cause massive merge conflicts
because this patch essentially changes every line in the lld subdirectory. But
there's a remedy.

clang-llvm-rename tool is a batch tool, so you can rename variables in your
downstream repo with the tool. Given that, here is how to rebase your repo to
a commit after the mass renaming:

1. rebase to the commit just before the mass variable renaming,
2. apply the tool to your downstream repo to mass-rename variables locally, and
3. rebase again to the head.

Most changes made by the tool should be identical for a downstream repo and
for the head, so at the step 3, almost all changes should be merged and
disappear. I'd expect that there would be some lines that you need to merge by
hand, but that shouldn't be too many.

Differential Revision: https://reviews.llvm.org/D64121

llvm-svn: 365595
2019-07-10 05:00:37 +00:00
Fangrui Song 5c4bbc2746 [ELF] Allow placing non-string SHF_MERGE sections with different alignments into the same MergeSyntheticSection
The difference from D63432/r365015 is that this patch does not place
SHF_STRINGS sections with different alignments into the same
MergeSyntheticSection. Doing that would:

(1) create unnecessary padding and thus waste space.
  Add a test tail-merge-string-align2.s to check no extra padding is created.
(2) make some input sections unaligned when tail merge (-O2) is enabled.
  The alignment of MergeTailAlignment::Builder was out of sync in D63432.
  MOVAPS on such unaligned strings can raise SIGSEGV.

This should fix PR42289: the Linux kernel has a use case that input
files have .rodata.cst32 sections with different alignments. The
expectation (and what ld.bfd and gold do) is that in the -r link, there
is only one .rodata.cst32 (SHF_MERGE sections with different alignments
can be combined), but lld currently creates one for each different
alignment.

The current merging strategy:

1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize and
   sh_addralign). Merging is performed among a group, even if -O0 is specified.
2) Create one output section for each group. This is a special case in
   addInputSec().

This patch changes 1) to:

1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize).
   Merging is performed among a group, even if -O0 is specified.

We will thus create just one .rodata.cst32 . This also improves merging
efficiency when sections with the same name but different alignments are
combined.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D64200

llvm-svn: 365139
2019-07-04 13:33:27 +00:00
Fangrui Song b9bc9f67f5 Revert D63432 "[ELF] Allow placing SHF_MERGE sections with different alignments into the same MergeSyntheticSection"
This reverts r365015.

David Zarzycki reported this change broke stage2 and stage3 tests.  The
root cause is still not very clear, but I guess some SHF_MERGE sections
with the same name have different alignments. They were not merged
before but were merged after r365015.

Something that assumes address uniqueness of such mergeable data caused
the bug.

llvm-svn: 365048
2019-07-03 15:26:54 +00:00
Fangrui Song 347692e2de [ELF] Allow placing SHF_MERGE sections with different alignments into the same MergeSyntheticSection
This should fix PR42289: the Linux kernel has a use case that input
files have .rodata.cst32 sections with different alignments. The
expectation (and what ld.bfd and gold do) is that in the -r link, there
is only one .rodata.cst32 (SHF_MERGE sections with different alignments
can be combined), but lld currently creates one for each different
alignment.

The current merging strategy:

1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize and
   sh_addralign). String merging is performed among a group, even if -O0 is specified.
2) Create one output section for each group. This is a special case in
   addInputSec().

This patch changes 1) to:

1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize).
   String merging is performed among a group, even if -O0 is specified.

We will thus create just one .rodata.cst32 . This also improves merging
efficiency when sections with the same name but different alignments are
combined.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D63432

llvm-svn: 365015
2019-07-03 10:03:49 +00:00
Igor Kudrin fd0ad4b24d [ELF] Do not produce DT_JMPREL and DT_PLTGOT if .rela.plt is empty.
If .rela.plt is mentioned in a linker script, it might be preserved
even if it is empty. In that case, LLD created DT_JMPREL and DT_PLTGOT
dynamic tags. When the tags exist, a dynamic loader writes values into
reserved slots in .got.plt to support lazy symbol resolution.
The problem is that, in fact, the linker has not reserved that space,
and the writing may occur into the memory allocated for something else.

Differential Revision: https://reviews.llvm.org/D63869

llvm-svn: 364639
2019-06-28 10:14:14 +00:00
Fangrui Song 5b4285d82d [ELF][RISCV] Create dummy .sdata for __global_pointer$ if .sdata does not exist
If .sdata is absent, linker synthesized __global_pointer$ gets a section index of SHN_ABS.
(ld.bfd has a similar issue: binutils PR24678)

Scrt1.o may use `lla gp, __global_pointer$` to reference the symbol PC
relatively. In -pie/-shared mode, lld complains if a PC relative
relocation references an absolute symbol (SHN_ABS) but ld.bfd doesn't:

    ld.lld: error: relocation R_RISCV_PCREL_HI20 cannot refer to lute symbol: __global_pointer$

Let the reference of __global_pointer$ to force creation of .sdata to
fix the problem. This is similar to _GLOBAL_OFFSET_TABLE_, which forces
creation of .got or .got.plt .

Also, change the visibility from STV_HIDDEN to STV_DEFAULT and don't
define the symbol for -shared. This matches ld.bfd, though I don't
understand why it uses STV_DEFAULT.

Reviewed By: ruiu, jrtc27

Differential Revision: https://reviews.llvm.org/D63132

llvm-svn: 363351
2019-06-14 02:14:53 +00:00
Peter Collingbourne 0282898586 ELF: Create synthetic sections for loadable partitions.
We create several types of synthetic sections for loadable partitions, including:
- The dynamic symbol table. This allows code outside of the loadable partitions
  to find entry points with dlsym.
- Creating a dynamic symbol table also requires the creation of several other
  synthetic sections for the partition, such as the dynamic table and hash table
  sections.
- The partition's ELF header is represented as a synthetic section in the
  combined output file, and will be used by llvm-objcopy to extract partitions.

Differential Revision: https://reviews.llvm.org/D62350

llvm-svn: 362819
2019-06-07 17:57:58 +00:00
Peter Smith e208208a31 [ELF][AArch64] Support for BTI and PAC
Branch Target Identification (BTI) and Pointer Authentication (PAC) are
architecture features introduced in v8.5a and 8.3a respectively. The new
instructions have been added in the hint space so that binaries take
advantage of support where it exists yet still run on older hardware. The
impact of each feature is:

BTI: For executable pages that have been guarded, all indirect branches
must have a destination that is a BTI instruction of the appropriate type.
For the static linker, this means that PLT entries must have a "BTI c" as
the first instruction in the sequence. BTI is an all or nothing
property for a link unit, any indirect branch not landing on a valid
destination will cause a Branch Target Exception.

PAC: The dynamic loader encodes with PACIA the address of the destination
that the PLT entry will load from the .plt.got, placing the result in a
subset of the top-bits that are not valid virtual addresses. The PLT entry
may authenticate these top-bits using the AUTIA instruction before
branching to the destination. Use of PAC in PLT sequences is a contract
between the dynamic loader and the static linker, it is independent of
whether the relocatable objects use PAC.

BTI and PAC are independent features that can be combined. So we can have
several combinations of PLT:
- Standard with no BTI or PAC
- BTI PLT with "BTI c" as first instruction.
- PAC PLT with "AUTIA1716" before the indirect branch to X17.
- BTIPAC PLT with "BTI c" as first instruction and "AUTIA1716" before the
  first indirect branch to X17.
    
The use of BTI and PAC in relocatable object files are encoded by feature
bits in the .note.gnu.property section in a similar way to Intel CET. There
is one AArch64 specific program property GNU_PROPERTY_AARCH64_FEATURE_1_AND
and two target feature bits defined:
- GNU_PROPERTY_AARCH64_FEATURE_1_BTI
-- All executable sections are compatible with BTI.
- GNU_PROPERTY_AARCH64_FEATURE_1_PAC
-- All executable sections have return address signing enabled.

Due to the properties of FEATURE_1_AND the static linker can tell when all
input relocatable objects have the BTI and PAC feature bits set. The static
linker uses this to enable the appropriate PLT sequence.
Neither -> standard PLT
GNU_PROPERTY_AARCH64_FEATURE_1_BTI -> BTI PLT
GNU_PROPERTY_AARCH64_FEATURE_1_PAC -> PAC PLT
Both properties -> BTIPAC PLT

In addition to the .note.gnu.properties there are two new command line
options:
--force-bti : Act as if all relocatable inputs had
GNU_PROPERTY_AARCH64_FEATURE_1_BTI and warn for every relocatable object
that does not.
--pac-plt : Act as if all relocatable inputs had
GNU_PROPERTY_AARCH64_FEATURE_1_PAC. As PAC is a contract between the loader
and static linker no warning is given if it is not present in an input.

Two processor specific dynamic tags are used to communicate that a non
standard PLT sequence is being used.
DTI_AARCH64_BTI_PLT and DTI_AARCH64_BTI_PAC.

Differential Revision: https://reviews.llvm.org/D62609

llvm-svn: 362793
2019-06-07 13:00:17 +00:00
Fangrui Song 82442adfc0 [PPC32] Improve the 32-bit PowerPC port
Many -static/-no-pie/-shared/-pie applications linked against glibc or musl
should work with this patch. This also helps FreeBSD PowerPC64 to migrate
their lib32 (PR40888).

* Fix default image base and max page size.
* Support new-style Secure PLT (see below). Old-style BSS PLT is not
  implemented, so it is not suitable for FreeBSD rtld now because it doesn't
  support Secure PLT yet.
* Support more initial relocation types:
  R_PPC_ADDR32, R_PPC_REL16*, R_PPC_LOCAL24PC, R_PPC_PLTREL24, and R_PPC_GOT16.
  The addend of R_PPC_PLTREL24 is special: it decides the call stub PLT type
  but it should be ignored for the computation of target symbol VA.
* Support GNU ifunc
* Support .glink used for lazy PLT resolution in glibc
* Add a new thunk type: PPC32PltCallStub that is similar to PPC64PltCallStub.
  It is used by R_PPC_REL24 and R_PPC_PLTREL24.

A PLT stub used in -fPIE/-fPIC usually loads an address relative to
.got2+0x8000 (-fpie/-fpic code uses _GLOBAL_OFFSET_TABLE_ relative
addresses).
Two .got2 sections in two object files have different addresses, thus a PLT stub
can't be shared by two object files. To handle this incompatibility,
change the parameters of Thunk::isCompatibleWith to
`const InputSection &, const Relocation &`.

PowerPC psABI specified an old-style .plt (BSS PLT) that is both
writable and executable. Linkers don't make separate RW- and RWE segments,
which causes all initially writable memory (think .data) executable.
This is a big security concern so a new PLT scheme (secure PLT) was developed to
address the security issue.

TLS will be implemented in D62940.

glibc older than ~2012 requires .rela.dyn to include .rela.plt, it can
not handle the DT_RELA+DT_RELASZ == DT_JMPREL case correctly. A hack
(not included in this patch) in LinkerScript.cpp addOrphanSections() to
work around the issue:

    if (Config->EMachine == EM_PPC) {
      // Older glibc assumes .rela.dyn includes .rela.plt
      Add(In.RelaDyn);
      if (In.RelaPlt->isLive() && !In.RelaPlt->Parent)
        In.RelaDyn->getParent()->addSection(In.RelaPlt);
    }

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D62464

llvm-svn: 362721
2019-06-06 17:03:00 +00:00
Rui Ueyama 2057f8366a Read .note.gnu.property sections and emit a merged .note.gnu.property section.
This patch also adds `--require-cet` option for the sake of testing.
The actual feature for IBT-aware PLT is not included in this patch.

This is a part of https://reviews.llvm.org/D59780. Submitting this
first should make it easy to work with a related change
(https://reviews.llvm.org/D62609).

Differential Revision: https://reviews.llvm.org/D62853

llvm-svn: 362579
2019-06-05 03:04:46 +00:00
Fangrui Song e98baf8631 [ELF] Delete GotEntrySize and GotPltEntrySize
GotEntrySize and GotPltEntrySize were added in D22288. Later, with
the introduction of wordsize() (then Config->Wordsize), they become
redundant, because there is no target that sets GotEntrySize or
GotPltEntrySize to a number different from Config->Wordsize.

Reviewed By: grimar, ruiu

Differential Revision: https://reviews.llvm.org/D62727

llvm-svn: 362220
2019-05-31 10:35:45 +00:00
Peter Collingbourne ba2816be82 ELF: Add basic partition data structures and behaviours.
This change causes us to read partition specifications from partition
specification sections and split output sections into partitions according
to their reachability from partition entry points.

This is only the first step towards a full implementation of partitions. Later
changes will add additional synthetic sections to each partition so that
they can be loaded independently.

Differential Revision: https://reviews.llvm.org/D60353

llvm-svn: 361925
2019-05-29 03:55:20 +00:00
Fangrui Song 055906e1e5 [ELF] -z combreloc: sort dynamic relocations by (!is_relative,symbol_index,r_offset)
We currently sort dynamic relocations by (!is_relative,symbol_index).
Add r_offset as the third key. This makes `readelf -r` debugging easier
(relocations to the same symbol are ordered by r_offset).

Refactor the test combreloc.s (renamed from combrelocs.s) to check
R_X86_64_RELATIVE, and delete --expand-relocs.

The difference from the reverted D61477 is that we keep !is_relative as
the first key. In local dynamic TLS model, DTPMOD (e.g.
R_ARM_TLS_DTPMOD32 R_X86_64_DTPMOD and R_PPC{,64}_DTPMOD) may use 0 as
the symbol index.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D62141

llvm-svn: 361164
2019-05-20 15:25:01 +00:00
Dmitri Gribenko a2fbe2bcda Revert "[ELF] -z combreloc: sort dynamic relocations by (symbol_index,r_offset)"
This reverts commit r361125.  This linker change breaks shared libraries
in some subtle way on x86_64.  (Specifically, gold segfaults when
loading the LLVMgold.so plugin linked with lldb with this patch.)

llvm-svn: 361150
2019-05-20 13:05:55 +00:00
Fangrui Song 9f1a6de631 [ELF] -z combreloc: sort dynamic relocations by (symbol_index,r_offset)
Fixes PR41692.

We currently sort dynamic relocations by (!is_relative,symbol_index).
Change it to (symbol_index,r_offset). We still place relative
relocations first because R_*_RELATIVE are the only dynamic relocations
with 0 symbol index (except on MIPS, which doesn't use DT_REL[A]COUNT
anyway).

This makes `readelf -r` debugging easier (relocations to the same symbol
are ordered by r_offset).

Refactor the test combreloc.s (renamed from combrelocs.s) to check
R_X86_64_RELATIVE, and delete --expand-relocs.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D61477

llvm-svn: 361125
2019-05-20 07:22:55 +00:00
Fangrui Song ed4dbe6326 [ELF] --gdb-index: fix SIGSEGV when a DWARFAddressRange has invalid SectionIndex
See D61891: llvm had a bug that might create invalid (DW_AT_low_pc,DW_AT_high_pc) pairs or range list entries due to missing DW_AT_addr_base.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D61889

llvm-svn: 360679
2019-05-14 14:41:20 +00:00
Fangrui Song 32c0ebe615 Use llvm::stable_sort
Make some small adjustment while touching the code: make parameters
const, use less_first(), etc.

Differential Revision: https://reviews.llvm.org/D60989

llvm-svn: 358943
2019-04-23 02:42:06 +00:00
Fangrui Song 957c356ffe [ELF] Place SectionPiece::{Live,Hash} bit fields together
Summary:
We access Live and OutputOff (which may share the same memory location)
concurrently in 2 parallelForEachN loops. Separating them avoids subtle
data races like D41884/PR35788. This patch places Live and Hash
together.

2 reasons this is appealing:

1) Hash is immutable. Live is almost read-only - only written once in MarkLive.cpp where
   Hash is not accessed
2) we already discard low bits of Hash to decide ShardID. It doesn't
   matter much if we make 32-bit Hash to 31-bit.

   For a huge internal clang -O3 executable (1.6GiB),
   `Strings` in StringTableBuilder::finalizeStringTable contains at most 310253 elements.
   The expected number of pair-wise collisions 2^(-31) * C(310253,2) ~= 22.41 is too small to have a negative impact on performance.
   Actually, my benchmark shows there is actually a minor performance improvement.

Differential Revision: https://reviews.llvm.org/D60765

llvm-svn: 358645
2019-04-18 07:46:09 +00:00
Peter Collingbourne 97d25e068f ELF: Move build id computation to Writer. NFCI.
With partitions, each partition should have the same build id. This means
that the build id needs to be only computed once, otherwise we will end up
with different build ids in each partition as a result of the file contents
changing. This change moves the computation of the build id into Writer so
that it only happens once.

Differential Revision: https://reviews.llvm.org/D60342

llvm-svn: 358536
2019-04-16 22:45:14 +00:00
Fangrui Song abc068fc59 [ELF] Fix typo: .symtab_shndxr -> .symtab_shndx
The typo was introduced to llvm MC in rL204769 (fixed in rL358247) and then to lld.

Also, for relocatable-many-sections.s, the size of .symtab changed at some point and the formula needs update.

llvm-svn: 358248
2019-04-12 02:20:52 +00:00
Peter Collingbourne d3e207057f ELF: Move verneed tracking data structures out of VersionNeedSection.
For partitions I intend to use the same set of version indexes in
each partition for simplicity. Since each partition will need its own
VersionNeedSection this will require moving the verneed tracking out of
VersionNeedSection. The way I've done this is to move most of the tracking
into SharedFile. What will eventually become the per-partition tracking
still lives in VersionNeedSection.

As a bonus the code gets a little simpler and more consistent with how we
handle verdef.

Differential Revision: https://reviews.llvm.org/D60307

llvm-svn: 357926
2019-04-08 17:48:05 +00:00
Peter Collingbourne cc1618e668 ELF: De-template SharedFile. NFCI.
Differential Revision: https://reviews.llvm.org/D60305

llvm-svn: 357925
2019-04-08 17:35:55 +00:00
Peter Collingbourne a9e847238e ELF: Perform per-section .ARM.exidx processing during combineEhFrameSections(). NFCI.
And rename the function to combineEhSections(). This makes the processing
of .ARM.exidx even more similar to .eh_frame and means that we can avoid an
additional loop over InputSections.

Differential Revision: https://reviews.llvm.org/D60026

llvm-svn: 357417
2019-04-01 18:01:18 +00:00
Fangrui Song d83fb24533 [ELF] Rename SyntheticSection::empty to more appropriate isNeeded() with opposite meaning
Summary:
Some synthetic sections can be empty while still being needed, thus they
can't be removed by removeUnusedSyntheticSections(). Rename this member
function to more appropriate isNeeded() with the opposite meaning.

No functional change intended.

Reviewers: ruiu, espindola

Reviewed By: ruiu

Subscribers: jhenderson, grimar, emaste, arichardson, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59982

llvm-svn: 357377
2019-04-01 08:16:08 +00:00
Rui Ueyama 68b9f45fee Replace `typedef A B` with `using B = A`. NFC.
I did this using Perl.

Differential Revision: https://reviews.llvm.org/D60003

llvm-svn: 357372
2019-04-01 00:11:24 +00:00
Peter Smith 3ce9af9370 [ELF][ARM] Recommit Redesign of .ARM.exidx handling to use a SyntheticSection
Recommit r356666 with fixes for buildbot failure, as well as handling for
--emit-relocs, which we decide not to emit any relocation sections as the
table is already position independent and an offline tool can deduce the
relocations.

Instead of creating extra Synthetic .ARM.exidx sections to account for
gaps in the table, create a single .ARM.exidx SyntheticSection that can
derive the contents of the gaps from a sorted list of the executable
InputSections. This has the benefit of moving the ARM specific code for
SyntheticSections in SHF_LINK_ORDER processing and the table merging code
into the ARM specific SyntheticSection. This also makes it easier to create
EXIDX_CANTUNWIND table entries for executable InputSections that don't
have an associated .ARM.exidx section.

Fixes pr40277

Differential Revision: https://reviews.llvm.org/D59216

llvm-svn: 357160
2019-03-28 11:10:20 +00:00
Fangrui Song 210949a221 [ELF] Change GOT*_FROM_END (relative to end(.got)) to GOTPLT* (start(.got.plt))
Summary:
This should address remaining issues discussed in PR36555.

Currently R_GOT*_FROM_END are exclusively used by x86 and x86_64 to
express relocations types relative to the GOT base. We have
_GLOBAL_OFFSET_TABLE_ (GOT base) = start(.got.plt) but end(.got) !=
start(.got.plt)

This can have problems when _GLOBAL_OFFSET_TABLE_ is used as a symbol, e.g.
glibc dl_machine_dynamic assumes _GLOBAL_OFFSET_TABLE_ is start(.got.plt),
which is not true.

  extern const ElfW(Addr) _GLOBAL_OFFSET_TABLE_[] attribute_hidden;
  return _GLOBAL_OFFSET_TABLE_[0]; // R_X86_64_GOTPC32

In this patch, we

* Change all GOT*_FROM_END to GOTPLT* to fix the problem.
* Add HasGotPltOffRel to denote whether .got.plt should be kept even if
  the section is empty.
* Simplify GotSection::empty and GotPltSection::empty by setting
  HasGotOffRel and HasGotPltOffRel according to GlobalOffsetTable early.

The change of R_386_GOTPC makes X86::writePltHeader simpler as we don't
have to compute the offset start(.got.plt) - Ebx (it is constant 0).

We still diverge from ld.bfd (at least in most cases) and gold in that
.got.plt and .got are not adjacent, but the advantage doing that is
unclear.

Reviewers: ruiu, sivachandra, espindola

Subscribers: emaste, mehdi_amini, arichardson, dexonsmith, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59594

llvm-svn: 356968
2019-03-25 23:46:19 +00:00
Rui Ueyama d2e0ed7755 Simplify PltSection.
Previously, `Entries` contains pairs of symbols and their indices.
The indices are always 0, x, 2x, 3x, ..., where x is the size of
relocation entry. We didn't have to store that values because we can
compute them when we consume them.

llvm-svn: 356812
2019-03-22 21:17:25 +00:00
Peter Smith 54dab70bb7 [ELF][ARM] Revert Redesign of .ARM.exidx handling to use a SyntheticSection
There is a reproducible buildbot failure (segfault) on the 2 stage
clang-cmake-armv8-lld bot. Reverting while I investigate.

Differential Revision: https://reviews.llvm.org/D59216

llvm-svn: 356684
2019-03-21 17:17:54 +00:00
Peter Smith d3511a214e [ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection
Instead of creating extra Synthetic .ARM.exidx sections to account for
gaps in the table, create a single .ARM.exidx SyntheticSection that can
derive the contents of the gaps from a sorted list of the executable
InputSections. This has the benefit of moving the ARM specific code for
SyntheticSections in SHF_LINK_ORDER processing and the table merging code
into the ARM specific SyntheticSection. This also makes it easier to create
EXIDX_CANTUNWIND table entries for executable InputSections that don't
have an associated .ARM.exidx section.

Fixes pr40277

Differential Revision: https://reviews.llvm.org/D59216

llvm-svn: 356666
2019-03-21 14:06:40 +00:00
Fangrui Song 1092fc9057 [ELF] Allow sh_entsize to be unrelated to sh_addralign and not a power of 2
Summary:
This implements Rui Ueyama's idea in PR39044.
I've checked that ld.bfd and gold do not have the power-of-2 requirement
and do not require sh_entsize to be a multiple of sh_align.

Now on the updated test merge-entsize.s, all the 3 linkers happily
create .rodata that is not 3-byte aligned.

This has a use case in Linux arch/x86/crypto/sha512-avx2-asm.S
It uses sh_entsize of 640, which is not a power of 2.
See https://github.com/ClangBuiltLinux/linux/issues/417

Reviewers: ruiu, espindola

Reviewed By: ruiu

Subscribers: nickdesaulniers, E5ten, emaste, arichardson, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59478

llvm-svn: 356428
2019-03-18 23:49:18 +00:00
Peter Collingbourne 8a28673a2e ELF: Don't add .dynamic strings to .dynstr early.
This does not appear to be necessary because StringTableSection does not
need to be finalized, which also means that we can remove the call to
finalizeSynthetic on .dynstr.

Differential Revision: https://reviews.llvm.org/D59240

llvm-svn: 355977
2019-03-12 20:58:34 +00:00
Peter Collingbourne 5ee9abd4c8 ELF: De-template OutputSection::finalize() and MipsGotSection::build(). NFCI.
Differential Revision: https://reviews.llvm.org/D58810

llvm-svn: 355479
2019-03-06 03:07:57 +00:00
Peter Collingbourne 704dfd6e28 ELF: Extract a non-ELFT base class for VersionNeedSection.
We're going to need a separate VersionNeedSection for each partition, and
the partition data structure won't be templated.

With this the VersionTableSection class no longer needs ELFT, so detemplate it.

Differential Revision: https://reviews.llvm.org/D58808

llvm-svn: 355478
2019-03-06 03:07:48 +00:00
Peter Collingbourne 7fb9eabda5 ELF: Write .eh_frame_hdr explicitly after writing .eh_frame.
This lets us remove the special case from Writer::writeSections(), and also
fixes a bug where .eh_frame_hdr isn't necessarily written in the correct
order if a linker script moves .eh_frame and .eh_frame_hdr into the same
output section.

Differential Revision: https://reviews.llvm.org/D58795

llvm-svn: 355153
2019-02-28 23:11:35 +00:00
Peter Collingbourne f9232b0c00 ELF: Remove dead code. NFCI.
RelocationBaseSection is not used in -r links, so Config->Relocatable will
always be false.

Differential Revision: https://reviews.llvm.org/D58489

llvm-svn: 354607
2019-02-21 18:53:58 +00:00
Simon Atanasyan fae2a509fa [MIPS] Handle cross-mode (regular <-> microMIPS) jumps
The patch solves two tasks:

1. MIPS ABI allows to mix regular and microMIPS code and perform
cross-mode jumps. Linker needs to detect such cases and replace
jump/branch instructions by their cross-mode equivalents.

2. Other tools like dunamic linkers need to recognize cases when dynamic
table entries, e_entry field of an ELF header etc point to microMIPS
symbol. Linker should provide such information.

The first task is implemented in the `MIPS<ELFT>::relocateOne()` method.
New routine `fixupCrossModeJump` detects ISA mode change, checks and
replaces an instruction.

The main problem is how to recognize that relocation target is microMIPS
symbol. For absolute and section symbols compiler or assembler set the
less-significant bit of the symbol's value or sum of the symbol's value
and addend. And this bit signals to linker about microMIPS code. For
global symbols compiler cannot do the same trick because other tools like,
for example, disassembler wants to know an actual position of the symbol.
So compiler sets STO_MIPS_MICROMIPS flag in the `st_other` field.

In `MIPS<ELFT>::relocateOne()` method we have a symbol's value only and
cannot access any symbol's attributes. To pass type of the symbol
(regular/microMIPS) to that routine as well as other places where we
write a symbol value as-is (.dynamic section, `Elf_Ehdr::e_entry` field
etc) we set when necessary a less-significant bit in the `getSymVA`
function.

Differential revision: https://reviews.llvm.org/D40147

llvm-svn: 354311
2019-02-19 10:36:58 +00:00
Rui Ueyama 9efdd7ac5e [PPC64] Preserve LocalEntry when linking
On PowerPC64, it is necessary to keep the LocalEntry bits in st_other,
especially when -r is used. Otherwise, when the resulting object is used
in a posterior linking, LocalEntry info will be unavailable and
functions may be called through the wrong entrypoint.

Patch by Leandro Lupori.

Differential Revision: https://reviews.llvm.org/D56782

llvm-svn: 354184
2019-02-15 23:11:18 +00:00
Peter Collingbourne 8331f61a51 ELF: Allow GOT relocs pointing to non-preemptable ifunc to resolve to an IRELATIVE where possible.
Non-GOT non-PLT relocations to non-preemptible ifuncs result in the
creation of a canonical PLT, which now takes the identity of the IFUNC
in the symbol table. This (a) ensures address consistency inside and
outside the module, and (b) fixes a bug where some of these relocations
end up pointing to the resolver.

Fixes (at least) PR40474 and PR40501.

Differential Revision: https://reviews.llvm.org/D57371

llvm-svn: 353981
2019-02-13 21:49:55 +00:00
Rui Ueyama 1b11e9e8a4 Remove a small header that is used only by one file. NFC.
llvm-svn: 353331
2019-02-06 19:28:23 +00:00
George Rimar ae54e58b90 Recommit r353293 "[LLD][ELF] - Set DF_STATIC_TLS flag for i386 target."
With the following changes:
1) Compilation fix:
std::atomic<bool> HasStaticTlsModel = false; ->
std::atomic<bool> HasStaticTlsModel{false};

2) Adjusted the comment in code.

Initial commit message:

DF_STATIC_TLS flag indicates that the shared object or executable
contains code using a static thread-local storage scheme.

Patch checks if IE/LE relocations were used to check if the code uses
a static model. If so it sets the DF_STATIC_TLS flag.

Differential revision: https://reviews.llvm.org/D57749
----
Modified : /lld/trunk/ELF/Arch/X86.cpp
Modified : /lld/trunk/ELF/Config.h
Modified : /lld/trunk/ELF/SyntheticSections.cpp
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model1.s
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model2.s
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model3.s
Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model4.s
Added : /lld/trunk/test/ELF/i386-static-tls-model.s
Modified : /lld/trunk/test/ELF/i386-tls-ie-shared.s
Modified : /lld/trunk/test/ELF/tls-dynamic-i686.s
Modified : /lld/trunk/test/ELF/tls-opt-iele-i686-nopic.s

llvm-svn: 353299
2019-02-06 14:43:30 +00:00
George Rimar 52fafcb919 Revert r353293 "[LLD][ELF] - Set DF_STATIC_TLS flag for i386 target."
It broke BB:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/43450
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/27891

Error is:
tools/lld/ELF/Config.h:84:41: error: copying member subobject of type
'std::atomic<bool>' invokes deleted constructor std::atomic<bool> HasStaticTlsModel = false;

llvm-svn: 353297
2019-02-06 13:53:32 +00:00
George Rimar da60ad220b [LLD][ELF] - Set DF_STATIC_TLS flag for i386 target.
DF_STATIC_TLS flag indicates that the shared object or executable
contains code using a static thread-local storage scheme.

Patch checks if IE/LE relocations were used to check if the code uses
a static model. If so it sets the DF_STATIC_TLS flag.

Differential revision: https://reviews.llvm.org/D57749

llvm-svn: 353293
2019-02-06 13:38:10 +00:00
Peter Collingbourne ff35dbac47 ELF: Set sh_info on RelaIplt to point to the IgotPlt output section.
Previously we were setting it to the GotPlt output section, which is
incorrect on ARM where this section is in .got. In static binaries
this can lead to sh_info being set to -1 (because there is no .got.plt)
which results in various tools rejecting the output file.

Differential Revision: https://reviews.llvm.org/D57274

llvm-svn: 352413
2019-01-28 19:29:41 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Mandeep Singh Grang f9d76dc354 [lld] Use range-based llvm::sort
llvm-svn: 351612
2019-01-18 23:41:34 +00:00
Rui Ueyama 56c5343e5a Use error() instead of fatal() to report an invalid address range.
In this patch we also use toString() to stringize a section.

llvm-svn: 350070
2018-12-26 19:15:04 +00:00
David Blaikie 059b1c5e01 gdb-index: Handle errors when parsing ranges
When parsing CU ranges for gdb-index, handle the error (now propagated
up though the API lld is calling here - previously the error was
printed within the libDebugInfo API, not allowing lld to format or
handle the message at all) - including information about the object and
archive name, as well as failing the link.

llvm-svn: 349979
2018-12-22 00:31:05 +00:00
Fangrui Song fe36417f6e [ELF] .gnu.hash bloom filter: use Shift2 = 26 instead of 6
Summary:
For the 2-bit bloom filter, we currently pick the bits Hash%64 and Hash>>6%64 (Shift2=6), but bits [6:...] are also used to select a word, causing a loss of precision.

In this patch, we choose Shift2=26, with is suggested by Ambrose Feinstein.

Note, Shift2 is computed as maskbitslog2 in bfd/elflink.c and gold/dynobj.cc
It is varying with the number of dynamic symbols but we don't
necessarily copy its rule.

Reviewers: ruiu, espindola

Reviewed By: ruiu

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D55971

llvm-svn: 349966
2018-12-21 21:59:34 +00:00
George Rimar ad667661c4 [ELF] - Allow discarding .dynsym from the linker script.
This is a part of https://bugs.llvm.org/show_bug.cgi?id=39810.
The patch allows discarding the .dynsym section using linker script.

Differential revision: https://reviews.llvm.org/D55218

llvm-svn: 348748
2018-12-10 09:13:36 +00:00
George Rimar 4af28e46ca [LLD][ELF] - Support discarding .dynstr section.
This is a part of https://bugs.llvm.org/show_bug.cgi?id=39810.
The patch allows discarding the .dynstr section using linker script.

Differential revision: https://reviews.llvm.org/D55215

llvm-svn: 348746
2018-12-10 09:07:30 +00:00
Fangrui Song f2143761d6 [ELF] --gdb-index: use lower_bound to compute relative CU index in the object file
Summary:
This reinstates what I originally intended to do in D54361.
It removes the assumption that .debug_gnu_pubnames has increasing CuOffset.

Now we do better than gold here: when .debug_gnu_pubnames contains
multiple sets, gold would think every set has the same CU index as the
first set (incorrect).

Reviewed By: ruiu

Reviewers: ruiu, dblaikie, espindola

Subscribers: emaste, arichardson, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54483

llvm-svn: 347820
2018-11-29 00:17:00 +00:00
Peter Smith 7dc5af75ae [ELF] Use more specific method to calculate DT_PLTRELSZ
The DT_PLTRELSZ dynamic tag is calculated using the size of the
OutputSection containing the In.RelaPlt InputSection. This will work for the
default no linker script case and the majority of linker scripts.
Unfortunately it doesn't work for some 'almost' sensible linker scripts. It
is permitted by ELF to have a single OutputSection containing both
In.RelaDyn, In.RelaPlt and In.RelaIPlt. It is also permissible for the range
of memory [DT_RELA, DT_RELA + DT_RELASZ) and the range
[DT_JMPREL, DT_JMPREL + DT_JMPRELSZ) to overlap as long as the the latter
range is at the end.

To support this type of linker script use the specific InputSection sizes.

Fixes pr39678

Differential Revision: https://reviews.llvm.org/D54759

llvm-svn: 347736
2018-11-28 10:04:55 +00:00
George Rimar a1b3ddbfec [ELF] - Implement -z nodefaultlib
This is https://bugs.llvm.org//show_bug.cgi?id=38978

Spec says that:
"Objects may be built with the -z nodefaultlib option to
suppress any search of the default locations at runtime.
Use of this option implies that all the dependencies of an
object can be located using its runpaths.
Without this option, which is the most common case, no
matter how you augment the runtime linker's library
search path, its last element is always /usr/lib for 32-bit
objects and /usr/lib/64 for 64-bit objects."

The patch implements this option.

Differential revision: https://reviews.llvm.org/D54577

llvm-svn: 347647
2018-11-27 09:48:17 +00:00
George Rimar a44c0f27c2 [LLD][ELF] - Remove the excessive safety return. NFC.
We explicitly call finalizeContents() only once for
DynamicSection. The code testing we do not do it twice is
just excessive.

It could be an assert, but we don't do
that for other sections, so does not seem we
should do it here too.

llvm-svn: 347543
2018-11-26 10:33:29 +00:00
Fangrui Song f5badf4905 [ELF] Write IPLT header in -static -z retpolineplt mode
Summary:
This fixes PR39711: -static -z retpolineplt does not produce retpoline PLT header.
-z now is not relevant.

Statically linked executable does not have PLT, but may have IPLT with no header. When -z retpolineplt is specified, however, the repoline PLT header should still be emitted.

I've checked that this fixes the FreeBSD reproduce in PR39711 and a Linux program statically linked against glibc. The programm print "Hi" rather than SIGILL/SIGSEGV.

getPltEntryOffset may look dirty after this patch, but it can be cleaned up later.

Another possible improvement is that when there are non-preemptible IFUNC symbols (rare case, e.g. -Bsymbolic), both In.Plt and In.Iplt can be non-empty and we'll emit the retpoline PLT header twice.

Reviewers: espindola, emaste, chandlerc, ruiu

Reviewed By: emaste

Subscribers: emaste, arichardson, krytarowski, llvm-commits

Differential Revision: https://reviews.llvm.org/D54782

llvm-svn: 347404
2018-11-21 18:10:00 +00:00
Sean Fertile 614dc11ca8 [PPC64] Long branch thunks.
On PowerPC64, when a function call offset is too large to encode in a call
instruction the address is stored in a table in the data segment. A thunk is
used to load the branch target address from the table relative to the
TOC-pointer and indirectly branch to the callee. When linking position-dependent
code the addresses are stored directly in the table, for position-independent
code the table is allocated and filled in at load time by the dynamic linker.

For position-independent code the branch targets could have gone in the .got.plt
but using the .branch_lt section for both position dependent and position
independent binaries keeps it consitent and helps keep this PPC64 specific logic
seperated from the target-independent code handling the .got.plt.

Differential Revision: https://reviews.llvm.org/D53408

llvm-svn: 346877
2018-11-14 17:56:43 +00:00
Fangrui Song 0736461b24 [ELF] Rename NameTypeEntry to NameAttrEntry and its field "Type" to CuIndexAndAttrs
Summary:
NameTypeEntry::Type is a bit-packed value of CU index+attributes (https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html), which is named cu_index_and_attrs in a local variable in gdb/dwarf2read.c:dw2_symtab_iter_next

The new name CuIndexAndAttrs is more meaningful.

Reviewers: ruiu, dblaikie, espindola

Reviewed By: dblaikie

Subscribers: emaste, aprantl, arichardson, JDevlieghere, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54481

llvm-svn: 346794
2018-11-13 20:25:51 +00:00
Fangrui Song 9596848037 [ELF] .gdb_index: fix CuOff when a .debug_info section contains more than 1 DW_TAG_compile_unit
Summary:
Idx passed to readPubNamesAndTypes was an index into Chunks, not an
index into the CU list. This would be incorrect if some .debug_info
section contained more than 1 DW_TAG_compile_unit.

In real world, glibc Scrt1.o is a partial link of start.os abi-note.o init.o and contains 2 CUs in debug builds.
Without this patch, any application linking such Scrt1.o would have invalid .gdb_index
The issue could be demonstrated by:

    (gdb) py print(gdb.lookup_global_symbol('main'))
    None

Reviewers: espindola, ruiu

Reviewed By: ruiu

Subscribers: Higuoxing, grimar, dblaikie, emaste, aprantl, arichardson, JDevlieghere, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54361

llvm-svn: 346747
2018-11-13 08:43:07 +00:00
Fangrui Song 31be2f15a6 [ELF] Change GnuPub{Names,Types}Section from StringRef to LLDDWARFSection
Summary:
The debug_info_offset value may be relocated.

This is lld side change of D54375.

Reviewers: ruiu, dblaikie, grimar, espindola

Subscribers: emaste, arichardson, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D54376

llvm-svn: 346616
2018-11-11 18:57:35 +00:00
Fangrui Song 27d036e995 [ELF] Change sh_link of .rel{,a}.plt to make GNU strip happy
Summary:
D52830 sets sh_link to .symtab in static link, which breaks executable stripped by GNU strip.
It may also be odd that .rela.plt (SHF_ALLOC) points to .symtab (non-SHF_ALLOC).

Change the logic on pcc's suggestion.

Before:

% clang -fuse-ld=lld -static -xc =(printf 'int main(){}') # or gcc
% strip a.out; ./a.out
unexpected reloc type in static binary[1]    61634 segmentation fault  ./a.out

Reviewers: ruiu, grimar, emaste, espindola

Reviewed By: ruiu

Subscribers: pcc, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D53993

llvm-svn: 345899
2018-11-01 22:28:58 +00:00
Fangrui Song e0799a7268 [ELF] Fallback to sh_link=0 if neither .dynsym nor .symtab exists
Summary: .rela.plt may only contain R_*_{,I}RELATIVE relocations and not need a symbol table link. bfd/gold fallbacks to sh_link=0 in this case. Without this patch, ld.lld --strip-all caused lld to dereference a null pointer.

Reviewers: ruiu, grimar, espindola

Reviewed By: ruiu

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D53881

llvm-svn: 345648
2018-10-30 20:54:54 +00:00
Rui Ueyama 42ab6c53f8 Remove a global variable that we can live without.
Out::DebugInfo was used only by GdbIndex class to determine if
we need to create a .gdb_index section, but we can do the same
check without it.

Added a test that this patch doesn't change the existing behavior.

llvm-svn: 345058
2018-10-23 17:39:43 +00:00
Eli Friedman 8b44cc21c6 [ELF] Fix link failure with Android compressed relocation support.
Android uses a compressed relocation format, which means the size of the
relocation section isn't predictable based on the number of relocations,
and can vary if the layout changes in any way. To deal with this, the
linker normally runs multiple passes until the layout converges.

The layout should converge if the size of the compressed
relocation section increases monotonically: if the size of an encoded
offset increases by one byte, the larget value which can be encoded is
multiplied by 128, so the representable offsets grow much faster than
the size of the section itself.

The problem here is that there is no code to ensure the size of the
section doesn't decrease.  If the size of the relocation section
decreases, the relative offsets can increase due to alignment
restrictions, so that can force the size of the relocation section to
increase again.  The end result is an infinite loop; the loop gets cut
off after 10 iterations with the message "thunk creation not
converged".

To avoid this issue, this patch adds padding to the end of the
relocation section if its size would decrease.  The extra
padding is harmless because of the way the format is defined:
decoding stops after it reaches the number of relocations specified
in the section's header.

Differential Revision: https://reviews.llvm.org/D53003

llvm-svn: 344300
2018-10-11 21:43:06 +00:00
George Rimar 3368643f5e [ELF] - Set sh_info and sh_link for .rela.plt sections.
This is https://bugs.llvm.org/show_bug.cgi?id=37538,

Currently, LLD may set both sh_link and sh_info for
.rela.plt section to zero when we have only .rela.iplt section part used.

ELF spec (https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-94076/index.html)
says that for SHT_REL and SHT_RELA, sh_link references the associated symbol table
and sh_info the "section to which the relocation applies."

When we set the sh_link field, for the regular case we use the .dynsym index.
For .rela.iplt sections, it is unclear what is the associated symbol table,
because R_*_RELATIVE relocations do not use symbol names and we might have no
.dynsym section at all so this patch uses .symtab section index.

Differential revision: https://reviews.llvm.org/D52830

llvm-svn: 344226
2018-10-11 08:25:35 +00:00
Rui Ueyama e28c146423 Avoid unnecessary buffer allocation and memcpy for compressed sections.
Previously, we uncompress all compressed sections before doing anything.
That works, and that is conceptually simple, but that could results in
a waste of CPU time and memory if uncompressed sections are then
discarded or just copied to the output buffer.

In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompress them into newly allocated bufers and then memcpy the buffers
to the output buffer. That temporary buffer was redundant.

This patch changes how to uncompress sections. Now, compressed sections
are uncompressed lazily. To do that, `Data` member of `InputSectionBase`
is now hidden from outside, and `data()` accessor automatically expands
an compressed buffer if necessary.

If no one calls `data()`, then `writeTo()` directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.

This patch significantly reduces memory consumption (20 GiB max RSS to
15 Gib) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.

Differential Revision: https://reviews.llvm.org/D52917

llvm-svn: 343979
2018-10-08 16:58:59 +00:00
George Rimar f76dffb90f [ELF] - Simplify. NFCI.
Assign the `Link` to parent directly.

llvm-svn: 343762
2018-10-04 09:31:15 +00:00
Fangrui Song dbaeec6892 [ELF] llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: ruiu, espindola

Subscribers: emaste, arichardson, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52569

llvm-svn: 343146
2018-09-26 20:54:42 +00:00
Rui Ueyama 12ef7a9575 De-template VersionDefinitionSection. NFC.
When we write a struct to a mmap'ed buffer, we usually use
write16/32/64, but we didn't for VersionDefinitionSection, so
we needed to template that class.

llvm-svn: 343024
2018-09-25 20:37:51 +00:00
Rui Ueyama 4e247522ac Reset input section pointers to null on each linker invocation.
Previously, if you invoke lld's `main` more than once in the same process,
the second invocation could fail or produce a wrong result due to a stale
pointer values of the previous run.

Differential Revision: https://reviews.llvm.org/D52506

llvm-svn: 343009
2018-09-25 19:26:58 +00:00
Rui Ueyama 7d053709d4 Parallelize .gdb_index string table writes.
When we are creating a large .gdb_index, this change makes a difference.

llvm-svn: 342978
2018-09-25 14:34:56 +00:00
Fangrui Song 3e0a54e9da [ELF] Use llvm::toLower instead of libc call tolower
tolower() has some overhead because current locale is considered (though in lld the default "C" locale is used which does not matter too much). llvm::toLower is more efficient as it compiles to a compare and a conditional jump, as opposed to a libc call if tolower is used.

Disregarding locale also matches gdb's behavior (gdb/minsyms.h):

    #define SYMBOL_HASH_NEXT(hash, c)			\
      ((hash) * 67 + TOLOWER ((unsigned char) (c)) - 113)

where TOLOWER (include/safe-ctype.h) is a macro that uses a lookup table under the hood which is similar to llvm::toLower.

Reviewers: ruiu, espindola

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D52128

llvm-svn: 342342
2018-09-15 23:59:13 +00:00
Rui Ueyama e7688e6663 Revert r342297: Discard uncompressed buffer after creating .gdb_index contents.
Looks like it broke some local builds that use -gdb-index.

llvm-svn: 342298
2018-09-14 23:28:13 +00:00
Rui Ueyama 751dfbe39b Discard uncompressed buffer after creating .gdb_index contents.
Once we create .gdb_index contents, .zdebug_gnu_pub{names,types}
are useless, so there's no need to keep their uncompressed data
in memory.

I observed that for a test case in which lld creates a 3GB .gdb_index
section, the maximum resident set size reduced from 43GB to 29GB after
this patch.

Differential Revision: https://reviews.llvm.org/D52126

llvm-svn: 342297
2018-09-14 22:57:39 +00:00
Ed Maste c0b474f67a lld: add -z interpose support
-z interpose sets the DF_1_INTERPOSE flag, marking the object as an
interposer.

Via FreeBSD PR 230604, linking Valgrind with lld failed.

Differential Revision:	https://reviews.llvm.org/D52094

llvm-svn: 342239
2018-09-14 14:25:37 +00:00
George Rimar 27bbe7d0b4 [LLF][ELF] - Support -z global.
-z global is a flag used on Android (see D49198).

Differential revision: https://reviews.llvm.org/D49374

llvm-svn: 340802
2018-08-28 08:24:34 +00:00
George Rimar 20f994d350 [LLD][ELF] - Fix warning.
This fixes the following warning when compiling with gcc version 8.0.1 20180319 (experimental) (GCC):

/home/umb/LLVM/llvm/tools/lld/ELF/SyntheticSections.cpp:1951:46: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
     return OS->SectionIndex >= SHN_LORESERVE ? SHN_XINDEX : OS->SectionIndex;

llvm-svn: 340164
2018-08-20 10:29:21 +00:00
Fangrui Song ebf9607d7d [ELF] mergeSections: remove non-alive MergeInputSection
Summary: This makes it conform to what the comment says. Otherwise when getErrPlace() is called afterwards, cast<InputSection>(D) will cause incompatible cast as MergeInputSection is not a subclass of InputSection.

Reviewers: ruiu, grimar, espindola, pcc

Reviewed By: grimar

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D50742

llvm-svn: 339904
2018-08-16 17:22:02 +00:00
George Rimar 88863a5f62 [ELF] - Get rid of SyntheticSection::postThunkContents(). NFCI.
It turns out that postThunkContents() is only used for
sorting symbols in .symtab.

Though we can instead move the logic to SymbolTableBaseSection::finalizeContents(),
postpone calling it and then get rid of postThunkContents completely.

Differential revision: https://reviews.llvm.org/D49547

llvm-svn: 339413
2018-08-10 07:24:18 +00:00
Reid Kleckner c07f9bb83a Update for DWARF API change
llvm-svn: 338642
2018-08-01 21:57:15 +00:00
Rui Ueyama 0e1ba29ac3 Simplify. NFC.
llvm-svn: 338409
2018-07-31 18:13:36 +00:00
George Rimar 9524dee72e [ELF] - Implement SHT_SYMTAB_SHNDX (.symtab_shndxr) section.
This is relative to https://bugs.llvm.org//show_bug.cgi?id=38119.

SHT_SYMTAB section is able to keep symbols with output section indices
up to 0xff00 (SHN_LORESERVE). But if we have indices that are greater
than that (PR shows that it might happen), we need to use
SHT_SYMTAB_SHNDX extended section. It was not supported by LLD.

Description of the SHT_SYMTAB_SHNDX section is here:
https://docs.oracle.com/cd/E19683-01/817-3677/chapter6-94076/index.html.

Differential revision: https://reviews.llvm.org/D49541

llvm-svn: 338247
2018-07-30 12:39:54 +00:00
Simon Atanasyan 52854504cc [ELF][MIPS] Fix primary GOT sometimes overflowing by one or two words
If we fail to merge a secondary GOT with the primary GOT but so far only
one merged GOT has been created (the primary one), the final element in
MergedGots is the primary GOT. Thus we should not try to merge with this
final element passing IsPrimary=false, since this will ignore the fact
that the destination GOT does in fact need a header, and those extra two
entries can be enough to allow the merge to incorrectly occur. Instead
we should check for this case before attempting the second merge.

Patch by James Clarke.

Differential revision: https://reviews.llvm.org/D49422

llvm-svn: 337810
2018-07-24 05:40:37 +00:00
Andrew Ng e33d691990 [ELF] Fix handling of FDE negative relative PC addr
Signed values for the FDE PC addr were not correctly handled in
readFdeAddr(). If the value is negative and the type of the value is
smaller than 64 bits, the FDE PC addr overflow error would be
incorrectly triggered.

Fixed readFdeAddr() to properly handle signed values by sign extending
where appropriate.

Differential Revision: https://reviews.llvm.org/D49557

llvm-svn: 337683
2018-07-23 11:29:46 +00:00
Fangrui Song a66d77b22b [ELF] Check eh_frame_hdr overflow with PC offsets instead of PC absolute addresses
Reviewers: grimar, ruiu, espindola

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D49607

llvm-svn: 337610
2018-07-20 20:27:42 +00:00
George Rimar ed2605d36d [ELF] - Eliminate dead code. NFC.
Code was dead because we call postThunkContents only for SHT_SYMTAB.

llvm-svn: 337460
2018-07-19 14:39:56 +00:00
George Rimar 7d8e632e98 [ELF] - Stop silently producing a broken .eh_frame_hdr.
Currently, getFdePC() returns uint64_t. Its because the following
encodings might use 8 bytes: DW_EH_PE_absptr and DW_EH_PE_udata8.

But caller assigns returned value to uint32_t field:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L508

Value is used for building .eh_frame_hdr section.
We use DW_EH_PE_sdata4 encoding for building it at this moment:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L2545

And that means that an overflow issue might happen if
DW_EH_PE_absptr/DW_EH_PE_udata8 address encodings are present
in .eh_frame. In that case, before this patch, we silently would
truncate the address and produced broken .eh_frame_hdr section.

It would be not hard to support real 64-bit values for
DW_EH_PE_absptr/DW_EH_PE_udata8 encodings, but it is
unclear if it is usefull and if we should do it.

Since nobody faced/reported it, int this patch I only implement
a check to stop producing broken output silently for now.

llvm-svn: 337382
2018-07-18 11:56:53 +00:00
George Rimar 46ae0afba7 [ELF] - Eliminate dead 'return' in EhFrameSection::finalizeContents(). NFC.
EhFrameSection::finalizeContents() is called from a single place:
https://github.com/llvm-mirror/lld/blob/master/ELF/Writer.cpp#L1559

So code was dead.

llvm-svn: 337287
2018-07-17 14:36:19 +00:00
George Rimar b892e4194f [ELF] - Remove dead code from EhFrameSection::addCie. NFC.
Code was dead because caller of the addCie() already
checks that ID is equal to 0:
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L431

llvm-svn: 337281
2018-07-17 13:56:23 +00:00
George Rimar 7da3f388e3 [ELF] - Eliminate dead code. NFC.
Code was dead because at the moment of BssSection creation
it can never have a parent. Also, code simply does not
make sence as alignment adjastment happens when
BssSection is added to its parent later.

llvm-svn: 337276
2018-07-17 13:13:08 +00:00
George Rimar cb17fdbe3c [ELF] - Add classof() member for ARMExidxSentinelSection.
Or code uses constructions like isa<ARMExidxSentinelSection>:
https://github.com/llvm-mirror/lld/blob/master/ELF/Writer.cpp#L1428
https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L2944

That is confusing, because without ARMExidxSentinelSection::classof()
these lines are equal to isa<SyntheticSection> and the code does not really do
the same what it expected to. I found no good way to break it though, but it is not nice.

Patch adds ARMExidxSentinelSection::classof().

llvm-svn: 336813
2018-07-11 15:11:13 +00:00
Rui Ueyama d29c039119 Parallelize GdbIndexSection's symbol table creation.
Since .gdb_index sections contain all known symbols, they can be very large.
One of my executables has a .gdb_index section of 1350 GiB. Uniquifying
symbols by name takes 3.77 seconds on my machine. This patch parallelize it.

  Time to call createSymbols() with 8.4 million unique symbols:

  Without this patch: 3773 ms
  Parallelism = 1:    4374 ms
  Parallelism = 2:    2628 ms
  Parallelism = 16:    837 ms

As you can see above, this algorithm is a bit more inefficient
than the non-parallelized version, but even with dual-core, it is
faster than that, so I think it is overall a win.

Differential Revision: https://reviews.llvm.org/D49164

llvm-svn: 336790
2018-07-11 11:37:10 +00:00
Rui Ueyama f51c853cf1 Remove a workaround for an old GCC bug.
This workaround is for GCC 5.4.1. Without this workaround, lld will
produce larger .gdb_index sections for object files compiled with the
buggy version of the compiler.

Since it is not for correctness, and it affects only debug builds (since
you are generating .gdb_index sections), perhaps the hack shouldn't have been
added in the first place. At least, I think it is time to remove this hack.

Differential Revision: https://reviews.llvm.org/D49149

llvm-svn: 336788
2018-07-11 10:52:00 +00:00
Rui Ueyama f3731d4e9c Refactor GdbIndexSection. NFC.
This patch merges createGdbIndex function and GdbIndexSection's
constructor into a single static member function of the class.

This patch also change how we keep CU vectors. Previously, CuVector
and GdbSymbols were parallel arrays, but there's no reason to choose that
design. Now, CuVector is a member of GdbSymbol class.

A lot of members are removed from GdbIndexSection. Previously, it has
members that need to be kept in sync over several phases. I belive the new
design is less error-prone, and the new code is much easier to read
than before.

llvm-svn: 336743
2018-07-10 23:48:27 +00:00
Rui Ueyama bee192f6b0 Simplify. NFC.
llvm-svn: 336686
2018-07-10 15:15:56 +00:00
Rui Ueyama 3467fac091 Rename a variable for consistency. NFC.
llvm-svn: 336674
2018-07-10 14:03:39 +00:00
Rui Ueyama 7f112ea26d Reduce memory usage when creating .gdb_index. NFC.
.gdb_index sections can be very large. When you are compiling
multi-gibibyte executables, they can be larger than 1 GiB. The previous
implementation of .gdb_index seems to consume too much memory.

This patch reduces memory consumption by eliminating temporary objects.
In one experiment, memory consumption of GdbIndexSection class is
reduced from 962 MiB to 228 MiB when creating a .gdb_index of 1350 GiB.

Differential Revision: https://reviews.llvm.org/D49094

llvm-svn: 336672
2018-07-10 13:49:13 +00:00
Rui Ueyama 2a3036fb1d Report an error for an extremely large .gdb_index section.
I believe the only way to test this functionality is to create extremely
large object files and attempt to create a .gdb_index that is greater
than 4 GiB. But I think that's too much for most environments and buildbots,
so I'm commiting this without a test that actually triggers the new
error condition.

llvm-svn: 336631
2018-07-10 01:22:25 +00:00
Rui Ueyama a9e169edff Fix a bug for packed relocations.
Previously, we didn't create multiple consecutive bitmaps.
Added a test to catch this bug too.

Differential Revision: https://reviews.llvm.org/D49107

llvm-svn: 336620
2018-07-09 23:54:24 +00:00
Rui Ueyama 703c872a4a Simplify RelrSection<ELFT>::updateAllocSize.
This patch also speeds it up by making some constants compile-time
constants. Other than that, NFC.

Differential Revision: https://reviews.llvm.org/D49101

llvm-svn: 336614
2018-07-09 22:29:57 +00:00
Rui Ueyama 11479daf2f lld: add experimental support for SHT_RELR sections.
Patch by Rahul Chaudhry!

This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg

Pass '--pack-dyn-relocs=relr' to enable generation of SHT_RELR section
and DT_RELR, DT_RELRSZ, and DT_RELRENT dynamic tags.

Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!

Pass '--use-android-relr-tags' with '--pack-dyn-relocs=relr' to use
SHT_ANDROID_RELR section type instead of SHT_RELR, as well as
DT_ANDROID_RELR* dynamic tags instead of DT_RELR*. The generated
section contents are identical.

'--pack-dyn-relocs=android+relr --use-android-relr-tags' enables both
'--pack-dyn-relocs=android' and '--pack-dyn-relocs=relr': lld will
encode the relative relocations in a SHT_ANDROID_RELR section, and pack
the rest of the dynamic relocations in a SHT_ANDROID_REL(A) section.

Differential Revision: https://reviews.llvm.org/D48247

llvm-svn: 336594
2018-07-09 20:08:55 +00:00
Zaara Syeda de54f584cc [PPC64] Add support for R_PPC64_GOT_DTPREL16* relocations
The local dynamic TLS access on PPC64 ELF v2 ABI uses R_PPC64_GOT_DTPREL16*
relocations when a TLS variables falls outside 2 GB of the thread storage
block. This patch adds support for these relocations by adding a new RelExpr
called R_TLSLD_GOT_OFF which emits a got entry for the TLS variable relative
to the dynamic thread pointer using the relocation R_PPC64_DTPREL64. It then
evaluates the R_PPC64_GOT_DTPREL16* relocations as the got offset for the
R_PPC64_DTPREL64 got entries.

Differential Revision: https://reviews.llvm.org/D48484

llvm-svn: 335732
2018-06-27 13:55:41 +00:00
George Rimar 8f53b6d71e [ELF] - Change the way of sorting local symbols.
rLLD329787 added the stable sorting to SymbolTableBaseSection::postThunkContents.

I profiled the Mozilla (response-O0.txt) from lld-speed-test package and found
std::stable_sort is showing up in profile results and consuming the 3.1% of the total
CPU time in the RelWithDebug build. Total time of postThunkContents is 3.54%, 238ms.

This change reduces postTimeContents time to 50ms, making it to take 0.73% of Total CPU time.

So, instead of sorting the local part I suggest to just rebuild it.
That is what this patch does.

Differential revision: https://reviews.llvm.org/D45519

llvm-svn: 335583
2018-06-26 08:50:09 +00:00
Simon Atanasyan 6d163c5feb [ELF][MIPS] Fill a primary-GOT as much as possible
While building a Global Offset Table try to fill the primary GOT as much
as possible because the primary GOT can be accessed in the most
effective way. If it is not possible, try to fill the last GOT in the
multi-GOT list, and finally create a new GOT if both attempts failed.

llvm-svn: 335140
2018-06-20 15:58:48 +00:00
Fangrui Song bd3684f25b [ELF] Support -z initfirst
Summary:
glibc uses this option to link libpthread.so

glibc/nptl/Makefile:
LDFLAGS-pthread.so = -Wl,--enable-new-dtags,-z,nodelete,-z,initfirst

Reviewers: ruiu, echristo, espindola

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D48329

llvm-svn: 335090
2018-06-20 02:06:01 +00:00
Simon Atanasyan 9655fe6322 [ELF][MIPS] Fix stable_sort predicate to satisfy strict-ordering requirement. NFC
Fix for PR37785.

llvm-svn: 334851
2018-06-15 18:15:26 +00:00
Simon Atanasyan 9629d7897d [ELF][MIPS] Replace calls to MapVector::find by MapVector::lookup. NFC
llvm-svn: 334705
2018-06-14 11:53:31 +00:00
Alexander Richardson 127176e59e [ELF][MIPS] Fix TLS GOT entries for local symbols in shared libraries
Summary:
Previously LLD would not add any dynamic relocations and write a module
index of 1 which is not correct for the shared library case.
This can happen when a thread-local global variable is marked as local with
a version script. With this change I am now able to link all of the FreeBSD
base system for MIPS64 with LLD.

Differential Revision: https://reviews.llvm.org/D48002

llvm-svn: 334483
2018-06-12 08:00:38 +00:00