A user program may enumerate sections named with a C identifier using
__start_* and __stop_* symbols. We cannot ICF any such sections because
that could change program semantics.
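For reference, the pattern looks roughly like the following C++ sketch; the section name "fn_table" and the handler type are made up for illustration. Folding or dropping any input section contributing to "fn_table" would change what the loop below observes:

  // Hypothetical example: entries registered into a custom section whose
  // name is a valid C identifier ("fn_table"), enumerated via the
  // __start_/__stop_ symbols the linker defines for such sections.
  #include <cstdio>

  using Handler = void (*)();

  static void hello() { std::puts("hello"); }

  // Place a pointer to the handler into the "fn_table" section.
  __attribute__((used, section("fn_table")))
  static Handler entry = &hello;

  // Defined by the linker because "fn_table" is a C identifier.
  extern Handler __start_fn_table[];
  extern Handler __stop_fn_table[];

  int main() {
    // Walk every entry the linker collected into "fn_table".
    for (Handler *h = __start_fn_table; h != __stop_fn_table; ++h)
      (*h)();
  }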
Differential Revision: https://reviews.llvm.org/D47242
llvm-svn: 333054
Note that this doesn't do the right thing in the case where there is
a linker script. We probably need to move output section assignment
before ICF to get the correct behaviour here.
Differential Revision: https://reviews.llvm.org/D47241
llvm-svn: 333052
When a symbol is GC'd, it can still be referenced by relocations
in the debug sections, but such symbols are not assigned virtual
addresses.
This change adds a new global data symbol that gets GC'd but
should still appear in the output debug info, albeit with a 0
address.
Fixes 37555
Differential Revision: https://reviews.llvm.org/D47238
llvm-svn: 333047
Previously, we had a loop to iterate over options starting with
`--plugin-opt=` and parse them by hand. But we can make OptTable
do that job for us.
Differential Revision: https://reviews.llvm.org/D47167
llvm-svn: 332935
This change adds the ability for lld to remove LEB padding from the
code section. This effectively shrinks the size of the resulting
binary in proportion to the number of code relocations.
Since there is a performance cost, this is currently only active at
-O1 and above. Some toolchains may instead want to perform this
compression as a post-link step (for example, running a binary through
binaryen will automatically compress these values).
I imagine we might want to make this the default in the future.
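For illustration only (this is not lld's code): in the wasm object format, relocation targets in the code section are written as LEB128 values padded to a fixed width (five bytes for 32-bit values) so the linker can patch them in place; the compression rewrites them with their minimal encodings.

  #include <cstdint>
  #include <vector>

  // Encode an unsigned LEB128 value, optionally padded to a fixed width.
  // Object files use the padded (five-byte) form so the linker can patch
  // the value in place; re-encoding with padTo = 0 yields the minimal form.
  static std::vector<uint8_t> encodeULEB128(uint64_t value, unsigned padTo = 0) {
    std::vector<uint8_t> out;
    do {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0 || out.size() + 1 < padTo)
        byte |= 0x80; // more bytes follow
      out.push_back(byte);
    } while (value != 0);
    while (out.size() < padTo)
      out.push_back(out.size() + 1 == padTo ? 0x00 : 0x80);
    return out;
  }

  // encodeULEB128(3, 5) -> 83 80 80 80 00  (padded, as emitted by the compiler)
  // encodeULEB128(3)    -> 03              (minimal, as rewritten at -O1 and above)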
Differential Revision: https://reviews.llvm.org/D46416
llvm-svn: 332783
_init_array_start/end are placed at 0 if no ".init_array" section is present,
which makes .text relocations against them more prone to overflow.
This CL sets the ".init_array" address to that of ".text" to mitigate the situation.
Review: https://reviews.llvm.org/D46200
llvm-svn: 332688
Previously we emitted 20-byte SHA1 hashes. This is overkill
for identifying debug info records, and has the negative side
effect of making object files bigger and links slower. By
using only the last 8 bytes of a SHA1, we get smaller object
files and ~10% faster links.
This modifies the format of the .debug$H section by adding a new
value for the hash algorithm field, so that the linker will still
work when some of its input object files use the old format.
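Conceptually, the per-record hash is just the trailing 8 bytes of the 20-byte SHA1 digest; a hedged sketch (not the actual implementation):

  #include <array>
  #include <cstdint>
  #include <cstring>

  // Keep only the last 8 bytes of a 20-byte SHA1 digest as the record hash.
  static uint64_t truncatedHash(const std::array<uint8_t, 20> &sha1) {
    uint64_t h;
    std::memcpy(&h, sha1.data() + 12, sizeof(h));
    return h;
  }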
Differential Revision: https://reviews.llvm.org/D46855
llvm-svn: 332669
The prefix includes the type kind, which is important to preserve. Two
different type leaf records can easily have the same interior record
contents.
We ran into this issue in PR37492 where a bitfield type record collided
with a const modifier record. Their contents were bitwise identical, but
their kinds were different.
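A minimal sketch of the idea, using hypothetical types and a stand-in FNV-1a hash rather than the real hashing code: feed the leading kind into the hash along with the interior contents so such records no longer collide.

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical record layout: a 16-bit kind followed by the record data.
  struct TypeRecord {
    uint16_t kind;
    std::vector<uint8_t> data;
  };

  static uint64_t fnv1a(const uint8_t *p, size_t n,
                        uint64_t h = 1469598103934665603ull) {
    for (size_t i = 0; i < n; ++i) { h ^= p[i]; h *= 1099511628211ull; }
    return h;
  }

  // Hash the kind (the record prefix) together with the interior contents,
  // so that records whose contents are bitwise identical but whose kinds
  // differ (e.g. a bitfield record vs. a const modifier record) hash
  // differently.
  static uint64_t hashRecord(const TypeRecord &r) {
    uint64_t h = fnv1a(reinterpret_cast<const uint8_t *>(&r.kind), sizeof(r.kind));
    return fnv1a(r.data.data(), r.data.size(), h);
  }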
llvm-svn: 332664
Patch by Mark Kettenis.
Make ALIGN work in linker scripts used with the -r option. This works in
GNU ld (ld.bfd) and is used to generate the "random gap" object for
linking the OpenBSD kernel.
Differential Revision: https://reviews.llvm.org/D46839
llvm-svn: 332656
Previously we would always write a hash of the binary into the
PE file, for reproducible builds. This breaks AppCompat, which
is a feature of Windows that relies on the timestamp in the PE
header being set to a real value (or at the very least, a value
that satisfies certain properties).
To address this, we put the old behavior of writing the hash
behind the /Brepro flag, which mimics MSVC linker behavior. We
also match MSVC default behavior, which is to write an actual
timestamp to the PE header. Finally, we add the /TIMESTAMP
option (an lld extension) so that the user can specify the exact
value to use, in case they have manually constructed a value that
is both reproducible and satisfies AppCompat.
Differential Revision: https://reviews.llvm.org/D46966
llvm-svn: 332613
Currently, LLD marks all non-allocatable sections except SHF_REL[A] as Live
when doing GC.
This can cause a crash when SHF_LINK_ORDER sections are involved,
because their parent sections can be dead. We should handle them
correctly during GC; this patch implements that.
Differential revision: https://reviews.llvm.org/D46880
llvm-svn: 332589
This CL places .dynsym and .dynstr at the beginning of SHF_ALLOC
sections. We do this to mitigate the possibility that huge .dynsym and
.dynstr sections placed between ro-data and text sections cause
relocation overflow.
Differential Revision: https://reviews.llvm.org/D45788
llvm-svn: 332374
The --keep-unique <symbol> option is taken from gold. The intention is that
<symbol> will be prevented from being folded by ICF. Although not
specifically mentioned in the documentation, <symbol> only matches
global symbols; a warning is given if the symbol is not found.
The implementation finds the section defining <symbol> and removes it from
the set of sections considered for ICF.
Differential Revision: https://reviews.llvm.org/D46755
llvm-svn: 332332
This allows producing PDB debug info. This is an lld-specific option,
since GCC and GNU binutils don't support the PDB file format.
Differential Revision: https://reviews.llvm.org/D46796
llvm-svn: 332327
Since we are no longer using this function for the wasm start
section, we don't actually care what its signature is.
Differential Revision: https://reviews.llvm.org/D46594
llvm-svn: 332308
The relocation R_PPC64_REL64 should return R_PC for getRelExpr since it
computes S + A - P.
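In other words, the value patched at the relocated place is the PC-relative difference; a trivial sketch of the formula (not lld's code):

  #include <cstdint>

  // S + A - P: the symbol's virtual address plus the relocation addend,
  // minus the virtual address of the place being relocated.
  static int64_t computePCRelative(uint64_t symVA, int64_t addend, uint64_t placeVA) {
    return static_cast<int64_t>(symVA + addend - placeVA);
  }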
Differential Revision: https://reviews.llvm.org/D46766
llvm-svn: 332259
The relocation R_PPC64_REL32 should return R_PC for getRelExpr since it
computes S + A - P.
Differential Revision: https://reviews.llvm.org/D46586
llvm-svn: 332252
If a symbol with an undefined version in a DSO is not going to be
exported into the dynamic symbol table, then do not give an error message
for the missing version. This can happen with the --exclude-libs option,
which implicitly gives all symbols in a static library the local version.
This matches the behavior of ld.gold and is exploited by the Bionic
dynamic linker on Arm.
Differential Revision: https://reviews.llvm.org/D43126
llvm-svn: 332224
This is needed to avoid merging two functions with identical
instructions but different xdata. It also reduces binary size by
deduplicating identical pdata sections.
Fixes PR35337.
Differential Revision: https://reviews.llvm.org/D46672
llvm-svn: 332169
We discovered (crbug.com/838449#c24) that string tail merging can
negatively affect compressed binary size, so provide a flag to turn
it off for users who care more about compressed size than uncompressed
size.
Differential Revision: https://reviews.llvm.org/D46780
llvm-svn: 332149
Both the R_PPC_CALL and R_PPC_CALL_PLT Exprs map to the R_PPC64_REL24
relocation, which has the form Sym + addend - P.
Differential Revision: https://reviews.llvm.org/D46654
llvm-svn: 332127
Suppose we visit symbols in this order:
1. weak definition of foo in a lazy object
2. reference of foo
3. (optional) definition of foo
bfd/gold allows the sequence 1-2-3 but not 1-2.
The current --warn-backrefs implementation reports both cases as a backward
reference. With this change, both 1-2-3 (intended) and 1-2 (unintended) are
allowed. The use of weak definitions usually implies there are also global
definitions, so the trade-off is justified.
Differential Revision: https://reviews.llvm.org/D46624
llvm-svn: 332061
The previous CL changes the order of output sections, which causes address changes in test cases.
Review: https://reviews.llvm.org/D46730
llvm-svn: 332054
This CL mitigates R_X86_64_PC32 relocation overflow problems for huge binaries
that have nearly 4 GB of allocated sections.
Examining those binaries shows two issues contributing to the problem:
1) Huge .dynsym and .dynstr sections stand in the way between .rodata and .text.
2) _init_array_start/end are placed at 0 if no ".init_array" section is present,
which makes .text relocations against them more prone to overflow.
This CL addresses the first problem (the second will be addressed in another CL)
by assigning a smaller sort rank to .dynsym and .dynstr so that they no longer
stand in between.
llvm-svn: 332038
Merging data segments produces smaller code sizes because each segment
has some boilerplate. Therefore, merging data segments is generally the
right approach, especially with wasm where binaries are typically
delivered over the network.
However, when analyzing wasm binaries, it can be helpful to get a
conservative picture of which functions are using which data
segments[0]. Perhaps there is a large data segment that you didn't
expect to be included in the wasm, introduced by some library you're
using, and you'd like to know which library it was. In this scenario,
merging data segments only makes the analysis worse.
Alternatively, perhaps you will remove some dead functions by hand[1]
that can't be statically proven dead by the compiler or lld, and
removing these functions might make some data garbage-collectable, and
you'd like to run `--gc-sections` again so that this now-unused data can
be collected. If the segments were originally merged, then a single use
of the merged data segment will entrench all of the data.
[0] https://github.com/rustwasm/twiggy
[1] https://github.com/fitzgen/wasm-snip
Patch by Nick Fitzgerald!
Differential Revision: https://reviews.llvm.org/D46417
llvm-svn: 332013
A non-alloc note section should not have a PT_NOTE program header.
Found while linking ghc (Haskell compiler) with lld on FreeBSD.
ghc emits a .debug-ghc-link-info note section (as the name suggests, it
contains link information) as a SHT_NOTE section without SHF_ALLOC set.
In this case ld.bfd does not emit a PT_NOTE segment for the
.debug-ghc-link-info section. lld previously emitted a PT_NOTE with
p_vaddr = 0 and FreeBSD's rtld segfaulted when trying to parse a note at
address 0.
llvm.org/pr37361
Differential Revision: https://reviews.llvm.org/D46623
llvm-svn: 331973