Ported the D64906 technique to EM_386.
If `sh_addralign(.tdata) < sh_addralign(.tbss)`,
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`.
ld.so that are known to have problems if p_vaddr%p_align!=0:
* FreeBSD 13.0-CURRENT rtld-elf
* glibc https://sourceware.org/bugzilla/show_bug.cgi?id=24606
New test i386-tls-vaddr-align.s checks our workaround makes p_vaddr%p_align = 0.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D65865
llvm-svn: 369347
Ported the D64906 technique to AArch64. It deletes 3 alignments at
PT_LOAD boundaries for the default case: the size of an aarch64 binary
decreases by at most 192kb.
If `sh_addralign(.tdata) < sh_addralign(.tbss)`,
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`.
ld.so that are known to have problems if p_vaddr%p_align!=0:
* musl<=1.1.22
* FreeBSD 13.0-CURRENT (and before) rtld-elf arm64
New test aarch64-tls-vaddr-align.s checks that our workaround makes p_vaddr%p_align = 0.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64930
llvm-svn: 369344
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
After D66007/r369262, if the control flow reaches `if (sym.isUndefined())`, we know:
* The relocation is not a link-time constant => symbol is preemptable => Undefined or SharedSymbol
* Not an undef weak.
* -no-pie.
* The symbol type is neither STT_OBJECT nor STT_FUNC.
ld.lld --export-dynamic --unresolved-symbols=ignore-all %t.o can satisfy
these conditions. Delete the isUndefined() test so that we error
`symbol '...' has no type`, because we don't know the type to make the
decision to create copy relocation/canonical PLT.
llvm-svn: 369271
In processRelocAux(), we handle errors before copy relocation/canonical PLT.
This makes error checking a bit complex because we have to check for
conditions that will be allowed by copy relocation/canonical PLT.
Instead, move copy relocation/canonical PLT before error checking. This
simplifies the previous clumsy error checking code
`config->shared || (config->pie && expr == R_ABS && type != target->symbolicRel)`
to the simple `config->isPic`. Some diagnostics can be reported in
different ways. The code motion changes diagnostics for some contrived
test cases:
* copy-rel-pie-error.s -> copy-rel-pie2.s:
It was rejected before but accepted now. ld.bfd also accepts the case.
* copy-errors.s: "cannot preempt symbol" changes to "symbol 'bar' has no type"
* got32{,x}-i386.s: the suggestion changes from "-fPIC or -Wl,-z,notext" to "-fPIE"
* x86-64-dyn-rel-error5.s: one diagnostic changes for -pie case
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D66007
llvm-svn: 369262
Like rLLD354040
Previously, for unknown relocation types, in -no-pie/-pie mode, we got something like:
foo.o: unrecognized relocation ...
In -shared mode:
error: can't create dynamic relocation ... against symbol: yyy in readonly segment
Delete the default case from Hexagon::getRelExpr and add the error there. We will get consistent error message like `error: unknown relocation (1024) against symbol foo`
Reviewed By: sidneym
Differential Revision: https://reviews.llvm.org/D66275
llvm-svn: 369260
Fixes https://github.com/ClangBuiltLinux/linux/issues/640
R_PPC64_REL16_HI was incorrectly computed as an R_ABS relocation.
rLLD368964 made it a linker failure. Change it to use R_PC to fix the
failures.
Add ppc64-reloc-rel.s for these R_PPC64_REL* tests.
llvm-svn: 369184
R_GOTPLT is relative to .got.plt since D59594. Since R_HEXAGON_GOT
relocations always have 0 r_addend, they can use R_GOTPLT instead.
Reviewed By: sidneym
Differential Revision: https://reviews.llvm.org/D66274
llvm-svn: 369128
Summary:
libstdc++ in GCC 5.1 has some bugs. The move to C++14 in D66195 triggered one
such bug caused by the new constexpr support in C++14, and the implementation
doing SFINAE wrong with the comparator to std::stable_sort.
Here's a small repro: https://godbolt.org/z/2QC3-n
The fix is to inline the lambdas directly into the llvm::stable_sort call
instead of erasing them through a std::function. The code is more readable as
well.
Reviewers: thakis, ruiu, espindola
Subscribers: emaste, arichardson, MaskRay, jkorous, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66306
llvm-svn: 369023
Like rLLD354040.
Previously, for unrecognized relocation types, in -no-pie/-pie mode, we got something like:
foo.o: unrecognized relocation ...
In -shared mode:
error: can't create dynamic relocation ... against symbol: yyy in readonly segment
Delete the default case from AArch64::getRelExpr and add the error there.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D66277
llvm-svn: 368983
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368936
A new symbol is added to elf::symtab in 3 steps:
1) SymbolTable::insert creates a placeholder.
2) Symbol::mergeProperties
3) Symbol::replace
Fields referenced by steps 2) and 3) should be initialized in
SymbolTable::insert. `traced` and `referenced` were missed previously.
This did not cause problems because compilers generated code that
initialized them (bit fields) to 0.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D66130
llvm-svn: 368784
Currently the following 3 relocation types do not trigger the creation
of a canonical PLT (which changes STT_GNU_IFUNC to STT_FUNC and
redirects all references):
1) GOT-generating (`needsGot`)
2) PLT-generating (`needsPlt`)
3) R_ABS with 0 addend in a writable location. This is used for
for ifunc function pointers in writable sections such as .data and .toc.
This patch deletes case 3) to simplify the R_*_IRELATIVE generating
logic added in D57371. Other advantages:
* It is guaranteed no more than 1 R_*_IRELATIVE is created for an ifunc.
* PPC64: no need to special case ifunc in toc-indirect to toc-relative relaxation. See D65755
The deleted elf::addIRelativeRelocs demonstrates that one-pass scan
through relocations makes several optimizations difficult. This is
something we can think about in the future.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D65995
llvm-svn: 368661
In Writer::includeInDynSym(), exportDynamic is used by a Defined with
protected or default visibility, to record whether it is required to be
exported into .dynsym. It is set when any of the following conditions
hold:
1) There is an interposable symbol from a DSO (Undefined or SharedSymbol with default visibility)
2) If -shared or --export-dynamic is specified, any symbol in an object file/bitcode sets this property, unless suppressed by canBeOmittedFromSymbolTable().
3) --dynamic-list when producing an executable
4) protected symbol from a DSO preempted by copy relocation/canonical PLT when
--ignore-{data,function}-address-equality is specified
5) ifunc is exported when -z ifunc-noplt is specified
Bullet points 4) and 5) are irrelevant in this patch.
Bullet 3) does not play well with 1) and 2). When -shared is specified,
exportDynamic of most symbols is true. This makes it incapable to record
--dynamic-list marked symbols. We thus have obscure:
if (!config->shared)
b->exportDynamic = true;
else if (b->includeInDynsym())
b->isPreemptible = true;
This patch adds another bit `Symbol::inDynamicList` to record
3). We can thus simplify handleDynamicList() by unifying the DSO and
executable cases. It also allows us to simplify isPreemptible - now
the field is only used in finalizeSections() and later stages.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D66091
llvm-svn: 368659
After r367869, VER_NDX_LOCAL can only be assigned to Defined and
CommonSymbol. CommonSymbol becomes Defined after replaceCommonSymbols(),
thus `versionId == VER_NDX_LOCAL` will imply `isDefined()`.
In maybeReportUndefined(), computeBinding() is called when the symbol is
unknown to be Undefined. computeBinding() != STB_LOCAL will always be
true.
llvm-svn: 368536
!isPreemptible was added in r343668 to fix PR39104: symbols redefined by
replaceWithDefined() might be incorrectly considered STB_LOCAL if a
version script specified `local: *;`.
After r367869 (`config->defaultSymbolVersion` was removed), we will
assign VER_NDX_LOCAL to only regular Defined and CommonSymbol, not
Defined created by replaceWithDefined() (because scanVersionScript() is
called before scanRelocations()). The !isPreemptible is thus redundant
and can be deleted.
llvm-svn: 368535
If the dot gets moved by an explicit section address, an empty gap between sections could be created. The encompassing region for the section being parsed needs to be expanded to include the gap.
Differential Revision: https://reviews.llvm.org/D65722
Patch by Gabriel Smith!
llvm-svn: 368379
This ensures these errors produce a non-zero exit and improves the
context (providing the name of the input object and section being
parsed).
llvm-svn: 368378
In the case where C identifier sections have SHF_LINK_ORDER they will most
likely be placed in the same partition as the section that they are associated
with. But unless this happens to be the main partition, this will cause them
to be excluded from the range covered by the __start_ and __stop_ symbols,
which may lead to incorrect program behaviour. So we need to move them
all into the main partition so that they will be covered by the __start_
and __stop_ symbols.
We may want to refine this approach later and allow different __start_/__stop_
symbol values for different partitions. This would only make sense for
relocations from SHT_NOTE sections since they are duplicated into each
partition.
Differential Revision: https://reviews.llvm.org/D65909
llvm-svn: 368375
This patch Implements the R_AARCH64_TLSLE_MOVW_TPREL_G*[_NC]. These are
logically the same calculation as the existing TLSLE relocations with
the result written back to mov[nz] and movk instructions. A typical code
sequence is:
movz x0, #:tprel_g2:foo // bits [47:32] of R_TLS with overflow check
movk x0, #:tprel_g1_nc:foo // bits [31:16] of R_TLS with no overflow check
movk x0, #:tprel_g0_nc:foo // bits [15:0] of R_TLS with no overflow check
This type of code sequence is usually used with a large code model.
Differential Revision: https://reviews.llvm.org/D65882
Fixes: PR42853
llvm-svn: 368293
There's still a need for a deeper fix to the way libDebugInfoDWARF error
messages are propagated up to lld - if lld had exited non-zero on this
error message we would've found the issue sooner.
llvm-svn: 368229
Fixes PR42759.
```
// If ifunc is taken address in -fPIC code, it may have a toc entry
.section .toc,"aw",@progbits
.quad ifunc
// ifunc may be defined as STT_GNU_IFUNC in another object file
.type ifunc, %gnu_indirect_function
```
If ifunc is non-preemptable (e.g. when linking an executable), the toc
entry will be relocated by R_PPC64_IRELATIVE.
R_*_IRELATIVE represents the symbolic value of a
non-preemptable ifunc (not associated with a canonical PLT) in a writable location. It has an unknown value at
link time, so we cannot apply toc-indirect to toc-relative relaxation.
Reviewed By: luporl, sfertile
Differential Revision: https://reviews.llvm.org/D65755
llvm-svn: 368057
The combineEhSections runs, by design, before processSectionCommands so
that input exception sections like .ARM.exidx and .eh_frame are not assigned
to OutputSections. Unfortunately if /DISCARD/ removes InputSections that
have associated .ARM.exidx sections without discarding the .ARM.exidx
synthetic section then we will end up crashing when trying to sort the
InputSections in ascending address order.
We fix this by filtering out the sections that have been discarded prior
to processing the InputSections in finalizeContents().
fixes pr42890
Differential Revision: https://reviews.llvm.org/D65759
llvm-svn: 368041
This is a case missed by D64136. If %t1.o has a weak reference on foo,
and %t2.so has a non-weak reference on foo:
```
0. ld.lld %t1.o %t2.so # ok; STB_WEAK; accepted since D64136
1. ld.lld %t2.so %t1.o # undefined symbol: foo; STB_GLOBAL
2. gold %t1.o %t2.so # ok; STB_WEAK
3. gold %t2.so %t1.o # undefined reference to 'foo'; STB_GLOBAL
4. ld.bfd %t1.o %t2.so # undefined reference to `foo'; STB_WEAK
5. ld.bfd %t2.so %t1.o # undefined reference to `foo'; STB_WEAK
```
It can be argued that in both cases, the binding of the undefined foo
should be set to STB_WEAK, because the binding should not be affected by
referenced from shared objects.
--allow-shlib-undefined doesn't suppress errors (3,4,5), but -shared or
--noinhibit-exec allows ld.bfd/gold to produce a binary:
```
3. gold -shared %t2.so %t1.o # ok; STB_GLOBAL
4. ld.bfd -shared %t2.so %t1.o # ok; STB_WEAK
5. ld.bfd -shared %t1.o %t1.o # ok; STB_WEAK
```
If %t2.so has DT_NEEDED entries, ld.bfd will load them (lld/gold don't
have the behavior). If one of the DSO defines foo and it is in the
link-time search path (e.g. DT_NEEDED entry is an absolute path, via
-rpath=, via -rpath-link=, etc),
`ld.bfd %t1.o %t2.so` and `ld.bfd %t1.o %t2.so` will not error.
In this patch, we make Undefined and SharedSymbol share the same binding
computing logic. Case 1 will be allowed:
```
0. ld.lld %t1.o %t2.so # ok; STB_WEAK; accepted since D64136
1. ld.lld %t2.so %t1.o # ok; STB_WEAK; changed by this patch
```
In the future, we can explore the option that turns both (0,1) into
errors if --no-allow-shlib-undefined (default when linking an
executable) is in action.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D65584
llvm-svn: 368038
We prioritize non-* wildcards overs VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
This patch generalizes the rule to "*" of other versions and thus fixes PR40176.
I don't feel strongly about this GNU linkers' behavior but the
generalization simplifies code.
Delete `config->defaultSymbolVersion` which was used to special case
VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
In `SymbolTable::scanVersionScript`, custom versions are handled the same
way as VER_NDX_LOCAL/VER_NDX_GLOBAL. So merge
`config->versionScript{Locals,Globals}` into `config->versionDefinitions`.
Overall this seems to simplify the code.
In `SymbolTable::assign{Exact,Wildcard}Versions`,
`sym->verdefIndex == config->defaultSymbolVersion` is changed to
`verdefIndex == UINT32_C(-1)`.
This allows us to give duplicate assignment diagnostics for
`{ global: foo; };` `V1 { global: foo; };`
In test/linkerscript/version-script.s:
vs_index of an undefined symbol changes from 0 to 1. This doesn't matter (arguably 1 is better because the binding is STB_GLOBAL) because vs_index of an undefined symbol is ignored.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D65716
llvm-svn: 367869
An R_*_IRELATIVE represents the address of a STT_GNU_IFUNC symbol
(redirected at runtime) which is non-preemptable and is not associated
with a canonical PLT (associated with a symbol with a section index of
SHN_UNDEF but a non-zero st_value).
.rel[a].plt [DT_JMPREL, DT_JMPREL+DT_JMPRELSZ) contains relocations that
can be lazily resolved. R_*_IRELATIVE are always eagerly resolved, so
conceptually they do not belong to .rela.plt. "iplt" is mostly a misnomer.
glibc powerpc and powerpc64 do not resolve R_*_IRELATIVE if they are in .rela.plt.
// a.o - synthesized PLT call stub has an R_*_IRELATIVE
void ifunc(); int main() { ifunc(); }
// b.o
static void real() {}
asm (".type ifunc, %gnu_indirect_function");
void *ifunc() { return ℜ }
The lld-linked executable crashes. ld.bfd places R_*_IRELATIVE in
.rela.dyn and the executable works.
glibc i386, x86_64, and aarch64 have logic
(glibc/sysdeps/*/dl-machine.h:elf_machine_lazy_rel) to eagerly resolve
R_*_IRELATIVE in .rel[a].plt so the lld-linked executable works.
Move R_*_IRELATIVE from .rel[a].plt to .rel[a].dyn to fix the crashes on
glibc powerpc/powerpc64. This also helps simplifying ifunc
implementation in FreeBSD rtld-elf powerpc64.
If --pack-dyn-relocs=android[+relr] is specified, the Android packed
dynamic relocation format is used for .rela.dyn. We cannot name
in.relaIplt ".rela.dyn" because the output section will have mixed
formats. This can be improved in the future.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D65651
llvm-svn: 367745
1. raw_ostream supports ANSI colors so that you can write messages to
the termina with colors. Previously, in order to change and reset
color, you had to call `changeColor` and `resetColor` functions,
respectively.
So, if you print out "error: " in red, for example, you had to do
something like this:
OS.changeColor(raw_ostream::RED);
OS << "error: ";
OS.resetColor();
With this patch, you can write the same code as follows:
OS << raw_ostream::RED << "error: " << raw_ostream::RESET;
2. Add a boolean flag to raw_ostream so that you can disable colored
output. If you disable colors, changeColor, operator<<(Color),
resetColor and other color-related functions have no effect.
Most LLVM tools automatically prints out messages using colors, and
you can disable it by passing a flag such as `--disable-colors`.
This new flag makes it easy to write code that works that way.
Differential Revision: https://reviews.llvm.org/D65564
llvm-svn: 367649
This patch
1) adds -z separate-code and -z noseparate-code (default).
2) changes the condition that the last page of last PF_X PT_LOAD is
padded with trap instructions.
Current condition (after D33630): if there is no `SECTIONS` commands.
After this change: if -z separate-code is specified.
-z separate-code was introduced to ld.bfd in 2018, to place the text
segment in its own pages. There is no overlap in pages between an
executable segment and a non-executable segment:
1) RX cannot load initial contents from R or RW(or non-SHF_ALLOC).
2) R and RW(or non-SHF_ALLOC) cannot load initial contents from RX.
lld's current status:
- Between R and RX: in `Writer<ELFT>::fixSectionAlignments()`, the start of a
segment is always aligned to maxPageSize, so the initial contents loaded by R
and RX do not overlap. I plan to allow overlaps in D64906 if -z noseparate-code
is in effect.
- Between RX and RW(or non-SHF_ALLOC if RW doesn't exist):
we currently unconditionally pad the last page to commonPageSize
(defaults to 4096 on all targets we support).
This patch will make it effective only if -z separate-code is specified.
-z separate-code is a dubious feature that intends to reduce the number
of ROP gadgets (which is actually ineffective because attackers can find
plenty of gadgets in the text segment, no need to find gadgets in
non-code regions).
With the overlapping PT_LOAD technique D64906, -z noseparate-code
removes two more alignments at segment boundaries than -z separate-code.
This saves at most defaultCommonPageSize*2 bytes, which are significant
on targets with large defaultCommonPageSize (AArch64/MIPS/PPC: 65536).
Issues/feedback on alignment at segment boundaries to help understand
the implication:
* binutils PR24490 (the situation on ld.bfd is worse because they have
two R-- on both sides of R-E so more alignments.)
* In binutils, the 2018-02-27 commit "ld: Add --enable-separate-code" made -z separate-code the default on Linux.
d969dea983
In musl-cross-make, binutils is configured with --disable-separate-code
to address size regressions caused by -z separate-code. (lld actually has the same
issue, which I plan to fix in a future patch. The ld.bfd x86 status is
worse because they default to max-page-size=0x200000).
* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237676 people want
smaller code size. This patch will remove one alignment boundary.
* Stef O'Rear: I'm opposed to any kind of page alignment at the
text/rodata line (having a partial page of text aliased as rodata and
vice versa has no demonstrable harm, and I actually care about small
systems).
So, make -z noseparate-code the default.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64903
llvm-svn: 367537
That allows to remove duplicated code which subtracts 0x7000 from the
R_MIPS_TLS_TPREL_XXX relocations values in the `MIPS::relocateOne`
function.
llvm-svn: 366888
This ports r366573 from COFF to ELF.
There are now to toString(Archive::Symbol), one doing MSVC demangling
in COFF and one doing Itanium demangling in ELF, so rename these two
to toCOFFString() and to toELFString() to not get a duplicate symbol.
Nothing ever passes a raw Archive::Symbol to CHECK(), so these not
being part of the normal toString() machinery seems ok.
There are two code paths in the ELF linker that emits this type of
diagnostic:
1. The "normal" one in InputFiles.cpp. This is covered by the tweaked test.
2. An additional one that's only used for libcalls if there's at least
one bitcode in the link, and if the libcall symbol is lazy, and
lazily loaded from an archive (i.e. not from a lazy .o file).
(This code path was added in r339301.) Since all libcall names so far
are C symbols and never mangled, the change there is not observable
and hence not covered by tests.
Differential Revision: https://reviews.llvm.org/D65095
llvm-svn: 366836
lld currently selects the relocation model automatically depending on
the link flags specified, but in some cases it'd be useful to allow
explicitly overriding the relocation model using a flag.
llvm-svn: 366644
It's possible to create IR that uses !associated to refer to a global that
appears later in the module, which can result in these types of forward
references being generated. Unfortunately our assembler does not currently
accept the resulting .s so I needed to use yaml2obj to test this.
Differential Revision: https://reviews.llvm.org/D64880
llvm-svn: 366460