Recommit r373168, which was reverted by r373242. The original commit actually
exposed a boringssl bug, which has been fixed for more than a month.
For the following two cases, we currently suppress the symbols. This
patch emits them (compatible with GNU as).
* `test2_a = undef`: if `undef` is otherwise unused.
* `.hidden hidden`: if `hidden` is unused. This is the main point of the
patch, because omitting the symbol would cause a linker semantic
difference.
It causes a behavior change that is not compatible with GNU as:
.weakref foo1, bar1
When neither foo1 nor bar1 is used, we now emit bar1, which is arguably
more consistent.
Another change is that we now emit .TOC. for .TOC.@tocbase. For this
directive, suppressing .TOC. can be seen as a size optimization, but we drop
that optimization for simplicity and consistency.
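Putting the cases together, a small sketch of an input whose symbols are now
all emitted (the .quad line is just one typical way `@tocbase` appears, e.g. in
ELFv1 function descriptors; it is not taken from the tests):
  .weakref foo1, bar1    # neither foo1 nor bar1 referenced: bar1 is emitted
  test2_a = undef        # undef otherwise unused: undef is emitted
  .hidden hidden         # hidden unused: hidden is emitted
  .quad .TOC.@tocbase    # .TOC. is emitted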
llvm-svn: 373168
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can then pick either 0x10020 or 0x20020 for p_offset, and
0x10020 is clearly the better choice because it leaves no gap. At runtime,
p_vaddr will be rounded down to a multiple of the page size (65536 if
pagesize=maxPageSize), so this PT_LOAD will also load the initial contents in
the p_offset range [0x10000,0x10020), which are loaded by the previous PT_LOAD
as well. This is fine if -z noseparate-code is in effect or if we are not
transitioning between executable and non-executable segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
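(Each removed alignment can waste up to roughly maxPageSize bytes of file
padding, and 3 * 65536 bytes = 196608 bytes = 192 KiB, which is where the
"at most 192kb" figure above comes from.)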
Two places that relied on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (which defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero, and the
updated formula takes that into account.
2) Our TP offset formulas are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets where we enable the technique (only PPC64 for now), we can end up
with `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if
`sh_addralign(.tdata) < sh_addralign(.tbss)` (see the sketch below).
This exposes many problems in ld.so implementations, especially with the
offsets of dynamic TLS blocks. Known issues:
* FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
* glibc (HEAD) i386 and x86_64: https://sourceware.org/bugzilla/show_bug.cgi?id=24606
* musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So we force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
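For illustration, a minimal input with that alignment mismatch might look like
the following hypothetical sketch (not taken from the tests):
  .section .tdata,"awT",@progbits
  .p2align 2                 # sh_addralign(.tdata) = 4
  .long 1
  .section .tbss,"awT",@nobits
  .p2align 4                 # sh_addralign(.tbss) = 16
  .skip 16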
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
Fixes https://github.com/ClangBuiltLinux/linux/issues/640
R_PPC64_REL16_HI was incorrectly computed as an R_ABS relocation, and
rLLD368964 turned this into a linker failure. Change it to use R_PC to fix the
failures.
Add ppc64-reloc-rel.s for these R_PPC64_REL* tests.
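For reference, these PC-relative relocations typically come from TOC pointer
setup sequences; a sketch (the @h form is what yields R_PPC64_REL16_HI):
  0:
    addis 2, 12, (.TOC. - 0b)@ha   # R_PPC64_REL16_HA
    addi  2, 2, (.TOC. - 0b)@l     # R_PPC64_REL16_LO
    # (.TOC. - 0b)@h in an instruction operand yields R_PPC64_REL16_HI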
llvm-svn: 369184
The following abstract relocation types (RelExpr) are PPC64 ELFv2 ABI specific
and not used by PPC32, so rename them to prevent confusion when the PPC32 port
is improved.
* R_PPC_CALL, R_PPC_CALL_PLT:
R_PPC_CALL_PLT represents R_PPC64_REL14 and R_PPC64_REL24 (see the call sketch
after this list).
If the function is not preemptible, R_PPC_CALL_PLT can be optimized to R_PPC_CALL:
the formula adjusts the symbol VA from the global entry point to the local entry point.
* R_PPC_TOC: represents R_PPC64_TOC. We don't have a test; add one to ppc64-relocs.s.
Rename it to R_PPC64_TOCBASE because `@tocbase` is the assembly form.
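For reference, a sketch of the call forms these RelExpr values cover (the
TOC-restore comment describes common ELFv2 behavior, not something added by
this patch):
  bl callee      # R_PPC64_REL24
  nop            # call-site TOC restore slot; the linker may rewrite it to ld 2, 24(1)
  beq callee     # R_PPC64_REL14 (conditional branch)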
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D62800
llvm-svn: 362359
This is based on D54720 by Sean Fertile.
When accessing a global symbol that is not defined in the translation unit,
compilers generate instructions that load its address from a .toc entry.
If the symbol is defined, non-preemptible, and addressable with a 32-bit
signed offset from the toc pointer, its address can be computed
directly, e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by the
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s.
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
This improves readability and makes the behavior consistent with GNU objdump.
The new test test/tools/llvm-objdump/X86/disassemble-section-name.s
checks that we print newlines before and after "Disassembly of section ...:".
Differential Revision: https://reviews.llvm.org/D61127
llvm-svn: 359668
Various improvements:
* Some offsets in disassembly were incorrect after several layout adjustments. Fix them.
* llvm-objdump -D should not be used: -D dumps unrelated non-text sections. Replace it with llvm-objdump -d, llvm-readelf -x, etc.
* Many llvm-objdump -d tests use {{.*}}. Add the option --no-show-raw-insn to avoid checking hex bytes.
* ppc64-long-branch.s does not need a shared object. Delete it.
* Make ppc64-ifunc.s check 2 ifuncs.
Reviewers: ruiu, espindola
Subscribers: emaste, nemanjai, arichardson, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60998
llvm-svn: 358975
Summary:
Based on Peter Collingbourne's suggestion in D56828.
Before D56828: PT_LOAD(.data PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) .bss)
Old: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) .data .bss)
New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro)) PT_LOAD(.data .bss)
The new layout reflects the runtime memory mappings.
By having two PT_LOAD segments, we can utilize the NOBITS part of the
first PT_LOAD and save bytes for .bss.rel.ro.
.bss.rel.ro is currently small and only used by copy relocations of
symbols in read-only segments, but it can be used for other purposes in
the future, e.g. if a relro section's statically relocated data is all
zeros, we can move it to .bss.rel.ro.
Reviewers: espindola, ruiu, pcc
Reviewed By: ruiu
Subscribers: nemanjai, jvesely, nhaehnle, javed.absar, kbarton, emaste, arichardson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58892
llvm-svn: 356226
When linking the Linux kernel on ppc64 and ppc, ld.lld fails with:
ld.lld: error: unrecognized reloc 11
Relocation type 11 is R_PPC_REL14 on ppc and R_PPC64_REL14 on ppc64.
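For reference, these relocation types come from 16-bit conditional branches
whose target the assembler cannot resolve, e.g. (a sketch):
  beq external_target    # relocated with R_PPC_REL14 / R_PPC64_REL14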
Differential revision: https://reviews.llvm.org/D54868
llvm-svn: 348255
The access sequence for global variables in the medium and large code models uses
2 instructions to add an offset to the TOC pointer. If the offset fits within
16 bits, the instruction that sets the high 16 bits is redundant.
This patch adds the --toc-optimize option (on by default) and enables rewriting
2-instruction global variable accesses into 1 when the offset from the
TOC pointer to the variable (or .got entry) fits in 16 signed bits, e.g.
addis %r3, %r2, 0        --> nop
addi  %r3, %r3, -0x8000  --> addi %r3, %r2, -0x8000
This rewriting can be disabled with the --no-toc-optimize flag.
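In source terms, such a pair typically comes from a TOC-relative access like
the following sketch (the relocation names reflect the usual mapping and are
not quoted from this patch):
  addis 3, 2, global@toc@ha   # R_PPC64_TOC16_HA; becomes a nop when global@toc fits in 16 bits
  addi  3, 3, global@toc@l    # R_PPC64_TOC16_LO; rewritten to use r2 as the base register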
Differential Revision: https://reviews.llvm.org/D49237
llvm-svn: 342602
PPC64 maintains a compiler-managed got in the .toc section. When accessing a
global variable through got-indirect access, a .toc entry is created for the
variable. The relocation for the got-indirect access will refer to the .toc
section rather than the symbol that is actually accessed. The .toc entry
contains the address of the global variable. We evaluate the offset from
r2 (which is the TOC base) to the address of the toc entry for the global
variable. Currently, the .toc is not near the .got. This causes errors because
the offset from r2 to the toc section is too large. The linker needs to add
all the .toc input sections to the .got output section, merging the compiler-managed
got with the linker-managed got. This ensures that the offsets from the TOC
base to the toc entries are not too large.
This patch puts the .toc section right after the .got section.
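For reference, a sketch of such a got-indirect access and its .toc entry (small
code model shown; the relocation refers to the .toc section plus an offset
rather than to var itself):
    ld 3, .LC1@toc(2)          # R_PPC64_TOC16_DS against the .toc entry
  .section .toc,"aw",@progbits
  .LC1:
    .tc var[TC], var           # 8-byte entry holding the address of var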
Differential Revision: https://reviews.llvm.org/D45833
llvm-svn: 333199
The relocation R_PPC64_REL64 should return R_PC for getRelExpr since it
computes S + A - P.
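Such a relocation can come from a PC-relative data directive, e.g. (a sketch):
  .quad target - .    # R_PPC64_REL64 when target cannot be resolved at assembly time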
Differential Revision: https://reviews.llvm.org/D46766
llvm-svn: 332259
The relocation R_PPC64_REL32 should return R_PC for getRelExpr since it
computes S + A - P.
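Such a relocation can come from a PC-relative data directive, e.g. (a sketch):
  .long target - .    # R_PPC64_REL32 when target cannot be resolved at assembly time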
Differential Revision: https://reviews.llvm.org/D46586
llvm-svn: 332252
The current support for the V1 ABI in LLD is incomplete.
This patch removes V1 ABI support and changes the default behavior to the V2 ABI,
issuing an error when the V1 ABI is requested. It also updates the testcases to V2
and removes any V1-specific tests.
Differential Revision: https://reviews.llvm.org/D46316
llvm-svn: 331529
This is in preparation for my next change, which will introduce a relro
nobits section. That requires that relro sections appear at the end of the
progbits part of the r/w segment so that the relro nobits section can appear
contiguously.
Because of the amount of churn required in the test suite, I'm making this
change separately.
llvm-svn: 291523