This change affects the non-linker-script case (more precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by up to 192KiB (3*65536 bytes). The technique can be ported to
other targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the better choice because it leaves no gap. At runtime,
p_vaddr will be rounded down to a multiple of pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will therefore map additional initial
contents from the p_offset range [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transitioning between executable and non-executable
segments.
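Here is a minimal arithmetic sketch of that address choice (the helper and
variable names are mine, not lld's):

```cpp
#include <cassert>
#include <cstdint>

// Round x up to the next multiple of align (align is a power of 2).
static uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & -align;
}

// dot: end p_vaddr of the previous PT_LOAD, e.g. 0x10020.
// Old behavior: p_vaddr = alignTo(dot, maxPageSize) = 0x20000, forcing
//   p_offset = 0x20000 and leaving the file range [0x10020, 0x20000) unused.
// New behavior: keep dot's offset within the page, so p_vaddr = 0x20020 and
//   p_offset can simply continue at 0x10020 with no gap.
static uint64_t newLoadVaddr(uint64_t dot, uint64_t maxPageSize) {
  return alignTo(dot, maxPageSize) + (dot & (maxPageSize - 1));
}

int main() {
  assert(newLoadVaddr(0x10020, 0x10000) == 0x20020);
  assert(newLoadVaddr(0x20000, 0x10000) == 0x20000); // already page-aligned
}
```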
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld. It is most effective on targets
with a large defaultMaxPageSize (AArch64/MIPS/PPC: 65536), where the 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (which
defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero,
so the updated formula has to take that remainder into account; see the
sketch after this list.
2) Our TP offset formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
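As a rough illustration of both adjustments (the names and exact expressions
are simplified sketches, not the verbatim lld code; item 2 shows only the
TLS Variant 1 shape):

```cpp
#include <cstdint>

static uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & -align;
}

// 1) PT_GNU_RELRO: round the *end* address up to commonPageSize and derive
// p_memsz from it, instead of rounding p_memsz itself (which was only
// correct when p_vaddr%commonPageSize == 0).
uint64_t relroMemsz(uint64_t vaddr, uint64_t memsz, uint64_t commonPageSize) {
  return alignTo(vaddr + memsz, commonPageSize) - vaddr;
}

// 2) TLS Variant 1 (e.g. AArch64): at run time TP is aligned to p_align, the
// TCB gap follows TP, and padding is inserted so that the static TLS block
// stays congruent to p_vaddr modulo p_align. The extra
// (p_vaddr - gap) % p_align term vanishes when p_vaddr%p_align == 0, which
// is why the old formula appeared correct.
uint64_t variant1TpOffset(uint64_t offsetInTls, uint64_t gap,
                          uint64_t vaddr, uint64_t align) {
  return offsetInTls + gap + ((vaddr - gap) & (align - 1));
}
```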
On targets where we enable the technique (only PPC64 for now), we can end up
with `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if
`sh_addralign(.tdata) < sh_addralign(.tbss)`.
This exposes bugs in several ld.so implementations, especially in the
computation of dynamic TLS block offsets. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
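A minimal sketch of that workaround, assuming `dot` is the address about to
be assigned to the first PT_TLS section and `tlsAlign` stands for
p_align(PT_TLS):

```cpp
#include <cstdint>

// Round dot up so that p_vaddr(PT_TLS) % p_align(PT_TLS) == 0, sidestepping
// the ld.so issues listed above.
uint64_t alignTlsVaddr(uint64_t dot, uint64_t tlsAlign) {
  return (dot + tlsAlign - 1) & -tlsAlign; // alignTo(dot, tlsAlign)
}
```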
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
In the PPC64 target we map TOC-relative relocations, dynamic-thread-pointer-
relative relocations, and GOT relocations to a corresponding ADDR16
relocation type for handling in relocateOne. This patch saves the original
RelType before mapping to the ADDR16 relocation so that any diagnostic
messages will not mistakenly use the mapped type.
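A sketch of the idea; `toAddr16`, `isProperlyAligned` and `reportError` are
hypothetical stand-ins for the real lld machinery:

```cpp
#include <cstdint>

using RelType = uint32_t;

// Placeholders for lld's real helpers.
RelType toAddr16(RelType t);                     // e.g. TOC16_DS -> ADDR16_DS
bool isProperlyAligned(uint64_t val, RelType t);
void reportError(uint8_t *loc, RelType originalType);
void relocateOne(uint8_t *loc, RelType t, uint64_t val);

// Remember the relocation type from the object file before rewriting it to
// an ADDR16 flavor, so that diagnostics name the type the user actually wrote.
void applyReloc(uint8_t *loc, RelType type, uint64_t val) {
  const RelType originalType = type;   // e.g. R_PPC64_TOC16_DS
  type = toAddr16(type);               // mapped type, e.g. R_PPC64_ADDR16_DS
  if (!isProperlyAligned(val, type))
    reportError(loc, originalType);    // diagnose with originalType, not type
  relocateOne(loc, type, val);         // patch the bits using the mapped type
}
```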
Differential Revision: https://reviews.llvm.org/D56448
llvm-svn: 350827
Relanding r340564, original commit message:
Fixes the handling of *_DS relocations used on DQ-form instructions, where we
were overwriting some of the extended opcode bits. Also adds an alignment
check so that the user receives a diagnostic error if the value we are
writing is not properly aligned.
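Roughly, with assumed `read16`/`write16`/`checkAlignment` helpers (not the
exact lld code): a DQ-form instruction such as lxv/stxv keeps extended-opcode
bits in the low 4 bits of the 16-bit displacement field, and DS-form keeps
the low 2 bits, so those bits must be preserved and the relocated value must
be suitably aligned.

```cpp
#include <cstdint>
#include <cstring>

// Host-endian helpers for brevity; lld's versions honor target endianness.
static uint16_t read16(const uint8_t *loc) {
  uint16_t v;
  std::memcpy(&v, loc, 2);
  return v;
}
static void write16(uint8_t *loc, uint16_t v) { std::memcpy(loc, &v, 2); }

// Placeholder for lld's alignment diagnostic.
void checkAlignment(uint8_t *loc, uint64_t val, uint64_t align);

// Apply a *_DS-style relocation without clobbering the extended-opcode bits
// that DS-form (low 2 bits) and DQ-form (low 4 bits) instructions keep in
// the displacement halfword.
void relocateDS(uint8_t *loc, uint64_t val, bool isDQForm) {
  uint16_t opcodeMask = isDQForm ? 0xf : 0x3;  // bits owned by the opcode
  checkAlignment(loc, val, opcodeMask + 1);    // error if val is misaligned
  write16(loc, (read16(loc) & opcodeMask) | (uint16_t(val) & ~opcodeMask));
}
```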
Differential Revision: https://reviews.llvm.org/D51124
llvm-svn: 340832
This reverts commit 5125b44dbb5d06b715213e4bec75c7346bfcc7d3.
ppc64-dq.s and ppc64-error-missaligned-dq.s fail on several of the build-bots.
Reverting to investigate.
llvm-svn: 340568
Fixes the handling of *_DS relocations used on DQ-form instructions, where we
were overwriting some of the extended opcode bits. Also adds an alignment
check so that the user receives a diagnostic error if the value we are
writing is not properly aligned.
Differential Revision: https://reviews.llvm.org/D51124
llvm-svn: 340564