2018-09-20 08:26:44 +08:00
# REQUIRES: ppc
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges

This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192 KiB. The technique can be ported to other
targets.

Let me demonstrate the idea with a maxPageSize=65536 example:

When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap
[0x10020, 0x20000) in the output.

Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from the p_offset range [0x10000, 0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transitioning between executable and
non-executable segments.
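
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the two layout choices under the stated maxPageSize=65536 assumption; the function names are hypothetical and are not taken from lld.

```python
MAX_PAGE_SIZE = 0x10000  # 65536, the maxPageSize used in the example


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_page_aligned(prev_end_vaddr, prev_end_offset):
    """Old scheme: start the new PT_LOAD on a page boundary."""
    vaddr = align_up(prev_end_vaddr, MAX_PAGE_SIZE)
    # p_offset must stay congruent to p_vaddr modulo maxPageSize, so the
    # file offset is pushed forward to a page boundary as well.
    offset = align_up(prev_end_offset, MAX_PAGE_SIZE)
    return vaddr, offset


def layout_overlapping(prev_end_vaddr, prev_end_offset):
    """New scheme: keep the file offset, skip ahead in the address space."""
    offset = prev_end_offset
    vaddr = align_up(prev_end_vaddr, MAX_PAGE_SIZE) + offset % MAX_PAGE_SIZE
    return vaddr, offset


def extra_mapped_range(vaddr, offset, page_size=MAX_PAGE_SIZE):
    """File range mapped in addition to the segment when p_vaddr is rounded down."""
    return offset - vaddr % page_size, offset


old = layout_page_aligned(0x10020, 0x10020)
new = layout_overlapping(0x10020, 0x10020)
print([hex(x) for x in old])                       # ['0x20000', '0x20000']
print([hex(x) for x in new])                       # ['0x20020', '0x10020']
print([hex(x) for x in extra_mapped_range(*new)])  # ['0x10000', '0x10020']
```

With the old scheme the file grows by 0x20000 - 0x10020 = 0xffe0 bytes of padding; with the new scheme the padding is zero, at the cost of the second segment also mapping [0x10000, 0x10020).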

ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula accounts for that.
2) Our TP offset formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
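
To see why rounding p_memsz(PT_GNU_RELRO) alone stops working once p_vaddr%commonPageSize can be non-zero, here is a small sketch. It assumes the goal is for the end of the RELRO region to land on a commonPageSize boundary; the helper names are hypothetical and the exact expression used by the patch may differ.

```python
COMMON_PAGE_SIZE = 0x1000  # 4096, the default commonPageSize mentioned above


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def relro_memsz_size_only(p_vaddr, p_memsz):
    """Round only the size; correct only when p_vaddr is page-aligned."""
    return align_up(p_memsz, COMMON_PAGE_SIZE)


def relro_memsz_end_aligned(p_vaddr, p_memsz):
    """Round the end address, then convert back to a size."""
    return align_up(p_vaddr + p_memsz, COMMON_PAGE_SIZE) - p_vaddr


# Page-aligned start: both formulae agree.
print(hex(relro_memsz_size_only(0x20000, 0x100)))    # 0x1000
print(hex(relro_memsz_end_aligned(0x20000, 0x100)))  # 0x1000

# Unaligned start: only the second one ends on a page boundary.
print(hex(0x20F20 + relro_memsz_size_only(0x20F20, 0x100)))    # 0x21f20
print(hex(0x20F20 + relro_memsz_end_aligned(0x20F20, 0x100)))  # 0x22000
```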

On targets where we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`.

This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:

FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

The technique will be enabled (with updated tests) for other targets in
subsequent patches.

Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
# XFAIL: *
# RUN: llvm-readelf -relocations --wide %p/Inputs/ppc64le-quadword-ldst.o | FileCheck --check-prefix=QuadInputRelocs %s
# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/shared-ppc64.s -o %t2.o
# RUN: ld.lld -shared %t2.o -o %t2.so
# RUN: ld.lld %t2.so %p/Inputs/ppc64le-quadword-ldst.o -o %t
2019-04-23 19:47:28 +08:00
# RUN: llvm-objdump -d %t | FileCheck --check-prefix=Dis %s
# RUN: ld.lld --no-toc-optimize %t2.so %p/Inputs/ppc64le-quadword-ldst.o -o %t
# RUN: llvm-objdump -d %t | FileCheck --check-prefix=NoOpt %s
# QuadInputRelocs: Relocation section '.rela.text'
# QuadInputRelocs: R_PPC64_TOC16_LO_DS 0000000000000000 quadLd
# QuadInputRelocs: R_PPC64_TOC16_LO_DS 0000000000000010 quadSt

# The powerpc backend doesn't support the quadword load/store instructions yet.
# So they are tested by linking against an object file assembled with
# `as -mpower9 -o ppc64le-quadword-ldst.o in.s` and checking the encoding of
# the unknown instructions in the disassembly. Source used as input:
#quads:
#.Lbegin_quads:
#.Lgep_quads:
# addis 2, 12, .TOC.-.Lgep_quads@ha
# addi 2, 2, .TOC.-.Lgep_quads@l
#.Llep_quads:
#.localentry quads, .Llep_quads-.Lgep_quads
# addis 3, 2, quadLd@toc@ha
# lq 4, quadLd@toc@l(3)
# addis 3, 2, quadSt@toc@ha
# stq 4, quadSt@toc@l(3)
# blr
#
# .p2align 4
# .global quadLd
# .lcomm quadLd, 16
#
# .global quadSt
# .lcomm quadSt, 16
# e0 82 7f 70 decodes to | 111000 | 00100 | 00010 | 16-bit imm |
#                        |   56   |   4   |   2   |   32624    |
# which is `lq r4, 32624(r2)`
# f8 82 7f 82 decodes to | 111110 | 00100 | 00010 | 14-bit imm | 10 |
#                        |   62   |   4   |   2   |    8160    | 2  |
# The immediate represents a word offset, so this disassembles to:
# `stq r4, 32640(r2)`
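
The field split described in the comments above can be checked mechanically. The following is a minimal sketch, not part of the test: the function names are hypothetical, the words are given in big-endian order as in the comments, and sign extension of the displacement is ignored because both examples are positive.

```python
def split_fields(word):
    """Return (primary opcode, RT/RS, RA, low 16 bits) of a 32-bit instruction."""
    return word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


def decode_lq(word):
    """lq: the low 16 bits are used directly as the byte displacement."""
    opcode, rt, ra, low16 = split_fields(word)
    return opcode, rt, ra, low16


def decode_stq(word):
    """stq: 14-bit immediate counted in words, followed by a 2-bit extended opcode."""
    opcode, rs, ra, low16 = split_fields(word)
    imm14 = low16 >> 2
    xo = low16 & 0x3
    return opcode, rs, ra, imm14, xo, imm14 * 4


print(decode_lq(0xE0827F70))   # (56, 4, 2, 32624)
print(decode_stq(0xF8827F82))  # (62, 4, 2, 8160, 2, 32640)
```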
# Dis-LABEL: quads:
# Dis-NEXT: addis
# Dis-NEXT: addi
# Dis-NEXT: nop
# Dis-NEXT: 70 7f 82 e0 <unknown>
# Dis-NEXT: nop
# Dis-NEXT: 82 7f 82 f8 <unknown>
# Dis-NEXT: blr
# e0 83 7f 70 decodes to | 111000 | 00100 | 00011 | 16-bit imm |
#                        |   56   |   4   |   3   |   32624    |
# `lq r4, 32624(r3)`
# f8 83 7f 82 decodes to | 111110 | 00100 | 00011 | 14-bit imm | 10 |
#                        |   62   |   4   |   3   |    8160    | 2  |
# `stq r4, 32640(r3)`
# NoOpt-LABEL: quads:
# NoOpt-NEXT: addis
# NoOpt-NEXT: addi
# NoOpt-NEXT: addis 3, 2, 0
# NoOpt-NEXT: 70 7f 83 e0 <unknown>
# NoOpt-NEXT: addis 3, 2, 0
# NoOpt-NEXT: 82 7f 83 f8 <unknown>
# NoOpt-NEXT: blr