// REQUIRES: ppc
// RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t.o
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transitioning between executable and non-executable
segments.
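The arithmetic in the maxPageSize=65536 example can be checked with a small Python sketch. It is only an illustration of the numbers quoted above, not lld's implementation; the helper names (`align_up`, `aligned_layout`, `overlapping_layout`, `reloaded_range`) are hypothetical, and the sketch assumes the previous PT_LOAD ends at the same value (0x10020) in both file offset and address.

```python
MAX_PAGE_SIZE = 0x10000  # 65536, as in the example


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def aligned_layout(prev_end_vaddr, prev_end_offset, page=MAX_PAGE_SIZE):
    """Old scheme: start the new PT_LOAD on a page boundary.

    Returns (p_vaddr, p_offset, gap), where gap is the unused file range.
    """
    p_vaddr = align_up(prev_end_vaddr, page)
    p_offset = align_up(prev_end_offset, page)
    return p_vaddr, p_offset, p_offset - prev_end_offset


def overlapping_layout(prev_end_vaddr, prev_end_offset, page=MAX_PAGE_SIZE):
    """New scheme: keep p_offset contiguous and shift p_vaddr so that
    p_vaddr and p_offset stay congruent modulo the page size."""
    p_vaddr = align_up(prev_end_vaddr, page) + prev_end_offset % page
    return p_vaddr, prev_end_offset, 0


def reloaded_range(p_offset, page=MAX_PAGE_SIZE):
    """File range that the loader maps again when it rounds down to a page."""
    return p_offset & ~(page - 1), p_offset


if __name__ == "__main__":
    for layout in (aligned_layout, overlapping_layout):
        vaddr, offset, gap = layout(0x10020, 0x10020)
        print(layout.__name__, hex(vaddr), hex(offset), hex(gap))
    print([hex(x) for x in reloaded_range(0x10020)])
```

With the aligned layout the gap is just under one page, which is where the "almost 3*65536 bytes" saving for three removed alignments comes from.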
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets where we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`.
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
// RUN: ld.lld -shared %t.o -z separate-code -o %t.so
// RUN: llvm-readelf -r %t.o | FileCheck --check-prefix=InputRelocs %s
// RUN: llvm-readelf -r %t.so | FileCheck --check-prefix=OutputRelocs %s
// RUN: llvm-objdump --section-headers %t.so | FileCheck --check-prefix=CheckGot %s
// RUN: llvm-objdump -d %t.so | FileCheck --check-prefix=Dis %s
// RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t.o
// RUN: ld.lld -shared %t.o -z separate-code -o %t.so
// RUN: llvm-readelf -r %t.o | FileCheck --check-prefix=InputRelocs %s
// RUN: llvm-readelf -r %t.so | FileCheck --check-prefix=OutputRelocs %s
// RUN: llvm-objdump --section-headers %t.so | FileCheck --check-prefix=CheckGot %s
// RUN: llvm-objdump -d %t.so | FileCheck --check-prefix=Dis %s
.text
.abiversion 2
.globl test
.p2align 4
.type test,@function
test:
.Lfunc_gep0:
addis 2, 12, .TOC.-.Lfunc_gep0@ha
addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
.localentry test, .Lfunc_lep0-.Lfunc_gep0
mflr 0
std 0, 16(1)
stdu 1, -32(1)
addis 3, 2, i@got@tlsld@ha
addi 3, 3, i@got@tlsld@l
bl __tls_get_addr(i@tlsld)
nop
addis 3, 3, i@dtprel@ha
lwa 3, i@dtprel@l(3)
ld 0, 16(1)
mtlr 0
blr
.globl test_hi
.p2align 4
.type test_hi,@function
test_hi:
lis 3, j@got@tlsld@h
blr
.globl test_16
.p2align 4
.type test_16,@function
test_16:
li 3, k@got@tlsld
blr
.type i,@object
.section .tdata,"awT",@progbits
.p2align 2
i:
.long 55
.size i, 4
.type j,@object
.section .tbss,"awT",@nobits
.p2align 2
j:
.long 0
.size j, 4
.type k,@object
.section .tdata,"awT",@progbits
.p2align 3
k:
.quad 66
.size k, 8
// Verify that the input contains all the R_PPC64_GOT_TLSLD16* relocations, as
// well as the DTPREL relocations used in a typical medium code model
// local-dynamic variable access.
// InputRelocs: Relocation section '.rela.text'
// InputRelocs: R_PPC64_GOT_TLSLD16_HA {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_GOT_TLSLD16_LO {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_TLSLD {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_DTPREL16_HA {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_DTPREL16_LO_DS {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_GOT_TLSLD16_HI {{[0-9a-f]+}} j + 0
// InputRelocs: R_PPC64_GOT_TLSLD16 {{[0-9a-f]+}} k + 0
// The local dynamic version of tls needs to use the same mechanism to look up
// a variable's address as general-dynamic, i.e. a call to __tls_get_addr with the
// address of a tls_index struct as the argument. However for local-dynamic
// variables all will have the same ti_module, and the offset field is left as
// 0, so the same struct can be used for every local-dynamic variable
// used in the shared-object.
// OutputRelocs: Relocation section '.rela.dyn' at offset 0x{{[0-9a-f]+}} contains 1 entries:
// OutputRelocs-NEXT: Offset Info Type Symbol's Value Symbol's Name + Addend
// OutputRelocs-NEXT: R_PPC64_DTPMOD64
// Check that the got has 3 entries, 1 for the TOC and 1 structure of 2 entries
// for the tls variables. Also verify the address so we can check the offsets
// we calculate for each relocation type.
// CheckGot: got 00000018 0000000000020100
// got starts at 0x20100 so .TOC. will be 0x28100, and the tls_index struct is
// at 0x20108.
// #ha(i@got@tlsld) --> (0x20108 - 0x28100 + 0x8000) >> 16 = 0
// #lo(i@got@tlsld) --> (0x20108 - 0x28100) = -0x7ff8 = -32760
// When calculating offset relative to the dynamic thread pointer we have to
// adjust by 0x8000 since each DTV pointer points 0x8000 bytes past the start of
// its TLS block.
// #ha(i@dtprel) --> (0x0 -0x8000 + 0x8000) >> 16 = 0
// #lo(i@dtprel) --> (0x0 -0x8000) = -0x8000 = -32768
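The #ha/#lo values quoted in the comments above can be reproduced with a short Python sketch. The addresses come from the CheckGot line and the comments (got at 0x20100, .TOC. at got + 0x8000, tls_index at got + 8); the helper names `sign_extend16`, `lo`, `hi`, and `ha` are hypothetical and only mirror the usual PPC64 16-bit relocation arithmetic, so treat this as a check of the numbers rather than the linker's code.

```python
GOT_BASE = 0x20100            # address of .got from the CheckGot line
TOC_BASE = GOT_BASE + 0x8000  # .TOC. is 0x8000 past the start of .got
TLS_INDEX = GOT_BASE + 8      # tls_index struct follows the TOC entry
DTV_BIAS = 0x8000             # DTV pointers point 0x8000 past the block start


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lo(value):
    """#lo: low 16 bits."""
    return sign_extend16(value)


def hi(value):
    """#hi: bits 16..31 without adjustment."""
    return sign_extend16(value >> 16)


def ha(value):
    """#ha: bits 16..31, adjusted so that (ha << 16) + lo == value."""
    return sign_extend16((value + 0x8000) >> 16)


if __name__ == "__main__":
    got_tlsld = TLS_INDEX - TOC_BASE  # TOC-relative offset of tls_index
    dtprel = 0x0 - DTV_BIAS           # i is at offset 0 in the TLS block
    print(ha(got_tlsld), lo(got_tlsld), hi(got_tlsld))
    print(ha(dtprel), lo(dtprel))
```

The `hi` helper corresponds to the unadjusted high half used for `j@got@tlsld` in `test_hi`; because it lacks the 0x8000 adjustment, it yields -1 where `ha` yields 0.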
// Dis: <test>:
// Dis: addis 3, 2, 0
// Dis-NEXT: addi 3, 3, -32760
// Dis-NEXT: bl 0x10060
// Dis-NEXT: ld 2, 24(1)
// Dis-NEXT: addis 3, 3, 0
// Dis-NEXT: lwa 3, -32768(3)
// #hi(j@got@tlsld) --> (0x20108 - 0x28100) >> 16 = -1
// Dis: <test_hi>:
// Dis: lis 3, -1
// k@got@tlsld --> (0x20108 - 0x28100) = -0x7ff8 = -32760
// Dis: <test_16>:
// Dis: li 3, -32760