llvm-project/lld/test/ELF/ppc64-local-dynamic.s

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

129 lines
4.4 KiB
ArmAsm
Raw Normal View History

// REQUIRES: ppc
// RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t.o
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343
2019-08-20 16:34:25 +08:00
// RUN: ld.lld -shared %t.o -z separate-code -o %t.so
// RUN: llvm-readelf -r %t.o | FileCheck --check-prefix=InputRelocs %s
// RUN: llvm-readelf -r %t.so | FileCheck --check-prefix=OutputRelocs %s
// RUN: llvm-objdump --section-headers %t.so | FileCheck --check-prefix=CheckGot %s
// RUN: llvm-objdump -d %t.so | FileCheck --check-prefix=Dis %s
// RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t.o
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343
2019-08-20 16:34:25 +08:00
// RUN: ld.lld -shared %t.o -z separate-code -o %t.so
// RUN: llvm-readelf -r %t.o | FileCheck --check-prefix=InputRelocs %s
// RUN: llvm-readelf -r %t.so | FileCheck --check-prefix=OutputRelocs %s
// RUN: llvm-objdump --section-headers %t.so | FileCheck --check-prefix=CheckGot %s
// RUN: llvm-objdump -d %t.so | FileCheck --check-prefix=Dis %s
.text
.abiversion 2
.globl test
.p2align 4
.type test,@function
test:
.Lfunc_gep0:
addis 2, 12, .TOC.-.Lfunc_gep0@ha
addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
.localentry test, .Lfunc_lep0-.Lfunc_gep0
mflr 0
std 0, 16(1)
stdu 1, -32(1)
addis 3, 2, i@got@tlsld@ha
addi 3, 3, i@got@tlsld@l
bl __tls_get_addr(i@tlsld)
nop
addis 3, 3, i@dtprel@ha
lwa 3, i@dtprel@l(3)
ld 0, 16(1)
mtlr 0
blr
.globl test_hi
.p2align 4
.type test_hi,@function
test_hi:
lis 3, j@got@tlsld@h
blr
.globl test_16
.p2align 4
.type test_16,@function
test_16:
li 3, k@got@tlsld
blr
.type i,@object
.section .tdata,"awT",@progbits
.p2align 2
i:
.long 55
.size i, 4
.type j,@object
.section .tbss,"awT",@nobits
.p2align 2
j:
.long 0
.size j, 4
.type k,@object
.section .tdata,"awT",@progbits
.p2align 3
k:
.quad 66
.size k, 8
// Verify that the input contains all the R_PPC64_GOT_TLSLD16* relocations, as
// well as the DTPREL relocations used in a typical medium code model
// local-dynamic variable access.
// InputRelocs: Relocation section '.rela.text'
// InputRelocs: R_PPC64_GOT_TLSLD16_HA {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_GOT_TLSLD16_LO {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_TLSLD {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_DTPREL16_HA {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_DTPREL16_LO_DS {{[0-9a-f]+}} i + 0
// InputRelocs: R_PPC64_GOT_TLSLD16_HI {{[0-9a-f]+}} j + 0
// InputRelocs: R_PPC64_GOT_TLSLD16 {{[0-9a-f]+}} k + 0
// The local dynamic version of tls needs to use the same mechanism to look up
// a variables address as general-dynamic. ie a call to __tls_get_addr with the
// address of a tls_index struct as the argument. However for local-dynamic
// variables all will have the same ti_module, and the offset field is left as
// as 0, so the same struct can be used for every local-dynamic variable
// used in the shared-object.
// OutputRelocs: Relocation section '.rela.dyn' at offset 0x{{[0-9a-f]+}} contains 1 entries:
// OutputRelocs-NEXT: Offset Info Type Symbol's Value Symbol's Name + Addend
// OutputRelocs-NEXT: R_PPC64_DTPMOD64
// Check that the got has 3 entries, 1 for the TOC and 1 stucture of 2 entries
// for the tls variables. Also verify the address so we can check the offsets
// we calculate for each relocation type.
// CheckGot: got 00000018 0000000000020100
// got starts at 0x20100 so .TOC. will be 0x28100, and the tls_index struct is
// at 0x20108.
// #ha(i@got@tlsld) --> (0x20108 - 0x28100 + 0x8000) >> 16 = 0
// #lo(i@got@tlsld) --> (0x20108 - 0x28100) = -7ff8 = -32760
// When calculating offset relative to the dynamic thread pointer we have to
// adjust by 0x8000 since each DTV pointer points 0x8000 bytes past the start of
// its TLS block.
// #ha(i@dtprel) --> (0x0 -0x8000 + 0x8000) >> 16 = 0
// #lo(i@dtprel) --> (0x0 -0x8000) = -0x8000 = -32768
// Dis: test:
// Dis: addis 3, 2, 0
// Dis-NEXT: addi 3, 3, -32760
// Dis-NEXT: bl .+60
// Dis-NEXT: ld 2, 24(1)
// Dis-NEXT: addis 3, 3, 0
// Dis-NEXT: lwa 3, -32768(3)
// #hi(j@got@tlsld) --> (0x20108 - 0x28100 ) > 16 = -1
// Dis: test_hi:
// Dis: lis 3, -1
// k@got@tlsld --> (0x20108 - 0x28100) = -7ff8 = -32760
// Dis: test_16:
// Dis: li 3, -32760