llvm-project/lld/test/ELF/ppc64-plt-stub.s

# REQUIRES: ppc

# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t.o
# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/shared-ppc64.s -o %t2.o
# RUN: ld.lld -shared %t2.o -soname=t2.so -o %t2.so
# RUN: ld.lld %t.o %t2.so -o %t
# RUN: llvm-readelf -S -d %t | FileCheck --check-prefix=SEC %s
# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s

# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t.o
# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %p/Inputs/shared-ppc64.s -o %t2.o
# RUN: ld.lld -shared %t2.o -soname=t2.so -o %t2.so
# RUN: ld.lld %t.o %t2.so -o %t
# RUN: llvm-readelf -S -d %t | FileCheck --check-prefix=SEC %s
# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s

## DT_PLTGOT points to .plt
# SEC: .plt NOBITS 00000000100303e8 0003e8 000018
# SEC: 0x0000000000000003 (PLTGOT) 0x100303e8

## .plt[0] holds the address of _dl_runtime_resolve.
## .plt[1] holds the link map.
## The JMP_SLOT relocation is stored at .plt[2]
# RELOC: 0x10030010 R_PPC64_JMP_SLOT foo 0x0

# CHECK:      <_start>:
# CHECK:      10010298: bl .+16

# CHECK-LABEL: 00000000100102a8 <__plt_foo>:
# CHECK-NEXT:      std 2, 24(1)
# CHECK-NEXT:      addis 12, 2, 1
# CHECK-NEXT:      ld 12, -32744(12)
# CHECK-NEXT:      mtctr 12
# CHECK-NEXT:      bctr


        .text
        .abiversion 2
        .globl  _start
        .p2align        4
        .type   _start,@function
_start:
.Lfunc_begin0:
.Lfunc_gep0:
  addis 2, 12, .TOC.-.Lfunc_gep0@ha
  addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
  .localentry     _start, .Lfunc_lep0-.Lfunc_gep0
  bl foo
  nop
  li 0, 1
  sc
  .size _start, .-.Lfunc_begin0
[ELF][PPC] Refactor some ppc64 tests Merge ppc64-dynamic-relocations.s into ppc64-plt-stub.s Add ppc64-tls-ie.s: covers ppc64-initial-exec-tls.s and ppc64-tls-ie-le.s Add ppc64-tls-gd.s: covers ppc64-general-dynamic-tls.s, ppc64-gd-to-ie.s, ppc64-tls-gd-le.s, and ppc64-tls-gd-le-small.s llvm-svn: 366424 2019-07-18 18:43:07 +08:00			`# REQUIRES: ppc`

			`# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t.o`
			`# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/shared-ppc64.s -o %t2.o`
			`# RUN: ld.lld -shared %t2.o -soname=t2.so -o %t2.so`
			`# RUN: ld.lld %t.o %t2.so -o %t`
			`# RUN: llvm-readelf -S -d %t \| FileCheck --check-prefix=SEC %s`
			`# RUN: llvm-objdump -d --no-show-raw-insn %t \| FileCheck %s`

			`# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t.o`
			`# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %p/Inputs/shared-ppc64.s -o %t2.o`
			`# RUN: ld.lld -shared %t2.o -soname=t2.so -o %t2.so`
			`# RUN: ld.lld %t.o %t2.so -o %t`
			`# RUN: llvm-readelf -S -d %t \| FileCheck --check-prefix=SEC %s`
			`# RUN: llvm-objdump -d --no-show-raw-insn %t \| FileCheck %s`

			`## DT_PLTGOT points to .plt`
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343 2019-08-20 16:34:25 +08:00			`# SEC: .plt NOBITS 00000000100303e8 0003e8 000018`
			`# SEC: 0x0000000000000003 (PLTGOT) 0x100303e8`
[ELF][PPC] Refactor some ppc64 tests Merge ppc64-dynamic-relocations.s into ppc64-plt-stub.s Add ppc64-tls-ie.s: covers ppc64-initial-exec-tls.s and ppc64-tls-ie-le.s Add ppc64-tls-gd.s: covers ppc64-general-dynamic-tls.s, ppc64-gd-to-ie.s, ppc64-tls-gd-le.s, and ppc64-tls-gd-le-small.s llvm-svn: 366424 2019-07-18 18:43:07 +08:00
			`## .plt[0] holds the address of _dl_runtime_resolve.`
			`## .plt[1] holds the link map.`
			`## The JMP_SLOT relocation is stored at .plt[2]`
			`# RELOC: 0x10030010 R_PPC64_JMP_SLOT foo 0x0`

[llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier. `.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive. Without `-LABEL`, the CHECK line can match the `Disassembly of section` line and causes the next `CHECK-NEXT:` to fail. ``` Disassembly of section .foo: 0000000000001634 .foo: ``` Bdragon: <> has metalinguistic connotation. it just "feels right" Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75713 2020-03-06 06:18:38 +08:00			`# CHECK: <_start>:`
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343 2019-08-20 16:34:25 +08:00			`# CHECK: 10010298: bl .+16`
[ELF][PPC] Refactor some ppc64 tests Merge ppc64-dynamic-relocations.s into ppc64-plt-stub.s Add ppc64-tls-ie.s: covers ppc64-initial-exec-tls.s and ppc64-tls-ie-le.s Add ppc64-tls-gd.s: covers ppc64-general-dynamic-tls.s, ppc64-gd-to-ie.s, ppc64-tls-gd-le.s, and ppc64-tls-gd-le-small.s llvm-svn: 366424 2019-07-18 18:43:07 +08:00
[llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier. `.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive. Without `-LABEL`, the CHECK line can match the `Disassembly of section` line and causes the next `CHECK-NEXT:` to fail. ``` Disassembly of section .foo: 0000000000001634 .foo: ``` Bdragon: <> has metalinguistic connotation. it just "feels right" Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75713 2020-03-06 06:18:38 +08:00			`# CHECK-LABEL: 00000000100102a8 <__plt_foo>:`
[ELF][PPC] Refactor some ppc64 tests Merge ppc64-dynamic-relocations.s into ppc64-plt-stub.s Add ppc64-tls-ie.s: covers ppc64-initial-exec-tls.s and ppc64-tls-ie-le.s Add ppc64-tls-gd.s: covers ppc64-general-dynamic-tls.s, ppc64-gd-to-ie.s, ppc64-tls-gd-le.s, and ppc64-tls-gd-le-small.s llvm-svn: 366424 2019-07-18 18:43:07 +08:00			`# CHECK-NEXT: std 2, 24(1)`
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343 2019-08-20 16:34:25 +08:00			`# CHECK-NEXT: addis 12, 2, 1`
			`# CHECK-NEXT: ld 12, -32744(12)`
[ELF][PPC] Refactor some ppc64 tests Merge ppc64-dynamic-relocations.s into ppc64-plt-stub.s Add ppc64-tls-ie.s: covers ppc64-initial-exec-tls.s and ppc64-tls-ie-le.s Add ppc64-tls-gd.s: covers ppc64-general-dynamic-tls.s, ppc64-gd-to-ie.s, ppc64-tls-gd-le.s, and ppc64-tls-gd-le-small.s llvm-svn: 366424 2019-07-18 18:43:07 +08:00			`# CHECK-NEXT: mtctr 12`
			`# CHECK-NEXT: bctr`
[PPC64] Emit plt call stubs to the text section rather then the plt section. On PowerPC calls to functions through the plt must be done through a call stub that is responsible for: 1) Saving the toc pointer to the stack. 2) Loading the target functions address from the plt into both r12 and the count register. 3) Indirectly branching to the target function. Previously we have been emitting these call stubs to the .plt section, however the .plt section should be reserved for the lazy symbol resolution stubs. This patch moves the call stubs to the text section by moving the implementation from writePlt to the thunk framework. Differential Revision: https://reviews.llvm.org/D46204 llvm-svn: 331607 2018-05-07 03:13:29 +08:00

[PPC64] Write plt stubs for ElfV2 abi Add the default version of a plt stub for the V2 Elf abi. Differential Revision: https://reviews.llvm.org/D44850 llvm-svn: 329004 2018-04-03 03:17:12 +08:00			`.text`
			`.abiversion 2`
			`.globl _start`
			`.p2align 4`
			`.type _start,@function`
			`_start:`
			`.Lfunc_begin0:`
			`.Lfunc_gep0:`
			`addis 2, 12, .TOC.-.Lfunc_gep0@ha`
			`addi 2, 2, .TOC.-.Lfunc_gep0@l`
			`.Lfunc_lep0:`
			`.localentry _start, .Lfunc_lep0-.Lfunc_gep0`
			`bl foo`
			`nop`
			`li 0, 1`
			`sc`
			`.size _start, .-.Lfunc_begin0`