llvm-project/lld/test/ELF/ppc64-toc-addis-nop-lqsq.s

# REQUIRES: ppc
# XFAIL: *

# RUN: llvm-readelf --relocations --wide %p/Inputs/ppc64le-quadword-ldst.o | FileCheck --check-prefix=QuadInputRelocs %s

# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/shared-ppc64.s -o %t2.o
# RUN: ld.lld -shared %t2.o -o %t2.so

# RUN: ld.lld  %t2.so %p/Inputs/ppc64le-quadword-ldst.o -o %t
# RUN: llvm-objdump -d %t | FileCheck --check-prefix=Dis %s

# RUN: ld.lld --no-toc-optimize %t2.so %p/Inputs/ppc64le-quadword-ldst.o -o %t
# RUN: llvm-objdump -d %t | FileCheck --check-prefix=NoOpt %s

# QuadInputRelocs: Relocation section '.rela.text'
# QuadInputRelocs:  R_PPC64_TOC16_LO_DS    0000000000000000 quadLd
# QuadInputRelocs:  R_PPC64_TOC16_LO_DS    0000000000000010 quadSt

# The powerpc backend doesn't support the quadword load/store instructions yet.
# So they are tested by linking against an object file assembled with
# `as -mpower9 -o ppc64le-quadword-ldst.o in.s` and checking the encoding of
# the unknown instructions in the disassembly. Source used as input:
#quads:
#.Lbegin_quads:
#.Lgep_quads:
#        addis 2, 12, .TOC.-.Lgep_quads@ha
#        addi  2, 2, .TOC.-.Lgep_quads@l
#.Llep_quads:
#.localentry quads, .Llep_quads-.Lgep_quads
#        addis 3, 2, quadLd@toc@ha
#        lq    4,    quadLd@toc@l(3)
#        addis 3, 2, quadSt@toc@ha
#        stq   4,    quadSt@toc@l(3)
#        blr
#
#        .p2align 4
#        .global quadLd
#        .lcomm  quadLd, 16
#
#        .global quadSt
#        .lcomm  quadSt, 16


# e0 82 7f 70 decodes to | 111000 | 00100 | 00010 | 16-bit imm |
#                        |   56   |   4   |   2   |   32624    |
# which is `lq r4, 32624(r2)`
# f8 82 7f 82 decodes to | 111110 | 00100 | 00010 | 14-bit imm | 10 |
#                        |   62   |   4   |   2   |    8160    | 2  |
# The 14-bit immediate is a word offset (scaled by 4), so this disassembles to:
# `stq r4, 32640(r2)`
# Dis-LABEL: quads:
# Dis-NEXT:    addis
# Dis-NEXT:    addi
# Dis-NEXT:    nop
# Dis-NEXT:    70 7f 82 e0  <unknown>
# Dis-NEXT:    nop
# Dis-NEXT:    82 7f 82 f8  <unknown>
# Dis-NEXT:    blr

# e0 83 7f 70 decodes to | 111000 | 00100 | 00011 | 16-bit imm |
#                        |   56   |   4   |   3   |   32624    |
# `lq r4, 32624(r3)`
# f8 83 7f 82 decodes to | 111110 | 00100 | 00011 | 14-bit imm | 10 |
#                        |   62   |   4   |   3   |    8160    | 2  |
# `stq r4, 32640(r3)`
# NoOpt-LABEL: quads:
# NoOpt-NEXT:    addis
# NoOpt-NEXT:    addi
# NoOpt-NEXT:    addis 3, 2, 0
# NoOpt-NEXT:    70 7f 83 e0  <unknown>
# NoOpt-NEXT:    addis 3, 2, 0
# NoOpt-NEXT:    82 7f 83 f8  <unknown>
# NoOpt-NEXT:    blr
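
The field breakdowns in the decode comments above can be double-checked with a short script. This is only a sketch, not part of the test: `decode_quad` and `from_objdump_bytes` are hypothetical helper names, and the decoder handles just the two encodings used here (`lq` as DQ-form with primary opcode 56, `stq` as DS-form with primary opcode 62 and extended opcode 2).

```python
def from_objdump_bytes(text: str) -> int:
    """Turn little-endian objdump bytes such as '70 7f 82 e0' into a 32-bit word."""
    return int.from_bytes(bytes.fromhex(text), "little")


def decode_quad(word: int) -> str:
    """Decode a 32-bit instruction word as lq or stq (illustrative helper only)."""
    opcode = word >> 26          # primary opcode, top 6 bits
    reg = (word >> 21) & 0x1F    # RTp for lq, RSp for stq
    ra = (word >> 16) & 0x1F     # base register
    low16 = word & 0xFFFF
    if opcode == 56:
        # lq: the displacement is the 12-bit DQ field followed by 4 zero bits.
        name, disp = "lq", low16 & 0xFFF0
    elif opcode == 62 and (low16 & 0x3) == 2:
        # stq: the displacement is the 14-bit DS field followed by 2 zero bits.
        name, disp = "stq", low16 & 0xFFFC
    else:
        raise ValueError(f"not an lq/stq encoding: {word:#010x}")
    if disp & 0x8000:            # sign-extend the 16-bit displacement
        disp -= 0x10000
    return f"{name} r{reg}, {disp}(r{ra})"


if __name__ == "__main__":
    for raw in ("70 7f 82 e0", "82 7f 82 f8", "70 7f 83 e0", "82 7f 83 f8"):
        print(raw, "->", decode_quad(from_objdump_bytes(raw)))
```

Feeding it the four byte strings from the `Dis` and `NoOpt` checks reproduces the `lq`/`stq` forms quoted in the comments, with base register `r2` in the optimized link and `r3` in the `--no-toc-optimize` link.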

[PPC64] Optimize redundant instructions in global access sequences.

The access sequence for global variables in the medium and large code models uses 2 instructions to add an offset to the TOC pointer. If the offset fits within 16 bits, the instruction that sets the high 16 bits is redundant.

This patch adds the --toc-optimize option (on by default) and enables rewriting of 2-instruction global variable accesses into 1 when the offset from the TOC pointer to the variable (or .got entry) fits in 16 signed bits, e.g.

addis %r3, %r2, 0        -->  nop
addi  %r3, %r3, -0x8000  -->  addi %r3, %r2, -0x8000

This rewriting can be disabled with the --no-toc-optimize flag.

Differential Revision: https://reviews.llvm.org/D49237
llvm-svn: 342602
2018-09-20 08:26:44 +08:00

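
The condition described in the commit message above can be modelled in a few lines. This is a sketch of the arithmetic only, not lld's implementation: `split_toc_offset` and `access_sequence` are hypothetical names, and the instruction strings are just for illustration.

```python
def split_toc_offset(offset: int) -> tuple[int, int]:
    """Split a TOC-relative offset into its @ha and @l parts.

    The low part is a signed 16-bit value, so the high part is adjusted
    to compensate: offset == (ha << 16) + lo always holds.
    """
    ha = (offset + 0x8000) >> 16
    lo = offset - (ha << 16)
    return ha, lo


def access_sequence(offset: int, reg: int = 3, optimize: bool = True) -> list[str]:
    """Return the two-instruction sequence, relaxed when the offset allows it."""
    ha, lo = split_toc_offset(offset)
    if optimize and ha == 0:
        # The offset fits in 16 signed bits: the addis adds nothing, so it
        # becomes a nop and the second instruction uses r2 directly.
        return ["nop", f"addi {reg}, 2, {lo}"]
    return [f"addis {reg}, 2, {ha}", f"addi {reg}, {reg}, {lo}"]


if __name__ == "__main__":
    print(access_sequence(-0x8000))                  # relaxed
    print(access_sequence(-0x8000, optimize=False))  # as with --no-toc-optimize
    print(access_sequence(0x8000))                   # needs the addis
```

An `@ha` part of zero is equivalent to the offset lying in the range -0x8000 to 0x7fff, which is why the sketch tests `ha == 0` rather than the range itself.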
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges

This change affects the non-linker-script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets.

Let me demonstrate the idea with a maxPageSize=65536 example:

When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output.

Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from the p_offset range [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transitioning between executable and non-executable segments.

ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes.

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes that factor into account.
2) Our TP offset formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details.

On targets where we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)`. This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues:

FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

The technique will be enabled (with updated tests) for other targets in subsequent patches.

Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00

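
The maxPageSize=65536 example in the commit message above can be replayed numerically. The following is a minimal model of the two placement choices, assuming the previous PT_LOAD ends at the same value for both address and file offset (0x10020); the function names are hypothetical and this is not lld's code.

```python
MAX_PAGE_SIZE = 0x10000  # 65536, as in the commit message's example


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & -alignment


def congruent_offset(end_offset: int, vaddr: int) -> int:
    """Smallest offset >= end_offset that is congruent to vaddr modulo the page size."""
    return end_offset + (vaddr - end_offset) % MAX_PAGE_SIZE


def padded_layout(end_vaddr: int, end_offset: int) -> tuple[int, int]:
    """Old behaviour: start the new PT_LOAD on a page boundary."""
    vaddr = align_up(end_vaddr, MAX_PAGE_SIZE)
    return vaddr, congruent_offset(end_offset, vaddr)


def overlapping_layout(end_vaddr: int, end_offset: int) -> tuple[int, int]:
    """New behaviour: skip to the next page but keep the in-page remainder."""
    vaddr = align_up(end_vaddr, MAX_PAGE_SIZE) + end_vaddr % MAX_PAGE_SIZE
    return vaddr, congruent_offset(end_offset, vaddr)


def runtime_mapping(vaddr: int, offset: int, page_size: int = MAX_PAGE_SIZE) -> tuple[int, int]:
    """Page-aligned (address, file offset) at which the loader starts mapping."""
    slack = vaddr % page_size
    return vaddr - slack, offset - slack


if __name__ == "__main__":
    print([hex(v) for v in padded_layout(0x10020, 0x10020)])
    print([hex(v) for v in overlapping_layout(0x10020, 0x10020)])
    print([hex(v) for v in runtime_mapping(0x20020, 0x10020)])
```

With these inputs the padded layout leaves the file gap [0x10020, 0x20000), while the overlapping layout leaves none and the loader maps the file range starting at 0x10000 a second time.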
[PPC][PPC64] Improve some llvm-objdump -d -D tests

Various improvements:

* Some offsets in disassembly are incorrect after several layout adjustments. Fix them.
* llvm-objdump -D should not be used. -D dumps unrelated non-text sections. Replace them with llvm-objdump -d, llvm-readelf -x, etc.
* Many llvm-objdump -d tests use {{.*}}. Add the option --no-show-raw-insn to avoid checking hex bytes.
* ppc64-long-branch.s does not need a shared object. Delete it.
* Make ppc64-ifunc.s check 2 ifuncs.

Reviewers: ruiu, espindola
Subscribers: emaste, nemanjai, arichardson, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60998
llvm-svn: 358975
2019-04-23 19:47:28 +08:00