[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# REQUIRES: ppc
|
|
|
|
# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/ppc64-toc-relax-shared.s -o %t.o
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# RUN: ld.lld -shared -soname=t.so %t.o -o %t.so
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t1.o
|
|
|
|
# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/ppc64-toc-relax.s -o %t2.o
|
|
|
|
# RUN: llvm-readobj -r %t1.o | FileCheck --check-prefixes=RELOCS-LE,RELOCS %s
|
|
|
|
# RUN: ld.lld %t1.o %t2.o %t.so -o %t
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# RUN: llvm-nm %t | FileCheck --check-prefix=NM %s
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefixes=COMMON,EXE %s
|
|
|
|
|
|
|
|
# RUN: ld.lld -shared %t1.o %t2.o %t.so -o %t2.so
|
|
|
|
# RUN: llvm-objdump -d --no-show-raw-insn %t2.so | FileCheck --check-prefixes=COMMON,SHARED %s
|
|
|
|
|
|
|
|
# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %p/Inputs/ppc64-toc-relax-shared.s -o %t.o
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# RUN: ld.lld -shared -soname=t.so %t.o -o %t.so
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t1.o
|
|
|
|
# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %p/Inputs/ppc64-toc-relax.s -o %t2.o
|
|
|
|
# RUN: llvm-readobj -r %t1.o | FileCheck --check-prefixes=RELOCS-BE,RELOCS %s
|
|
|
|
# RUN: ld.lld %t1.o %t2.o %t.so -o %t
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# RUN: llvm-nm %t | FileCheck --check-prefix=NM %s
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefixes=COMMON,EXE %s
|
|
|
|
|
|
|
|
# RUN: ld.lld -shared %t1.o %t2.o %t.so -o %t2.so
|
|
|
|
# RUN: llvm-objdump -d --no-show-raw-insn %t2.so | FileCheck --check-prefixes=COMMON,SHARED %s
|
|
|
|
|
|
|
|
# RELOCS-LE: .rela.text {
|
|
|
|
# RELOCS-LE-NEXT: 0x0 R_PPC64_TOC16_HA .toc 0x0
|
|
|
|
# RELOCS-LE-NEXT: 0x4 R_PPC64_TOC16_LO_DS .toc 0x0
|
|
|
|
# RELOCS-LE-NEXT: 0xC R_PPC64_TOC16_HA .toc 0x8
|
|
|
|
# RELOCS-LE-NEXT: 0x10 R_PPC64_TOC16_LO_DS .toc 0x8
|
|
|
|
# RELOCS-LE-NEXT: 0x18 R_PPC64_TOC16_HA .toc 0x10
|
|
|
|
# RELOCS-LE-NEXT: 0x1C R_PPC64_TOC16_LO_DS .toc 0x10
|
|
|
|
# RELOCS-LE-NEXT: 0x24 R_PPC64_TOC16_HA .toc 0x18
|
|
|
|
# RELOCS-LE-NEXT: 0x28 R_PPC64_TOC16_LO_DS .toc 0x18
|
|
|
|
# RELOCS-LE-NEXT: }
|
|
|
|
|
|
|
|
# RELOCS-BE: .rela.text {
|
|
|
|
# RELOCS-BE-NEXT: 0x2 R_PPC64_TOC16_HA .toc 0x0
|
|
|
|
# RELOCS-BE-NEXT: 0x6 R_PPC64_TOC16_LO_DS .toc 0x0
|
|
|
|
# RELOCS-BE-NEXT: 0xE R_PPC64_TOC16_HA .toc 0x8
|
|
|
|
# RELOCS-BE-NEXT: 0x12 R_PPC64_TOC16_LO_DS .toc 0x8
|
|
|
|
# RELOCS-BE-NEXT: 0x1A R_PPC64_TOC16_HA .toc 0x10
|
|
|
|
# RELOCS-BE-NEXT: 0x1E R_PPC64_TOC16_LO_DS .toc 0x10
|
|
|
|
# RELOCS-BE-NEXT: 0x26 R_PPC64_TOC16_HA .toc 0x18
|
|
|
|
# RELOCS-BE-NEXT: 0x2A R_PPC64_TOC16_LO_DS .toc 0x18
|
|
|
|
# RELOCS-BE-NEXT: }
|
|
|
|
|
|
|
|
# RELOCS: .rela.toc {
|
|
|
|
# RELOCS-NEXT: 0x0 R_PPC64_ADDR64 hidden 0x0
|
|
|
|
# RELOCS-NEXT: 0x8 R_PPC64_ADDR64 hidden2 0x0
|
|
|
|
# RELOCS-NEXT: 0x10 R_PPC64_ADDR64 shared 0x0
|
|
|
|
# RELOCS-NEXT: 0x18 R_PPC64_ADDR64 default 0x0
|
|
|
|
# RELOCS-NEXT: }
|
|
|
|
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# NM-DAG: 00000000100303a0 D default
|
|
|
|
# NM-DAG: 00000000100303a0 d hidden
|
|
|
|
# NM-DAG: 00000000100403a0 d hidden2
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
|
|
|
|
# 'hidden' is non-preemptable. It is relaxed.
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# address(hidden) - (.got+0x8000) = 0x100303a0 - (0x10020380+0x8000) = (1<<16) - 32736
|
|
|
|
# COMMON: addis 3, 2, 1
|
|
|
|
# COMMON: addi 3, 3, -32736
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# COMMON: lwa 3, 0(3)
|
|
|
|
addis 3, 2, .Lhidden@toc@ha
|
|
|
|
ld 3, .Lhidden@toc@l(3)
|
|
|
|
lwa 3, 0(3)
|
|
|
|
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# address(hidden2) - (.got+0x8000) = 0x100403a0 - (0x10020380+0x8000) = (2<<16) - 32736
|
|
|
|
# COMMON: addis 3, 2, 2
|
|
|
|
# COMMON: addi 3, 3, -32736
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# COMMON: lwa 3, 0(3)
|
|
|
|
addis 3, 2, .Lhidden2@toc@ha
|
|
|
|
ld 3, .Lhidden2@toc@l(3)
|
|
|
|
lwa 3, 0(3)
|
|
|
|
|
|
|
|
# 'shared' is not defined in an object file. Its definition is determined at
|
|
|
|
# runtime by the dynamic linker, so the extra indirection cannot be relaxed.
|
|
|
|
# The first addis can still be relaxed to nop, though.
|
|
|
|
# COMMON: nop
|
|
|
|
# COMMON: ld 4, -32752(2)
|
|
|
|
# COMMON: lwa 4, 0(4)
|
|
|
|
addis 4, 2, .Lshared@toc@ha
|
|
|
|
ld 4, .Lshared@toc@l(4)
|
|
|
|
lwa 4, 0(4)
|
|
|
|
|
|
|
|
# 'default' has default visibility. It is non-preemptable when producing an executable.
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
# address(default) - (.got+0x8000) = 0x100303a0 - (0x10020380+0x8000) = (1<<16) - 32736
|
|
|
|
# EXE: addis 5, 2, 1
|
|
|
|
# EXE: addi 5, 5, -32736
|
[PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
2019-05-07 12:26:05 +08:00
|
|
|
# EXE: lwa 5, 0(5)
|
|
|
|
|
|
|
|
# SHARED: nop
|
|
|
|
# SHARED: ld 5, -32744(2)
|
|
|
|
# SHARED: lwa 5, 0(5)
|
|
|
|
addis 5, 2, .Ldefault@toc@ha
|
|
|
|
ld 5, .Ldefault@toc@l(5)
|
|
|
|
lwa 5, 0(5)
|
|
|
|
|
|
|
|
.section .toc,"aw",@progbits
|
|
|
|
.Lhidden:
|
|
|
|
.tc hidden[TC], hidden
|
|
|
|
.Lhidden2:
|
|
|
|
.tc hidden2[TC], hidden2
|
|
|
|
.Lshared:
|
|
|
|
.tc shared[TC], shared
|
|
|
|
.Ldefault:
|
|
|
|
.tc default[TC], default
|