2018-04-27 23:41:19 +08:00
|
|
|
// REQUIRES: ppc
|
2018-05-04 23:09:49 +08:00
|
|
|
|
2018-04-27 23:41:19 +08:00
|
|
|
// RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t.o
|
|
|
|
// RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/ppc64-func-global-entry.s -o %t2.o
|
|
|
|
// RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %p/Inputs/ppc64-func-local-entry.s -o %t3.o
|
2019-12-28 05:24:41 +08:00
|
|
|
// RUN: ld.lld %t.o %t2.o %t3.o -o %t
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
// RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s
|
2018-05-04 23:09:49 +08:00
|
|
|
|
|
|
|
// RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t.o
|
|
|
|
// RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %p/Inputs/ppc64-func-global-entry.s -o %t2.o
|
|
|
|
// RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %p/Inputs/ppc64-func-local-entry.s -o %t3.o
|
2019-12-28 05:24:41 +08:00
|
|
|
// RUN: ld.lld %t.o %t2.o %t3.o -o %t
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
// RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s
|
2018-05-04 23:09:49 +08:00
|
|
|
|
2018-04-27 23:41:19 +08:00
|
|
|
.text
|
|
|
|
.abiversion 2
|
|
|
|
.globl _start # -- Begin function _start
|
|
|
|
.p2align 4
|
|
|
|
.type _start,@function
|
|
|
|
_start: # @_start
|
|
|
|
.Lfunc_begin0:
|
|
|
|
.Lfunc_gep0:
|
|
|
|
addis 2, 12, .TOC.-.Lfunc_gep0@ha
|
|
|
|
addi 2, 2, .TOC.-.Lfunc_gep0@l
|
|
|
|
.Lfunc_lep0:
|
|
|
|
.localentry _start, .Lfunc_lep0-.Lfunc_gep0
|
|
|
|
# %bb.0: # %entry
|
|
|
|
mflr 0
|
|
|
|
std 0, 16(1)
|
|
|
|
stdu 1, -48(1)
|
|
|
|
li 3, 1
|
|
|
|
li 4, 1
|
|
|
|
std 30, 32(1) # 8-byte Folded Spill
|
|
|
|
bl foo_external_same
|
|
|
|
nop
|
|
|
|
mr 30, 3
|
|
|
|
li 3, 2
|
|
|
|
li 4, 2
|
|
|
|
bl foo_external_diff
|
|
|
|
nop
|
|
|
|
addis 4, 2, .LC0@toc@ha
|
|
|
|
add 3, 3, 30
|
|
|
|
ld 30, 32(1) # 8-byte Folded Reload
|
|
|
|
ld 4, .LC0@toc@l(4)
|
|
|
|
lwz 4, 0(4)
|
|
|
|
add 3, 3, 4
|
|
|
|
extsw 3, 3
|
|
|
|
addi 1, 1, 48
|
|
|
|
ld 0, 16(1)
|
|
|
|
li 0, 1
|
|
|
|
sc
|
|
|
|
.long 0
|
|
|
|
.quad 0
|
|
|
|
.Lfunc_end0:
|
|
|
|
.size _start, .Lfunc_end0-.Lfunc_begin0
|
|
|
|
# -- End function
|
|
|
|
.section .toc,"aw",@progbits
|
|
|
|
.LC0:
|
|
|
|
.tc glob[TC],glob
|
|
|
|
.type glob,@object # @glob
|
|
|
|
.data
|
|
|
|
.globl glob
|
|
|
|
.p2align 2
|
|
|
|
glob:
|
|
|
|
.long 10 # 0xa
|
|
|
|
.size glob, 4
|
|
|
|
|
|
|
|
# Check that foo_external_diff has a global entry point and we branch to
|
|
|
|
# foo_external_diff+8. Also check that foo_external_same has no global entry
|
|
|
|
# point and we branch to start of foo_external_same.
|
|
|
|
|
2020-03-06 06:18:38 +08:00
|
|
|
// CHECK-LABEL: <_start>:
|
2020-03-23 14:03:09 +08:00
|
|
|
// CHECK: 100101f0: bl 0x10010280
|
|
|
|
// CHECK: 10010204: bl 0x10010258
|
2020-03-06 06:18:38 +08:00
|
|
|
// CHECK-LABEL: <foo_external_diff>:
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
// CHECK-NEXT: 10010250: addis 2, 12, 2
|
|
|
|
// CHECK-NEXT: 10010254: addi 2, 2, -32696
|
|
|
|
// CHECK-NEXT: 10010258: addis 5, 2, 1
|
2020-03-06 06:18:38 +08:00
|
|
|
// CHECK-LABEL: <foo_external_same>:
|
[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
2019-08-20 16:34:25 +08:00
|
|
|
// CHECK-NEXT: 10010280: add 3, 4, 3
|