# REQUIRES: ppc
# RUN: llvm-mc -filetype=obj -triple=powerpc64le-unknown-linux %s -o %t.o
# RUN: ld.lld --hash-style=sysv -discard-all -shared %t.o -o %t.so
# RUN: llvm-readobj --file-headers --sections --section-data -l %t.so | FileCheck --check-prefixes=CHECK,LE %s

# RUN: llvm-mc -filetype=obj -triple=powerpc64-unknown-linux %s -o %t.o
# RUN: ld.lld --hash-style=sysv -discard-all -shared %t.o -o %t.so
# RUN: llvm-readobj --file-headers --sections --section-data -l %t.so | FileCheck --check-prefixes=CHECK,BE %s
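# The same source is assembled and linked for both byte orders; the LE and BE
# prefixes cover the endian-dependent bytes, while the shared CHECK prefix
# covers everything else.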
.abiversion 2

# Exits with return code 55 on Linux.
.text
li 0,1
li 3,55
sc
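# On Linux/PPC64 the syscall number is passed in r0 and the first argument in
# r3: li 0,1 selects the exit syscall (number 1), li 3,55 sets the exit status
# to 55, and sc traps into the kernel.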
// CHECK:Format: ELF64-ppc64
// LE-NEXT: Arch: powerpc64le
// BE-NEXT: Arch: powerpc64{{$}}
// CHECK-NEXT: AddressSize: 64bit
// CHECK-NEXT: LoadName:
// CHECK-NEXT: ElfHeader {
// CHECK-NEXT: Ident {
// CHECK-NEXT: Magic: (7F 45 4C 46)
// CHECK-NEXT: Class: 64-bit (0x2)
// LE-NEXT: DataEncoding: LittleEndian (0x1)
// BE-NEXT: DataEncoding: BigEndian (0x2)
// CHECK-NEXT: FileVersion: 1
// CHECK-NEXT: OS/ABI: SystemV (0x0)
// CHECK-NEXT: ABIVersion: 0
// CHECK-NEXT: Unused: (00 00 00 00 00 00 00)
// CHECK-NEXT: }
// CHECK-NEXT: Type: SharedObject (0x3)
// CHECK-NEXT: Machine: EM_PPC64 (0x15)
// CHECK-NEXT: Version: 1
// CHECK-NEXT: Entry: 0x1022C
// CHECK-NEXT: ProgramHeaderOffset: 0x40
// CHECK-NEXT: SectionHeaderOffset: 0x330
// CHECK-NEXT: Flags [ (0x2)
// CHECK-NEXT: 0x2
// CHECK-NEXT: ]
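// (e_flags is 0x2 here, recording the ELFv2 ABI selected by .abiversion 2.)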
// CHECK-NEXT: HeaderSize: 64
// CHECK-NEXT: ProgramHeaderEntrySize: 56
// CHECK-NEXT: ProgramHeaderCount: 7
// CHECK-NEXT: SectionHeaderEntrySize: 64
// CHECK-NEXT: SectionHeaderCount: 11
// CHECK-NEXT: StringTableSectionIndex: 9
// CHECK-NEXT:}
// CHECK-NEXT:Sections [
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 0
// CHECK-NEXT: Name: (0)
// CHECK-NEXT: Type: SHT_NULL (0x0)
// CHECK-NEXT: Flags [ (0x0)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x0
// CHECK-NEXT: Offset: 0x0
// CHECK-NEXT: Size: 0
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 0
// CHECK-NEXT: EntrySize: 0
// CHECK-NEXT: SectionData (
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 1
// CHECK-NEXT: Name: .dynsym (1)
// CHECK-NEXT: Type: SHT_DYNSYM (0xB)
// CHECK-NEXT: Flags [ (0x2)
// CHECK-NEXT: SHF_ALLOC (0x2)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x200
// CHECK-NEXT: Offset: 0x200
// CHECK-NEXT: Size: 24
// CHECK-NEXT: Link: 3
// CHECK-NEXT: Info: 1
// CHECK-NEXT: AddressAlignment: 8
// CHECK-NEXT: EntrySize: 24
// CHECK-NEXT: SectionData (
// CHECK-NEXT: 0000: 00000000 00000000 00000000 00000000 |................|
// CHECK-NEXT: 0010: 00000000 00000000 |........|
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 2
// CHECK-NEXT: Name: .hash (9)
// CHECK-NEXT: Type: SHT_HASH (0x5)
// CHECK-NEXT: Flags [ (0x2)
// CHECK-NEXT: SHF_ALLOC (0x2)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x218
// CHECK-NEXT: Offset: 0x218
// CHECK-NEXT: Size: 16
// CHECK-NEXT: Link: 1
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 4
// CHECK-NEXT: EntrySize: 4
// CHECK-NEXT: SectionData (
// LE-NEXT: 0000: 01000000 01000000 00000000 00000000
// BE-NEXT: 0000: 00000001 00000001 00000000 00000000
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 3
// CHECK-NEXT: Name: .dynstr (15)
// CHECK-NEXT: Type: SHT_STRTAB (0x3)
// CHECK-NEXT: Flags [ (0x2)
// CHECK-NEXT: SHF_ALLOC (0x2)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x228
// CHECK-NEXT: Offset: 0x228
// CHECK-NEXT: Size: 1
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 1
// CHECK-NEXT: EntrySize: 0
// CHECK-NEXT: SectionData (
// CHECK-NEXT: 0000: 00 |.|
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 4
// CHECK-NEXT: Name: .text (23)
// CHECK-NEXT: Type: SHT_PROGBITS (0x1)
// CHECK-NEXT: Flags [ (0x6)
// CHECK-NEXT: SHF_ALLOC (0x2)
// CHECK-NEXT: SHF_EXECINSTR (0x4)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x1022C
// CHECK-NEXT: Offset: 0x22C
// CHECK-NEXT: Size: 12
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 4
// CHECK-NEXT: EntrySize: 0
// CHECK-NEXT: SectionData (
// LE-NEXT: 0000: 01000038 37006038 02000044
// BE-NEXT: 0000: 38000001 38600037 44000002
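// (The 12 bytes above are li 0,1 = 0x38000001, li 3,55 = 0x38600037 and
// sc = 0x44000002, byte-reversed in the little-endian dump.)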
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 5
// CHECK-NEXT: Name: .dynamic (29)
// CHECK-NEXT: Type: SHT_DYNAMIC (0x6)
// CHECK-NEXT: Flags [ (0x3)
// CHECK-NEXT: SHF_ALLOC (0x2)
// CHECK-NEXT: SHF_WRITE (0x1)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x20238
// CHECK-NEXT: Offset: 0x238
// CHECK-NEXT: Size: 96
// CHECK-NEXT: Link: 3
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 8
// CHECK-NEXT: EntrySize: 16
// CHECK-NEXT: SectionData (
// LE-NEXT: 0000: 06000000 00000000 00020000 00000000 |
// LE-NEXT: 0010: 0B000000 00000000 18000000 00000000 |
// LE-NEXT: 0020: 05000000 00000000 28020000 00000000 |
// LE-NEXT: 0030: 0A000000 00000000 01000000 00000000 |
// LE-NEXT: 0040: 04000000 00000000 18020000 00000000 |
// LE-NEXT: 0050: 00000000 00000000 00000000 00000000 |
// BE-NEXT: 0000: 00000000 00000006 00000000 00000200 |
// BE-NEXT: 0010: 00000000 0000000B 00000000 00000018 |
// BE-NEXT: 0020: 00000000 00000005 00000000 00000228 |
// BE-NEXT: 0030: 00000000 0000000A 00000000 00000001 |
// BE-NEXT: 0040: 00000000 00000004 00000000 00000218 |
// BE-NEXT: 0050: 00000000 00000000 00000000 00000000 |
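// (The six entries decode to DT_SYMTAB=0x200, DT_SYMENT=24, DT_STRTAB=0x228,
// DT_STRSZ=1, DT_HASH=0x218 and a terminating DT_NULL.)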
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 6
// CHECK-NEXT: Name: .branch_lt (38)
// CHECK-NEXT: Type: SHT_NOBITS (0x8)
// CHECK-NEXT: Flags [ (0x3)
// CHECK-NEXT: SHF_ALLOC (0x2)
// CHECK-NEXT: SHF_WRITE (0x1)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x30298
// CHECK-NEXT: Offset: 0x298
// CHECK-NEXT: Size: 0
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 8
// CHECK-NEXT: EntrySize: 0
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 7
// CHECK-NEXT: Name: .comment (49)
// CHECK-NEXT: Type: SHT_PROGBITS (0x1)
// CHECK-NEXT: Flags [ (0x30)
// CHECK-NEXT: SHF_MERGE (0x10)
// CHECK-NEXT: SHF_STRINGS (0x20)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x0
// CHECK-NEXT: Offset: 0x298
// CHECK-NEXT: Size: 8
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 1
// CHECK-NEXT: EntrySize: 1
// CHECK-NEXT: SectionData (
// CHECK-NEXT: 0000: 4C4C4420 312E3000 |LLD 1.0.|
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 8
// CHECK-NEXT: Name: .symtab (58)
// CHECK-NEXT: Type: SHT_SYMTAB (0x2)
// CHECK-NEXT: Flags [ (0x0)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x0
// CHECK-NEXT: Offset: 0x2A0
// CHECK-NEXT: Size: 48
// CHECK-NEXT: Link: 10
// CHECK-NEXT: Info: 2
// CHECK-NEXT: AddressAlignment: 8
// CHECK-NEXT: EntrySize: 24
// CHECK-NEXT: SectionData (
// LE-NEXT: 0000: 00000000 00000000 00000000 00000000
// LE-NEXT: 0010: 00000000 00000000 01000000 00020500
// LE-NEXT: 0020: 38020200 00000000 00000000 00000000
// BE-NEXT: 0000: 00000000 00000000 00000000 00000000
// BE-NEXT: 0010: 00000000 00000000 00000001 00020005
// BE-NEXT: 0020: 00000000 00020238 00000000 00000000
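// (The second entry is the local hidden symbol _DYNAMIC, pointing at the
// .dynamic section: st_shndx 5, st_value 0x20238.)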
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 9
// CHECK-NEXT: Name: .shstrtab (66)
// CHECK-NEXT: Type: SHT_STRTAB (0x3)
// CHECK-NEXT: Flags [ (0x0)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x0
// CHECK-NEXT: Offset: 0x2D0
// CHECK-NEXT: Size: 84
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 1
// CHECK-NEXT: EntrySize: 0
// CHECK-NEXT: SectionData (
// CHECK-NEXT: 0000: 002E6479 6E73796D 002E6861 7368002E |..dynsym..hash..|
// CHECK-NEXT: 0010: 64796E73 7472002E 74657874 002E6479 |dynstr..text..dy|
// CHECK-NEXT: 0020: 6E616D69 63002E62 72616E63 685F6C74 |namic..branch_lt|
// CHECK-NEXT: 0030: 002E636F 6D6D656E 74002E73 796D7461 |..comment..symta|
// CHECK-NEXT: 0040: 62002E73 68737472 74616200 2E737472 |b..shstrtab..str|
// CHECK-NEXT: 0050: 74616200 |tab.|
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT: Section {
// CHECK-NEXT: Index: 10
// CHECK-NEXT: Name: .strtab (76)
// CHECK-NEXT: Type: SHT_STRTAB (0x3)
// CHECK-NEXT: Flags [ (0x0)
// CHECK-NEXT: ]
// CHECK-NEXT: Address: 0x0
// CHECK-NEXT: Offset: 0x324
// CHECK-NEXT: Size: 10
// CHECK-NEXT: Link: 0
// CHECK-NEXT: Info: 0
// CHECK-NEXT: AddressAlignment: 1
// CHECK-NEXT: EntrySize: 0
// CHECK-NEXT: SectionData (
// CHECK-NEXT: 0000: 005F4459 4E414D49 4300 |._DYNAMIC.|
// CHECK-NEXT: )
// CHECK-NEXT: }
// CHECK-NEXT:]
// CHECK-NEXT:ProgramHeaders [
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_PHDR (0x6)
// CHECK-NEXT: Offset: 0x40
// CHECK-NEXT: VirtualAddress: 0x40
// CHECK-NEXT: PhysicalAddress: 0x40
// CHECK-NEXT: FileSize: 448
// CHECK-NEXT: MemSize: 448
// CHECK-NEXT: Flags [ (0x4)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 8
// CHECK-NEXT: }
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_LOAD (0x1)
// CHECK-NEXT: Offset: 0x0
// CHECK-NEXT: VirtualAddress: 0x0
// CHECK-NEXT: PhysicalAddress: 0x0
// CHECK-NEXT: FileSize: 553
// CHECK-NEXT: MemSize: 553
// CHECK-NEXT: Flags [ (0x4)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 65536
// CHECK-NEXT: }
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_LOAD (0x1)
// CHECK-NEXT: Offset: 0x22C
// CHECK-NEXT: VirtualAddress: 0x1022C
// CHECK-NEXT: PhysicalAddress: 0x1022C
// CHECK-NEXT: FileSize: 12
// CHECK-NEXT: MemSize: 12
// CHECK-NEXT: Flags [ (0x5)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: PF_X (0x1)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 65536
// CHECK-NEXT: }
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_LOAD (0x1)
// CHECK-NEXT: Offset: 0x238
// CHECK-NEXT: VirtualAddress: 0x20238
// CHECK-NEXT: PhysicalAddress: 0x20238
// CHECK-NEXT: FileSize: 96
// CHECK-NEXT: MemSize: 96
// CHECK-NEXT: Flags [ (0x6)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: PF_W (0x2)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 65536
// CHECK-NEXT: }
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_DYNAMIC (0x2)
// CHECK-NEXT: Offset: 0x238
// CHECK-NEXT: VirtualAddress: 0x20238
// CHECK-NEXT: PhysicalAddress: 0x20238
// CHECK-NEXT: FileSize: 96
// CHECK-NEXT: MemSize: 96
// CHECK-NEXT: Flags [ (0x6)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: PF_W (0x2)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 8
// CHECK-NEXT: }
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_GNU_RELRO (0x6474E552)
// CHECK-NEXT: Offset: 0x238
// CHECK-NEXT: VirtualAddress: 0x20238
// CHECK-NEXT: PhysicalAddress: 0x20238
// CHECK-NEXT: FileSize: 96
// CHECK-NEXT: MemSize: 3528
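// (MemSize extends the relro mapping from 0x20238 up to 0x21000, the next
// 4096-byte boundary: 0x21000 - 0x20238 = 3528.)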
// CHECK-NEXT: Flags [ (0x4)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 1
// CHECK-NEXT: }
// CHECK-NEXT: ProgramHeader {
// CHECK-NEXT: Type: PT_GNU_STACK (0x6474E551)
// CHECK-NEXT: Offset: 0x0
// CHECK-NEXT: VirtualAddress: 0x0
// CHECK-NEXT: PhysicalAddress: 0x0
// CHECK-NEXT: FileSize: 0
// CHECK-NEXT: MemSize: 0
// CHECK-NEXT: Flags [ (0x6)
// CHECK-NEXT: PF_R (0x4)
// CHECK-NEXT: PF_W (0x2)
// CHECK-NEXT: ]
// CHECK-NEXT: Alignment: 0
// CHECK-NEXT: }
// CHECK-NEXT:]