llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	45acc35ac2	[ELF][PPC64] Implement IPLT code sequence for non-preemptible IFUNC Non-preemptible IFUNC are placed in in.iplt (.glink on EM_PPC64). If there is a non-GOT non-PLT relocation, for pointer equality, we change the type of the symbol from STT_IFUNC and STT_FUNC and bind it to the .glink entry. On EM_386, EM_X86_64, EM_ARM, and EM_AARCH64, the PLT code sequence loads the address from its associated .got.plt slot. An IPLT also has an associated .got.plt slot and can use the same code sequence. On EM_PPC64, the PLT code sequence is actually a bl instruction in .glink . It jumps to `__glink_PLTresolve` (the PLT header). and `__glink_PLTresolve` computes the .plt slot (relocated by R_PPC64_JUMP_SLOT). An IPLT does not have an associated R_PPC64_JUMP_SLOT, so we cannot use `bl` in .iplt . Instead, create a call stub which has a similar code sequence as PPC64PltCallStub. We don't save the TOC pointer, so such scenarios will not work: a function pointer to a non-preemptible ifunc, which resolves to a function defined in another DSO. This is the restriction described by https://sourceware.org/glibc/wiki/GNU_IFUNC (though on many architectures it works in practice): Requirement (a): Resolver must be defined in the same translation unit as the implementations. If an ifunc is taken address but not called, technically we don't need an entry for it, but we currently do that. This patch makes // clang -fuse-ld=lld -fno-pie -no-pie a.c // clang -fuse-ld=lld -fPIE -pie a.c #include <stdio.h> static void impl(void) { puts("meow"); } void thefunc(void) __attribute__((ifunc("resolver"))); void resolver(void) { return &impl; } int main(void) { thefunc(); void (theptr)(void) = &thefunc; theptr(); } work on Linux glibc and FreeBSD. Calling a function pointer pointing to a Non-preemptible IFUNC never worked before. Differential Revision: https://reviews.llvm.org/D71509	2019-12-29 22:40:03 -08:00
Fangrui Song	345f59667d	[ELF] Rename .plt to .iplt and decrease EM_PPC{,64} alignment of .glink to 4 GNU ld creates the synthetic section .iplt, and has a built-in linker script that assigns .iplt to the output section .plt . There is no output section named .iplt . Making .iplt an output section actually has a benefit that makes the tricky toolchain feature stand out. Symbolizers don't have to deal with mixed PLT entries (e.g. llvm-objdump -d incorrectly annotates such jump targets). On EM_PPC{,64}, .glink contains a PLT resolver and a series of jump instructions. The 4-byte entry size makes it unnecessary to have an alignment of 16. Mark ppc32-gnu-ifunc.s and ppc32-gnu-ifunc-nonpreemptable.s as `XFAIL: *`. They test IPLT on EM_PPC, which never works. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D71520	2019-12-17 00:15:59 -08:00
Fangrui Song	01c7f4b606	[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343	2019-08-20 08:34:25 +00:00
Fangrui Song	dc06b0bc9a	[ELF] Don't special case symbolic relocations with 0 addend to ifunc in writable locations Currently the following 3 relocation types do not trigger the creation of a canonical PLT (which changes STT_GNU_IFUNC to STT_FUNC and redirects all references): 1) GOT-generating (`needsGot`) 2) PLT-generating (`needsPlt`) 3) R_ABS with 0 addend in a writable location. This is used for for ifunc function pointers in writable sections such as .data and .toc. This patch deletes case 3) to simplify the R__IRELATIVE generating logic added in D57371. Other advantages: It is guaranteed no more than 1 R__IRELATIVE is created for an ifunc. PPC64: no need to special case ifunc in toc-indirect to toc-relative relaxation. See D65755 The deleted elf::addIRelativeRelocs demonstrates that one-pass scan through relocations makes several optimizations difficult. This is something we can think about in the future. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D65995 llvm-svn: 368661	2019-08-13 09:43:40 +00:00
Fangrui Song	c55c0598f9	[ELF][PPC] Don't relax ifunc toc-indirect accesses to toc-relative Fixes PR42759. ``` // If ifunc is taken address in -fPIC code, it may have a toc entry .section .toc,"aw",@progbits .quad ifunc // ifunc may be defined as STT_GNU_IFUNC in another object file .type ifunc, %gnu_indirect_function ``` If ifunc is non-preemptable (e.g. when linking an executable), the toc entry will be relocated by R_PPC64_IRELATIVE. R_*_IRELATIVE represents the symbolic value of a non-preemptable ifunc (not associated with a canonical PLT) in a writable location. It has an unknown value at link time, so we cannot apply toc-indirect to toc-relative relaxation. Reviewed By: luporl, sfertile Differential Revision: https://reviews.llvm.org/D65755 llvm-svn: 368057	2019-08-06 16:57:54 +00:00

5 Commits