llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	cf783be8d7	Reland D114783/D115603 [ELF] Split scanRelocations into scanRelocations/postScanRelocations (Fixed an issue about GOT on a copy relocated alias.) (Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.) The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc) and postpone the real work to postScanRelocations. It gives some flexibility: * Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed. * Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot * -z nocopyrel: report all copy relocation places for one symbol * Make GOT deduplication feasible * Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice Since this patch moves a large chunk of code out of ELFT templates. My x86-64 executable is actually a few hundred bytes smaller. For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc because absolute relocation references are incorrect in -fpie mode. Reviewed By: peter.smith, ikudrin Differential Revision: https://reviews.llvm.org/D114783	2021-12-14 16:28:41 -08:00
Fangrui Song	ea15b862d7	Revert D114783 [ELF] Split scanRelocations into scanRelocations/postScanRelocations May cause a failure for non-preemptible `bcmp` in a glibc -static link.	2021-12-14 14:33:50 -08:00
Fangrui Song	e7a95b0674	Reland [ELF] Split scanRelocations into scanRelocations/postScanRelocations (Fixed an issue about GOT on a copy relocated alias.) The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc) and postpone the real work to postScanRelocations. It gives some flexibility: * Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed. * Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot * -z nocopyrel: report all copy relocation places for one symbol * Make GOT deduplication feasible * Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice Since this patch moves a large chunk of code out of ELFT templates. My x86-64 executable is actually a few hundred bytes smaller. For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc because absolute relocation references are incorrect in -fpie mode. Reviewed By: peter.smith, ikudrin Differential Revision: https://reviews.llvm.org/D114783	2021-12-13 20:11:24 -08:00
Fangrui Song	0b8b86e30f	Revert "[ELF] Split scanRelocations into scanRelocations/postScanRelocations" This reverts commit `fc33861d48`. `replaceWithDefined` should copy needsGot, otherwise an alias for a copy relocated symbol may not have GOT entry if its needsGot was originally true.	2021-12-13 19:29:53 -08:00
Fangrui Song	fc33861d48	[ELF] Split scanRelocations into scanRelocations/postScanRelocations The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc) and postpone the real work to postScanRelocations. It gives some flexibility: * Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed. * Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot * -z nocopyrel: report all copy relocation places for one symbol * Make parallel relocation scanning possible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice * Make GOT deduplication feasible Since this patch moves a large chunk of code out of ELFT templates. My x86-64 executable is actually a few hundred bytes smaller. For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc because absolute relocation references are incorrect in -fpie mode. Reviewed By: peter.smith, ikudrin Differential Revision: https://reviews.llvm.org/D114783	2021-12-13 09:56:52 -08:00
Victor Huang	7b391245d8	[PowerPC] Fix thunk alignment issue when using pc-rel instruction Thunk alignment is added in thie patch when using pc-rel instructions to avoid crossing the 64 byte boundary. Patched by: nemanjai, NeHuang Reviewed By: sfertile, MaskRay Differential Revision: https://reviews.llvm.org/D85973	2020-08-17 09:09:36 -05:00
Fangrui Song	3eef47407b	[PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 0: bl .-4 4: bl .+0 8: bl .+4 // llvm-objdump -d output (after) ; GNU objdump -d 0: bl 0xfffffffc / bl 0xfffffffffffffffc 4: bl 0x4 8: bl 0xc ``` Many Operand's are not annotated as OPERAND_PCREL. They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches. Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test address space wraparound for powerpc32 and powerpc64. Reviewed By: sfertile, jhenderson Differential Revision: https://reviews.llvm.org/D76591	2020-03-26 08:32:29 -07:00
Fangrui Song	71e2ca6e32	[llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier. `.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive. Without `-LABEL`, the CHECK line can match the `Disassembly of section` line and causes the next `CHECK-NEXT:` to fail. ``` Disassembly of section .foo: 0000000000001634 .foo: ``` Bdragon: <> has metalinguistic connotation. it just "feels right" Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75713	2020-03-05 18:05:28 -08:00
Fangrui Song	870094decf	[ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4 ThunkSection contains 4-byte instructions on all targets that use thunks. Thunks should not be used in any performance sensitive places, and locality/cache line/instruction fetching arguments should not apply. We use 16 bytes as preferred function alignments for modern PowerPC cores. In any case, 8 is not optimal. Differential Revision: https://reviews.llvm.org/D72819	2020-01-16 10:36:33 -08:00
Fangrui Song	45acc35ac2	[ELF][PPC64] Implement IPLT code sequence for non-preemptible IFUNC Non-preemptible IFUNC are placed in in.iplt (.glink on EM_PPC64). If there is a non-GOT non-PLT relocation, for pointer equality, we change the type of the symbol from STT_IFUNC and STT_FUNC and bind it to the .glink entry. On EM_386, EM_X86_64, EM_ARM, and EM_AARCH64, the PLT code sequence loads the address from its associated .got.plt slot. An IPLT also has an associated .got.plt slot and can use the same code sequence. On EM_PPC64, the PLT code sequence is actually a bl instruction in .glink . It jumps to `__glink_PLTresolve` (the PLT header). and `__glink_PLTresolve` computes the .plt slot (relocated by R_PPC64_JUMP_SLOT). An IPLT does not have an associated R_PPC64_JUMP_SLOT, so we cannot use `bl` in .iplt . Instead, create a call stub which has a similar code sequence as PPC64PltCallStub. We don't save the TOC pointer, so such scenarios will not work: a function pointer to a non-preemptible ifunc, which resolves to a function defined in another DSO. This is the restriction described by https://sourceware.org/glibc/wiki/GNU_IFUNC (though on many architectures it works in practice): Requirement (a): Resolver must be defined in the same translation unit as the implementations. If an ifunc is taken address but not called, technically we don't need an entry for it, but we currently do that. This patch makes // clang -fuse-ld=lld -fno-pie -no-pie a.c // clang -fuse-ld=lld -fPIE -pie a.c #include <stdio.h> static void impl(void) { puts("meow"); } void thefunc(void) __attribute__((ifunc("resolver"))); void resolver(void) { return &impl; } int main(void) { thefunc(); void (theptr)(void) = &thefunc; theptr(); } work on Linux glibc and FreeBSD. Calling a function pointer pointing to a Non-preemptible IFUNC never worked before. Differential Revision: https://reviews.llvm.org/D71509	2019-12-29 22:40:03 -08:00
Fangrui Song	ba1cdba4c4	[llvm-nm] Display STT_GNU_IFUNC as 'i' Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D71803	2019-12-25 09:47:53 -08:00
Fangrui Song	01c7f4b606	[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343	2019-08-20 08:34:25 +00:00
Fangrui Song	25ab1c6471	[ELF] Move R__IRELATIVE from .rel[a].plt to .rel[a].dyn unless --pack-dyn-relocs=android[+relr] An R__IRELATIVE represents the address of a STT_GNU_IFUNC symbol (redirected at runtime) which is non-preemptable and is not associated with a canonical PLT (associated with a symbol with a section index of SHN_UNDEF but a non-zero st_value). .rel[a].plt [DT_JMPREL, DT_JMPREL+DT_JMPRELSZ) contains relocations that can be lazily resolved. R__IRELATIVE are always eagerly resolved, so conceptually they do not belong to .rela.plt. "iplt" is mostly a misnomer. glibc powerpc and powerpc64 do not resolve R__IRELATIVE if they are in .rela.plt. // a.o - synthesized PLT call stub has an R__IRELATIVE void ifunc(); int main() { ifunc(); } // b.o static void real() {} asm (".type ifunc, %gnu_indirect_function"); void ifunc() { return &real; } The lld-linked executable crashes. ld.bfd places R__IRELATIVE in .rela.dyn and the executable works. glibc i386, x86_64, and aarch64 have logic (glibc/sysdeps//dl-machine.h:elf_machine_lazy_rel) to eagerly resolve R__IRELATIVE in .rel[a].plt so the lld-linked executable works. Move R__IRELATIVE from .rel[a].plt to .rel[a].dyn to fix the crashes on glibc powerpc/powerpc64. This also helps simplifying ifunc implementation in FreeBSD rtld-elf powerpc64. If --pack-dyn-relocs=android[+relr] is specified, the Android packed dynamic relocation format is used for .rela.dyn. We cannot name in.relaIplt ".rela.dyn" because the output section will have mixed formats. This can be improved in the future. Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D65651 llvm-svn: 367745	2019-08-03 02:26:52 +00:00
Fangrui Song	6cdd68e386	[PPC64] Define getThunkSectionSpacing() based on the range of R_PPC64_REL24 Suggested by Sean Fertile and Peter Smith. Thunk section spacing decrease the total number of thunks. I measured a decrease of 1% or less in some large programs, with no perceivable slowdown in link time. Override getThunkSectionSpacing() to enable it. 0x2000000 is the farthest point R_PPC64_REL24 can reach. I tried several numbers and found 0x2000000 works the best. Numbers near 0x2000000 work as well but let's just use the simpler number. As demonstrated by the updated tests, this essentially changes placement of most thunks to the end of the output section. We leverage this property to fix PR40740 reported by Alfredo Dal'Ava Júnior: The output section .init consists of input sections from several object files (crti.o crtbegin.o crtend.o crtn.o). Sections other than the last one do not have a terminator. With this patch, we create the thunk after the last .init input section and thus fix the issue. This is not foolproof but works quite well for such sections (with no terminator) in practice. Reviewed By: ruiu, sfertile Differential Revision: https://reviews.llvm.org/D61720 llvm-svn: 360405	2019-05-10 05:51:00 +00:00
Fangrui Song	003c18a39c	[PPC][PPC64] Improve some llvm-objdump -d -D tests Various improvement: Some offsets in disassembly are incorrect after several layout adjustment. Fix them. llvm-objdump -D should not be used. -D dumps unrelated non-text sections. Replace them with llvm-objdump -d, llvm-readelf -x, etc Many llvm-objdump -d tests use {{.*}} . Add the option --no-show-raw-insn to avoid check hex bytes. ppc64-long-branch.s does not need a shared object. Delete it. Make ppc64-ifunc.s check 2 ifuncs. Reviewers: ruiu, espindola Subscribers: emaste, nemanjai, arichardson, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60998 llvm-svn: 358975	2019-04-23 11:47:28 +00:00
Sean Fertile	0205828be4	[PPC64] Update tests to reflect change in printing of call operand. [NFC] The printing of branch operands for call instructions was changed to properly handle negative offsets. Updating the tests to reflect that. llvm-svn: 353866	2019-02-12 17:49:04 +00:00
Sean Fertile	4b25b3e4a6	Fix regex in lld ppc64-ifunc test. One of the regexes for reading in a hex address was missing the a-f part. llvm-svn: 333209	2018-05-24 17:07:16 +00:00
Sean Fertile	fd69969e54	[PPC64] Set the IRelative relocation type. Set the IRelative relocation type and extend the related test to verify. Differential Revision: https://reviews.llvm.org/D46877 llvm-svn: 333203	2018-05-24 16:32:14 +00:00
Sean Fertile	49914cc807	[PPC64] Add lazy symbol resolution stubs. Adds support for .glink resolver stubs from the example implementation in the V2 ABI (Section 4.2.5.3. Procedure Linkage Table). The stubs are written to the PltSection, and the sections are renamed to match the PPC64 ABI: .got.plt --> .plt Type = SHT_NOBITS .plt --> .glink And adds the DT_PPC64_GLINK dynamic tag to the dynamic section when the plt is not empty. Differential Revision: https://reviews.llvm.org/D45642 llvm-svn: 331840	2018-05-09 02:07:53 +00:00
Sean Fertile	d2e887d2f6	[PPC64] Emit plt call stubs to the text section rather then the plt section. On PowerPC calls to functions through the plt must be done through a call stub that is responsible for: 1) Saving the toc pointer to the stack. 2) Loading the target functions address from the plt into both r12 and the count register. 3) Indirectly branching to the target function. Previously we have been emitting these call stubs to the .plt section, however the .plt section should be reserved for the lazy symbol resolution stubs. This patch moves the call stubs to the text section by moving the implementation from writePlt to the thunk framework. Differential Revision: https://reviews.llvm.org/D46204 llvm-svn: 331607	2018-05-06 19:13:29 +00:00
Zaara Syeda	f61b0733a8	[PPC64] Remove support for ELF V1 ABI in LLD The current support for V1 ABI in LLD is incomplete. This patch removes V1 ABI support and changes the default behavior to V2 ABI, issuing an error when using the V1 ABI. It also updates the testcases to V2 and removes any V1 specific tests. Differential Revision: https://reviews.llvm.org/D46316 llvm-svn: 331529	2018-05-04 15:09:49 +00:00
Zaara Syeda	25b488b0ea	[PPC64] Fix toc restore nops offset for V2 ABI The PPC64 V2 ABI restores the toc base by loading from an offset of 24 from r1. This patch fixes the offset and updates the testcases from V1 to V2. It also issues an error when a nop is missing after a call to an external function. Differential Revision: https://reviews.llvm.org/D45892 llvm-svn: 330600	2018-04-23 15:01:24 +00:00
George Rimar	9fc2c64b35	[ELF] - Do not use HeaderSize for conditions in PltSection. Previously we checked (HeaderSize == 0) to find out if PltSection section is IPLT or PLT. Some targets does not set HeaderSize though. For example PPC64 has no lazy binding implemented and does not set PltHeaderSize constant. Because of that using of both IPLT and PLT relocations worked incorrectly there (testcase is provided). Patch fixes the issue. Differential revision: https://reviews.llvm.org/D41613 llvm-svn: 322362	2018-01-12 09:35:57 +00:00

23 Commits