llvm-project

Commit Graph

Author	SHA1	Message	Date
Victor Huang	7b391245d8	[PowerPC] Fix thunk alignment issue when using pc-rel instruction Thunk alignment is added in thie patch when using pc-rel instructions to avoid crossing the 64 byte boundary. Patched by: nemanjai, NeHuang Reviewed By: sfertile, MaskRay Differential Revision: https://reviews.llvm.org/D85973	2020-08-17 09:09:36 -05:00
Fangrui Song	a13212582a	[ELF][test] Test that thunks are processed before finalizeSynthetic(in.symTab) finalizeSynthetic(in.symTab) calls sortSymTabSymbols() to order local symbols before non-local symbols. The newly added tests ensure that thunk symbols are added before finalizeSynthetic(in.symTab), otherwise .symtab would be out of order.	2020-04-04 10:57:37 -07:00
Fangrui Song	3eef47407b	[PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 0: bl .-4 4: bl .+0 8: bl .+4 // llvm-objdump -d output (after) ; GNU objdump -d 0: bl 0xfffffffc / bl 0xfffffffffffffffc 4: bl 0x4 8: bl 0xc ``` Many Operand's are not annotated as OPERAND_PCREL. They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches. Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test address space wraparound for powerpc32 and powerpc64. Reviewed By: sfertile, jhenderson Differential Revision: https://reviews.llvm.org/D76591	2020-03-26 08:32:29 -07:00
Fangrui Song	71e2ca6e32	[llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier. `.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive. Without `-LABEL`, the CHECK line can match the `Disassembly of section` line and causes the next `CHECK-NEXT:` to fail. ``` Disassembly of section .foo: 0000000000001634 .foo: ``` Bdragon: <> has metalinguistic connotation. it just "feels right" Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75713	2020-03-05 18:05:28 -08:00
Fangrui Song	870094decf	[ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4 ThunkSection contains 4-byte instructions on all targets that use thunks. Thunks should not be used in any performance sensitive places, and locality/cache line/instruction fetching arguments should not apply. We use 16 bytes as preferred function alignments for modern PowerPC cores. In any case, 8 is not optimal. Differential Revision: https://reviews.llvm.org/D72819	2020-01-16 10:36:33 -08:00
Fangrui Song	c8f0d3e130	[ELF][PPC64] Support long branch thunks with addends Fixes PPC64 part of PR40438 // clang -target ppc64le -c a.cc // .text.unlikely may be placed in a separate output section (via -z keep-text-section-prefix) // The distance between bar in .text.unlikely and foo in .text may be larger than 32MiB. static void foo() {} __attribute__((section(".text.unlikely"))) static int bar() { foo(); return 0; } __attribute__((used)) static int dummy = bar(); This patch makes such thunks with addends work for PPC64. AArch64: .text -> `__AArch64ADRPThunk_ (adrp x16, ...; add x16, x16, ...; br x16)` -> target PPC64: .text -> `__long_branch_ (addis 12, 2, ...; ld 12, ...(12); mtctr 12; bctr)` -> target AArch64 can leverage ADRP to jump to the target directly, but PPC64 needs to load an address from .branch_lt . Before Power ISA v3.0, the PC-relative ADDPCIS was not available. .branch_lt was invented to work around the limitation. Symbol::ppc64BranchltIndex is replaced by PPC64LongBranchTargetSection::entry_index which take addends into consideration. The tests are rewritten: ppc64-long-branch.s tests -no-pie and ppc64-long-branch-pi.s tests -pie and -shared. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D70937	2019-12-05 10:17:45 -08:00
Fangrui Song	01c7f4b606	[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343	2019-08-20 08:34:25 +00:00
Fangrui Song	003c18a39c	[PPC][PPC64] Improve some llvm-objdump -d -D tests Various improvement: Some offsets in disassembly are incorrect after several layout adjustment. Fix them. llvm-objdump -D should not be used. -D dumps unrelated non-text sections. Replace them with llvm-objdump -d, llvm-readelf -x, etc Many llvm-objdump -d tests use {{.*}} . Add the option --no-show-raw-insn to avoid check hex bytes. ppc64-long-branch.s does not need a shared object. Delete it. Make ppc64-ifunc.s check 2 ifuncs. Reviewers: ruiu, espindola Subscribers: emaste, nemanjai, arichardson, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60998 llvm-svn: 358975	2019-04-23 11:47:28 +00:00
Fangrui Song	07f8daf05e	[ELF] Simplify RelRo, TLS, NOBITS section ranks and make RW PT_LOAD start with RelRo Old: PT_LOAD(.data \| PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) \| .bss) New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) \| .data .bss) The placement of \| indicates page alignment caused by PT_GNU_RELRO. The new layout has simpler rules and saves space for many cases. Old size: roundup(.data) + roundup(.data.rel.ro) New size: roundup(.data.rel.ro + .bss.rel.ro) + .data Other advantages: * At runtime the 3 memory mappings decrease to 2. * start(PT_TLS) = start(PT_GNU_RELRO) = start(RW PT_LOAD). This simplifies binary manipulation tools. GNU strip before 2.31 discards PT_GNU_RELRO if its address is not equal to the start of its associated PT_LOAD. This has been fixed by https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f2731e0c374e5323ce4cdae2bcc7b7fe22da1a6f But with this change, we will be compatible with GNU strip before 2.31 * Before, .got.plt (non-relro by default) was placed before .got (relro by default), which made it impossible to have _GLOBAL_OFFSET_TABLE_ (start of .got.plt on x86-64) equal to the end of .got (R_GOT*_FROM_END) (https://bugs.llvm.org/show_bug.cgi?id=36555). With the new ordering, we can improve on this regard if we'd like to. Reviewers: ruiu, espindola, pcc Subscribers: emaste, arichardson, llvm-commits, joerg, jdoerfert Differential Revision: https://reviews.llvm.org/D56828 llvm-svn: 356117	2019-03-14 03:47:45 +00:00
Sean Fertile	614dc11ca8	[PPC64] Long branch thunks. On PowerPC64, when a function call offset is too large to encode in a call instruction the address is stored in a table in the data segment. A thunk is used to load the branch target address from the table relative to the TOC-pointer and indirectly branch to the callee. When linking position-dependent code the addresses are stored directly in the table, for position-independent code the table is allocated and filled in at load time by the dynamic linker. For position-independent code the branch targets could have gone in the .got.plt but using the .branch_lt section for both position dependent and position independent binaries keeps it consitent and helps keep this PPC64 specific logic seperated from the target-independent code handling the .got.plt. Differential Revision: https://reviews.llvm.org/D53408 llvm-svn: 346877	2018-11-14 17:56:43 +00:00

10 Commits