Commit Graph

12 Commits

Author SHA1 Message Date
Fangrui Song 3eef47407b [PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
0: bl .-4
4: bl .+0
8: bl .+4

// llvm-objdump -d output (after) ; GNU objdump -d
0: bl 0xfffffffc / bl 0xfffffffffffffffc
4: bl 0x4
8: bl 0xc
```

Many Operand's are not annotated as OPERAND_PCREL.
They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches.

Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test
address space wraparound for powerpc32 and powerpc64.

Reviewed By: sfertile, jhenderson

Differential Revision: https://reviews.llvm.org/D76591
2020-03-26 08:32:29 -07:00
Fangrui Song 71e2ca6e32 [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:`
The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier.

`.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive.
Without `-LABEL`, the CHECK line can match the `Disassembly of section`
line and causes the next `CHECK-NEXT:` to fail.

```
Disassembly of section .foo:

0000000000001634 .foo:
```

Bdragon: <> has metalinguistic connotation. it just "feels right"

Reviewed By: rupprecht

Differential Revision: https://reviews.llvm.org/D75713
2020-03-05 18:05:28 -08:00
Fangrui Song dce7a362be [ELF] Improve the condition to create .interp
This restores commit 1417558e4a and its follow-up, reverted by commit c3dbd782f1.

After this commit:

clang -fuse-ld=bfd -no-pie -nostdlib a.c => .interp not created
clang -fuse-ld=bfd -pie -fPIE -nostdlib a.c => .interp created

clang -fuse-ld=gold -no-pie -nostdlib a.c => .interp not created
clang -fuse-ld=gold -pie -fPIE -nostdlib a.c => .interp created

clang -fuse-ld=lld -no-pie -nostdlib a.c => .interp created
clang -fuse-ld=lld -pie -fPIE -nostdlib a.c => .interp created
2019-12-27 15:34:25 -08:00
Reid Kleckner c3dbd782f1 Revert "[ELF] Improve the condition to create .interp"
This reverts commit 1417558e4a.
Also reverts commit 019a92bb28.

This causes check-sanitizer to fail. The "-Nolib" variant of the test
crashes on startup in the loader.
2019-12-27 13:05:41 -08:00
Fangrui Song 1417558e4a [ELF] Improve the condition to create .interp
Similar to rL362355, but with the `!config->shared` guard.

(1) {gcc,clang} -fuse-ld=bfd -pie -fPIE -nostdlib a.c => .interp created
(2) {gcc,clang} -fuse-ld=lld -pie -fPIE -nostdlib a.c => .interp not created
(3) {gcc,clang} -fuse-ld=lld -pie -fPIE -nostdlib a.c a.so => .interp created

The inconsistency of (2) is due to the condition `!Config->SharedFiles.empty()`.
To make lld behave more like ld.bfd, we could change the condition to:

    config->hasDynSymTab && !config->dynamicLinker.empty() && script->needsInterpSection();

However, that would bring another inconsistency as can be observed with:

(4) {gcc,clang} -fuse-ld=bfd -no-pie -nostdlib a.c => .interp not created
2019-12-26 13:26:43 -08:00
Fangrui Song 01c7f4b606 [ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.

Let me demonstrate the idea with a maxPageSize=65536 example:

When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.

Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.

ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
  to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
  The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
  Fix them. See the updated comments in InputSection.cpp for details.

  On targets that we enable the technique (only PPC64 now),
  we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
  if `sh_addralign(.tdata) < sh_addralign(.tbss)`

  This exposes many problems in ld.so implementations, especially the
  offsets of dynamic TLS blocks. Known issues:

  FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
  glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
  musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

  So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

The technique will be enabled (with updated tests) for other targets in
subsequent patches.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D64906

llvm-svn: 369343
2019-08-20 08:34:25 +00:00
Fangrui Song 912251e82f [PPC64] toc-indirect to toc-relative relaxation
This is based on D54720 by Sean Fertile.

When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.

If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.

    addis 3, 2, .LC0@toc@ha  # R_PPC64_TOC16_HA
    ld    3, .LC0@toc@l(3)   # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
    ld/lwa 3, 0(3)           # load the value from the address

    .section .toc,"aw",@progbits
    .LC0: .tc var[TC],var

can be relaxed to

    addis 3,2,var@toc@ha     # this may be relaxed to a nop,
    addi  3,3,var@toc@l      # then this becomes addi 3,2,var@toc
    ld/lwa 3, 0(3)           # load the value from the address

We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s

Reviewed By: ruiu, sfertile

Differential Revision: https://reviews.llvm.org/D60958

llvm-svn: 360112
2019-05-07 04:26:05 +00:00
Fangrui Song 07f8daf05e [ELF] Simplify RelRo, TLS, NOBITS section ranks and make RW PT_LOAD start with RelRo
Old: PT_LOAD(.data | PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) | .bss)
New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) | .data .bss)

The placement of | indicates page alignment caused by PT_GNU_RELRO. The
new layout has simpler rules and saves space for many cases.

Old size: roundup(.data) + roundup(.data.rel.ro)
New size: roundup(.data.rel.ro + .bss.rel.ro) + .data

Other advantages:

* At runtime the 3 memory mappings decrease to 2.
* start(PT_TLS) = start(PT_GNU_RELRO) = start(RW PT_LOAD). This
  simplifies binary manipulation tools.
  GNU strip before 2.31 discards PT_GNU_RELRO if its
  address is not equal to the start of its associated PT_LOAD.
  This has been fixed by https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f2731e0c374e5323ce4cdae2bcc7b7fe22da1a6f
  But with this change, we will be compatible with GNU strip before 2.31
* Before, .got.plt (non-relro by default) was placed before .got (relro
  by default), which made it impossible to have _GLOBAL_OFFSET_TABLE_
  (start of .got.plt on x86-64) equal to the end of .got (R_GOT*_FROM_END)
  (https://bugs.llvm.org/show_bug.cgi?id=36555). With the new ordering, we
  can improve on this regard if we'd like to.

Reviewers: ruiu, espindola, pcc

Subscribers: emaste, arichardson, llvm-commits, joerg, jdoerfert

Differential Revision: https://reviews.llvm.org/D56828

llvm-svn: 356117
2019-03-14 03:47:45 +00:00
Sean Fertile 7f3f05e0b7 [PPC64] Optimize redundant instructions in global access sequences.
The access sequence for global variables in the medium and large code models use
2 instructions to add an offset to the toc-pointer. If the offset fits whithin
16-bits then the instruction that sets the high 16 bits is redundant.

This patch adds the --toc-optimize option, (on by default) and enables rewriting
of 2 instruction global variable accesses into 1 when the offset from the
TOC-pointer to the variable (or .got entry) fits in 16 signed bits. eg

addis %r3, %r2, 0           -->     nop
addi  %r3, %r3, -0x8000     -->     addi %r3, %r2, -0x8000

This rewriting can be disabled with the --no-toc-optimize flag

Differential Revision: https://reviews.llvm.org/D49237

llvm-svn: 342602
2018-09-20 00:26:44 +00:00
Zaara Syeda 85197a0842 [PPC64] Add .toc section after .got section
PPC64 maintains a compiler managed got in the .toc section. When accessing a
global variable through got-indirect access, a .toc entry is created for the
variable. The relocation for the got-indirect access will refer to the .toc
section rather than the symbol that is actually accessed. The .toc entry
contains the address of the global variable. We evaluate the offset from
r2 (which is the TOC base) to the address of the toc entry for the global
variable. Currently, the .toc is not near the .got. This causes errors because
the offset from r2 to the toc section is too large. The linker needs to add
all the .toc input sections to the .got output section, merging the compiler
managed got with the linker got. This ensures that the offsets from the TOC
base to the toc entries are not too large.

This patch puts the .toc section right after the .got section.

Differential Revision: https://reviews.llvm.org/D45833

llvm-svn: 333199
2018-05-24 15:59:41 +00:00
Zaara Syeda f61b0733a8 [PPC64] Remove support for ELF V1 ABI in LLD
The current support for V1 ABI in LLD is incomplete.
This patch removes V1 ABI support and changes the default behavior to V2 ABI,
issuing an error when using the V1 ABI. It also updates the testcases to V2
and removes any V1 specific tests.

Differential Revision: https://reviews.llvm.org/D46316

llvm-svn: 331529
2018-05-04 15:09:49 +00:00
Zaara Syeda 82dd99e08e [PPC64] Add offset to local entry point when calling functions without plt
PPC64 V2 ABI describes two entry points to a function. The global entry point
sets up the TOC base pointer. When calling a local function, the call should
branch to the local entry point rather than the global entry point.
Section 3.4.1 describes using the 3 most significant bits of the st_other
field to find out how many instructions there are between the local and global
entry point. This patch adds the correct offset required to branch to the local
entry point of a function.

Differential Revision: https://reviews.llvm.org/D45729

llvm-svn: 331046
2018-04-27 15:41:19 +00:00