Commit Graph

10 Commits

Author SHA1 Message Date
Fangrui Song 85adce3d73 [PPCInstPrinter] Change B to print the target address in hexadecimal form
Follow-up of D76591 and D76907
2020-04-01 22:38:24 -07:00
Fangrui Song 3eef47407b [PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
0: bl .-4
4: bl .+0
8: bl .+4

// llvm-objdump -d output (after) ; GNU objdump -d
0: bl 0xfffffffc / bl 0xfffffffffffffffc
4: bl 0x4
8: bl 0xc
```

Many Operand's are not annotated as OPERAND_PCREL.
They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches.

Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test
address space wraparound for powerpc32 and powerpc64.

Reviewed By: sfertile, jhenderson

Differential Revision: https://reviews.llvm.org/D76591
2020-03-26 08:32:29 -07:00
Fangrui Song 71e2ca6e32 [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:`
The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier.

`.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive.
Without `-LABEL`, the CHECK line can match the `Disassembly of section`
line and causes the next `CHECK-NEXT:` to fail.

```
Disassembly of section .foo:

0000000000001634 .foo:
```

Bdragon: <> has metalinguistic connotation. it just "feels right"

Reviewed By: rupprecht

Differential Revision: https://reviews.llvm.org/D75713
2020-03-05 18:05:28 -08:00
Fangrui Song 01c7f4b606 [ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.

Let me demonstrate the idea with a maxPageSize=65536 example:

When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.

Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.

ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
  to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
  The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
  Fix them. See the updated comments in InputSection.cpp for details.

  On targets that we enable the technique (only PPC64 now),
  we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
  if `sh_addralign(.tdata) < sh_addralign(.tbss)`

  This exposes many problems in ld.so implementations, especially the
  offsets of dynamic TLS blocks. Known issues:

  FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
  glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
  musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

  So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

The technique will be enabled (with updated tests) for other targets in
subsequent patches.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D64906

llvm-svn: 369343
2019-08-20 08:34:25 +00:00
Fangrui Song 6cdd68e386 [PPC64] Define getThunkSectionSpacing() based on the range of R_PPC64_REL24
Suggested by Sean Fertile and Peter Smith.

Thunk section spacing decrease the total number of thunks. I measured a
decrease of 1% or less in some large programs, with no perceivable
slowdown in link time. Override getThunkSectionSpacing() to enable it.
0x2000000 is the farthest point R_PPC64_REL24 can reach. I tried several
numbers and found 0x2000000 works the best. Numbers near 0x2000000 work
as well but let's just use the simpler number.

As demonstrated by the updated tests, this essentially changes placement
of most thunks to the end of the output section. We leverage this
property to fix PR40740 reported by Alfredo Dal'Ava Júnior:

The output section .init consists of input sections from several object
files (crti.o crtbegin.o crtend.o crtn.o). Sections other than the last
one do not have a terminator. With this patch, we create the thunk after
the last .init input section and thus fix the issue. This is not
foolproof but works quite well for such sections (with no terminator) in
practice.

Reviewed By: ruiu, sfertile

Differential Revision: https://reviews.llvm.org/D61720

llvm-svn: 360405
2019-05-10 05:51:00 +00:00
Fangrui Song 003c18a39c [PPC][PPC64] Improve some llvm-objdump -d -D tests
Various improvement:

Some offsets in disassembly are incorrect after several layout adjustment. Fix them.
llvm-objdump -D should not be used. -D dumps unrelated non-text sections. Replace them with llvm-objdump -d, llvm-readelf -x, etc
Many llvm-objdump -d tests use {{.*}} . Add the option --no-show-raw-insn to avoid check hex bytes.

ppc64-long-branch.s does not need a shared object. Delete it.
Make ppc64-ifunc.s check 2 ifuncs.

Reviewers: ruiu, espindola

Subscribers: emaste, nemanjai, arichardson, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60998

llvm-svn: 358975
2019-04-23 11:47:28 +00:00
Fangrui Song 07f8daf05e [ELF] Simplify RelRo, TLS, NOBITS section ranks and make RW PT_LOAD start with RelRo
Old: PT_LOAD(.data | PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) | .bss)
New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) | .data .bss)

The placement of | indicates page alignment caused by PT_GNU_RELRO. The
new layout has simpler rules and saves space for many cases.

Old size: roundup(.data) + roundup(.data.rel.ro)
New size: roundup(.data.rel.ro + .bss.rel.ro) + .data

Other advantages:

* At runtime the 3 memory mappings decrease to 2.
* start(PT_TLS) = start(PT_GNU_RELRO) = start(RW PT_LOAD). This
  simplifies binary manipulation tools.
  GNU strip before 2.31 discards PT_GNU_RELRO if its
  address is not equal to the start of its associated PT_LOAD.
  This has been fixed by https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f2731e0c374e5323ce4cdae2bcc7b7fe22da1a6f
  But with this change, we will be compatible with GNU strip before 2.31
* Before, .got.plt (non-relro by default) was placed before .got (relro
  by default), which made it impossible to have _GLOBAL_OFFSET_TABLE_
  (start of .got.plt on x86-64) equal to the end of .got (R_GOT*_FROM_END)
  (https://bugs.llvm.org/show_bug.cgi?id=36555). With the new ordering, we
  can improve on this regard if we'd like to.

Reviewers: ruiu, espindola, pcc

Subscribers: emaste, arichardson, llvm-commits, joerg, jdoerfert

Differential Revision: https://reviews.llvm.org/D56828

llvm-svn: 356117
2019-03-14 03:47:45 +00:00
Sean Fertile 0205828be4 [PPC64] Update tests to reflect change in printing of call operand. [NFC]
The printing of branch operands for call instructions was changed to properly
handle negative offsets. Updating the tests to reflect that.

llvm-svn: 353866
2019-02-12 17:49:04 +00:00
Sean Fertile 614dc11ca8 [PPC64] Long branch thunks.
On PowerPC64, when a function call offset is too large to encode in a call
instruction the address is stored in a table in the data segment. A thunk is
used to load the branch target address from the table relative to the
TOC-pointer and indirectly branch to the callee. When linking position-dependent
code the addresses are stored directly in the table, for position-independent
code the table is allocated and filled in at load time by the dynamic linker.

For position-independent code the branch targets could have gone in the .got.plt
but using the .branch_lt section for both position dependent and position
independent binaries keeps it consitent and helps keep this PPC64 specific logic
seperated from the target-independent code handling the .got.plt.

Differential Revision: https://reviews.llvm.org/D53408

llvm-svn: 346877
2018-11-14 17:56:43 +00:00
Sean Fertile 3acfe400a2 [PPC64] Fix offset checks on rel24 call relocations.
Adjusted the range check on a call instruction from 24 bits signed to
26 bits signed. While the instruction only encodes 24 bits, the target is
assumed to be 4 byte aligned, and the value that is encoded in the instruction
gets shifted left by 2 to form the offset. Also added a check that the offset is
indeed at least 4 byte aligned.

Differential Revision: https://reviews.llvm.org/D53401

llvm-svn: 344747
2018-10-18 15:43:41 +00:00