Commit Graph

11 Commits

Author SHA1 Message Date
Victor Huang 7b391245d8 [PowerPC] Fix thunk alignment issue when using pc-rel instruction
Thunk alignment is added in thie patch when using pc-rel instructions
to avoid crossing the 64 byte boundary.

Patched by: nemanjai, NeHuang
Reviewed By: sfertile, MaskRay

Differential Revision: https://reviews.llvm.org/D85973
2020-08-17 09:09:36 -05:00
Fangrui Song 71e2ca6e32 [llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:`
The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier.

`.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive.
Without `-LABEL`, the CHECK line can match the `Disassembly of section`
line and causes the next `CHECK-NEXT:` to fail.

```
Disassembly of section .foo:

0000000000001634 .foo:
```

Bdragon: <> has metalinguistic connotation. it just "feels right"

Reviewed By: rupprecht

Differential Revision: https://reviews.llvm.org/D75713
2020-03-05 18:05:28 -08:00
Fangrui Song 870094decf [ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4
ThunkSection contains 4-byte instructions on all targets that use
thunks. Thunks should not be used in any performance sensitive places,
and locality/cache line/instruction fetching arguments should not apply.

We use 16 bytes as preferred function alignments for modern PowerPC cores.
In any case, 8 is not optimal.

Differential Revision: https://reviews.llvm.org/D72819
2020-01-16 10:36:33 -08:00
Fangrui Song 345f59667d [ELF] Rename .plt to .iplt and decrease EM_PPC{,64} alignment of .glink to 4
GNU ld creates the synthetic section .iplt, and has a built-in linker
script that assigns .iplt to the output section .plt . There is no
output section named .iplt .

Making .iplt an output section actually has a benefit that makes the
tricky toolchain feature stand out. Symbolizers don't have to deal with
mixed PLT entries (e.g. llvm-objdump -d incorrectly annotates such jump
targets).

On EM_PPC{,64}, .glink contains a PLT resolver and a series of jump
instructions. The 4-byte entry size makes it unnecessary to have an
alignment of 16.

Mark ppc32-gnu-ifunc.s and ppc32-gnu-ifunc-nonpreemptable.s as `XFAIL: *`.
They test IPLT on EM_PPC, which never works.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D71520
2019-12-17 00:15:59 -08:00
Fangrui Song 01c7f4b606 [ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.

Let me demonstrate the idea with a maxPageSize=65536 example:

When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.

Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.

ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
  to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
  The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
  Fix them. See the updated comments in InputSection.cpp for details.

  On targets that we enable the technique (only PPC64 now),
  we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
  if `sh_addralign(.tdata) < sh_addralign(.tbss)`

  This exposes many problems in ld.so implementations, especially the
  offsets of dynamic TLS blocks. Known issues:

  FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
  glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
  musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

  So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

The technique will be enabled (with updated tests) for other targets in
subsequent patches.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D64906

llvm-svn: 369343
2019-08-20 08:34:25 +00:00
Fangrui Song ce1534b405 [ELF][PPC64] Don't apply LD->LE relaxation on R_PPC64_GOT_DTPREL16*
In ELF v2 ABI, R_PPC64_GOT_DTPREL16* are not relaxed.

This family of relocation types are used for variables outside of 2GiB
of the TLS block. 2 instructions cannot materialize a DTPREL offset that
is not 32-bit.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D62737

llvm-svn: 362357
2019-06-03 05:41:31 +00:00
Fangrui Song 003c18a39c [PPC][PPC64] Improve some llvm-objdump -d -D tests
Various improvement:

Some offsets in disassembly are incorrect after several layout adjustment. Fix them.
llvm-objdump -D should not be used. -D dumps unrelated non-text sections. Replace them with llvm-objdump -d, llvm-readelf -x, etc
Many llvm-objdump -d tests use {{.*}} . Add the option --no-show-raw-insn to avoid check hex bytes.

ppc64-long-branch.s does not need a shared object. Delete it.
Make ppc64-ifunc.s check 2 ifuncs.

Reviewers: ruiu, espindola

Subscribers: emaste, nemanjai, arichardson, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60998

llvm-svn: 358975
2019-04-23 11:47:28 +00:00
Fangrui Song 5dffc95b26 [ELF][test] Use llvm-readelf's short option -r instead of -relocations and remove ignored --wide
Reviewers: ruiu, sfertile, espindola

Reviewed By: ruiu

Subscribers: jsji, emaste, nemanjai, arichardson, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D52124

llvm-svn: 343135
2018-09-26 19:48:07 +00:00
Zaara Syeda de54f584cc [PPC64] Add support for R_PPC64_GOT_DTPREL16* relocations
The local dynamic TLS access on PPC64 ELF v2 ABI uses R_PPC64_GOT_DTPREL16*
relocations when a TLS variables falls outside 2 GB of the thread storage
block. This patch adds support for these relocations by adding a new RelExpr
called R_TLSLD_GOT_OFF which emits a got entry for the TLS variable relative
to the dynamic thread pointer using the relocation R_PPC64_DTPREL64. It then
evaluates the R_PPC64_GOT_DTPREL16* relocations as the got offset for the
R_PPC64_DTPREL64 got entries.

Differential Revision: https://reviews.llvm.org/D48484

llvm-svn: 335732
2018-06-27 13:55:41 +00:00
Fangrui Song 3773c196fc [ELF][PPC64] Support R_PPC64_DTPREL64 which may be emitted in .rela.debug_addr
llvm-svn: 334533
2018-06-12 20:26:49 +00:00
Sean Fertile e6b2e06f28 [PPC64] Support R_PPC64_DTPREL relocations.
Patch adds support for most of the dynamic thread pointer based relocations
for local-dynamic tls. The HIGH and HIGHA versions are missing becuase they
are not supported by the llvm integrated assembler yet.

llvm-svn: 334465
2018-06-12 01:47:02 +00:00