15417 Commits

Author SHA1 Message Date
Fangrui Song f77b77e8db [ELF][RISCV] Relax local-exec TLS model
In -mrelax mode, GCC/Clang may generate a local-exec TLS code sequence like:
```
# R_RISCV_TPREL_HI20, R_RISCV_RELAX
lui rd, %tprel_hi(x)
# R_RISCV_TPREL_ADD, R_RISCV_RELAX
add rd, rd, tp, %tprel_add(x)
# (R_RISCV_TPREL_LO12_I || R_RISCV_TPREL_LO12_S), R_RISCV_RELAX
addi rd, rd, %tprel_lo(x) || sw rs, %tprel(x)(rd)
```

Note: st_value(x) for TLS should be in the range [0,p_memsz(PT_TLS)).
When st_value(x) < 2048 (i.e. hi20(x) == 0), the linker can relax the code
sequence to:
```
addi rd, tp, st_value(x) || sw rs, st_value(x)(rd)
```
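
The condition above can be sketched in a few lines. This is only an illustration of the hi20/lo12 split, not the code from the patch; `canRelaxTprel` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Upper 20 bits as materialized by lui. The +0x800 rounding compensates for
// the sign extension of the low 12 bits.
inline uint64_t hi20(uint64_t value) { return (value + 0x800) >> 12; }

// Low 12 bits, sign-extended the way addi and sw interpret their immediate.
inline int32_t lo12(uint64_t value) {
  return static_cast<int32_t>((value & 0xfff) ^ 0x800) - 0x800;
}

// Hypothetical helper: the lui/add pair is only redundant when hi20 is zero,
// which for a TLS offset means st_value(x) < 2048.
inline bool canRelaxTprel(uint64_t tlsOffset, uint64_t tlsSegmentSize) {
  assert(tlsOffset < tlsSegmentSize && "st_value must lie inside PT_TLS");
  return hi20(tlsOffset) == 0;
}
```

For offsets of 2048 and above, lo12 becomes negative and hi20 non-zero, so the full three-instruction sequence is still needed.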

Differential Revision: https://reviews.llvm.org/D129425
2022-07-15 10:08:08 -07:00
Fangrui Song 51b9e099d5 [ELF] Reword --no-allow-shlib-undefined diagnostic
Use a format more similar to unresolved references from regular object
files. It's probably easier to read for people who are less familiar
with the linker diagnostics.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D129790
2022-07-15 01:29:58 -07:00
Alexandre Ganea 17a4427e82 [LLD][COFF] On Windows, fix the date formatting in the 'incremental' test.
On my system the date formatting is a bit different from what the test used to
support. I'm using:

  Windows 11 version 21H2, build 22000.795 using the English(Canada) region.
  ls from BusyBox 1.36
  VS 2022 17.2.5
  WinSDK 10.0.22000
2022-07-14 17:10:09 -04:00
Fangrui Song 889c6f3996 [ELF][test] Fix a typo in aarch64-ifunc-bti.s to actually test what was intended
Thanks to Alex Brachet for spotting it in D110217.
2022-07-14 13:46:38 -07:00
Jez Ng 403d61aedd [lld-macho] Enable EH frame relocation / pruning
This just removes the code that gates the logic. The main issue here is
perf impact: without {D122258}, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:

**chromium_framework**

This is without the use of `-femit-dwarf-unwind=no-compact-unwind`:

             base           diff           difference (95% CI)
  sys_time   1.826 ± 0.019  1.962 ± 0.034  [  +6.5% ..   +8.4%]
  user_time  9.306 ± 0.054  9.926 ± 0.082  [  +6.2% ..   +7.1%]
  wall_time  8.225 ± 0.068  8.947 ± 0.128  [  +8.0% ..   +9.6%]
  samples    15             22

With that flag enabled, the regression mostly disappears, as hoped:

             base           diff           difference (95% CI)
  sys_time   1.839 ± 0.062  1.866 ± 0.068  [  -0.9% ..   +3.8%]
  user_time  9.452 ± 0.068  9.490 ± 0.067  [  -0.1% ..   +0.9%]
  wall_time  8.383 ± 0.127  8.452 ± 0.114  [  -0.1% ..   +1.8%]
  samples    17             21

**Unnamed internal app**

Without `-femit-dwarf-unwind`, this is the perf hit:

             base           diff           difference (95% CI)
  sys_time   1.372 ± 0.029  1.317 ± 0.024  [  -4.6% ..   -3.5%]
  user_time  2.835 ± 0.028  2.980 ± 0.027  [  +4.8% ..   +5.4%]
  wall_time  3.205 ± 0.079  3.383 ± 0.066  [  +4.9% ..   +6.2%]
  samples    102            83

With `-femit-dwarf-unwind`, the perf hit almost disappears:

             base           diff           difference (95% CI)
  sys_time   1.274 ± 0.026  1.270 ± 0.025  [  -0.9% ..   +0.3%]
  user_time  2.812 ± 0.023  2.822 ± 0.035  [  +0.1% ..   +0.7%]
  wall_time  3.166 ± 0.047  3.174 ± 0.059  [  -0.2% ..   +0.7%]
  samples    95             97

Just for fun, I measured the impact of `-femit-dwarf-unwind` on ld64
(`base` has the extra DWARF unwind info in the input object files,
`diff` doesn't):

             base           diff           difference (95% CI)
  sys_time   1.128 ± 0.010  1.124 ± 0.023  [  -1.3% ..   +0.6%]
  user_time  7.176 ± 0.030  7.106 ± 0.094  [  -1.5% ..   -0.4%]
  wall_time  7.874 ± 0.041  7.795 ± 0.121  [  -1.7% ..   -0.3%]
  samples    16             25

And for LLD:

             base           diff           difference (95% CI)
  sys_time   1.315 ± 0.019  1.280 ± 0.019  [  -3.2% ..   -2.0%]
  user_time  2.980 ± 0.022  2.822 ± 0.016  [  -5.5% ..   -5.0%]
  wall_time  3.369 ± 0.038  3.175 ± 0.033  [  -6.2% ..   -5.3%]
  samples    47             47

So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D129540
2022-07-13 21:14:05 -04:00
Stefan Pintilie c1f3cffee1 [PowerPC][LLD] Change PPC64R2SaveStub to only use non-PC-relative code
Currently the PPC64R2SaveStub thunk will produce Power 10 code by default.
This produced an issue when linking older code that made use of the st_other=1
bit but was never meant to be linked or run on Power 10.

This patch makes it so that only the R_PPC64_REL24_NOTOC relocation can produce
Power 10 code.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129580
2022-07-13 19:34:33 -05:00
Fangrui Song e690137dde [Support] Change compression::zlib::{compress,uncompress} to use uint8_t *
It's more natural to use uint8_t * (std::byte needs C++17 and llvm has
too many uses of uint8_t *), and most callers use uint8_t * instead of char *.
The functions were recently moved into `llvm::compression::zlib::`, so
downstream projects need to adapt anyway.
2022-07-13 16:26:54 -07:00
Daniel Bertalan 94e0f8e001 [lld-macho] Accept dylibs with LC_DYLD_EXPORTS_TRIE
This load command specifies the offset and size of the exports trie.
This information used to be a field in LC_DYLD_INFO, but in newer
libraries, it has a dedicated load command: LC_DYLD_EXPORTS_TRIE.

The format of the trie is the same for both load commands, so the code
for parsing it can be shared.

LLD does not generate this yet; it is mainly useful when chained fixups
are in use, as the other members of LC_DYLD_INFO are unused then, so the
smaller LC_DYLD_EXPORTS_TRIE can be output instead.

LLDB gained support for this in D107673.

Fixes #54550

Differential Revision: https://reviews.llvm.org/D129430
2022-07-13 22:34:11 +02:00
Daniel Bertalan ecb14fd872 [lld-macho] Add LOH_ARM64_ADRP_LDR_GOT_LDR optimization hint support
This hint instructs the linker to relax a GOT-indirect load.
If the referenced symbol is external and its GOT entry is within +/- 1
MiB, the GOT entry can be loaded with a single literal ldr instruction.
If the referenced symbol is local, its address may be loaded directly if
it's close enough, or with an adr(p) + ldr pair if it's not.

This type accounts for more than half of all LOHs in chromium_framework.

This commit moves the eligibility checks into helper functions to
improve the readability of the LOH processing code. No functional
changes are intended to the previously implemented LOH types.

Differential Revision: https://reviews.llvm.org/D129427
2022-07-13 12:20:14 +02:00
Kazu Hirata e5f568a49f Use has_value instead of hasValue (NFC) 2022-07-13 01:58:03 -07:00
Fangrui Song 9ea5b34f05 [ELF][RISCV] Use unshifted value for overflow check
The unshifted value indicates a displacement in bytes, which is more meaningful.
2022-07-13 00:28:29 -07:00
Fangrui Song 6b1d151fe3 [ELF] Fix displacement computation for intra-section branch after D127611
The st_value computed by D127611 is inaccurate:

* For a backward branch, the destination address may be wrong if there is no
  relaxable relocation between it and the current location due to `if (remove)`.
  We may incorrectly relax a branch to c.j, which ends up overflowing.
* For a forward branch, the destination address may be overestimated
  and lose relaxation opportunities.

To fix the issues,

* Don't reset st_value to the original value.
* Save the st_value delta from the previous iteration into valueDelta, and use
  `sa[0].d->value -= delta - valueDelta.find(sa[0].d)->second`.
2022-07-13 00:17:17 -07:00
Fangrui Song 67d760dd49 [ELF][test] Remove unneeded --mcpu=future from llvm-objdump commands 2022-07-12 21:08:52 -07:00
Fangrui Song 4864aba631 [ELF][test] Remove unneeded --mcpu=pwr10 from llvm-objdump commands
llvm-objdump now defaults to decoding all known instructions for PPC64.
2022-07-12 21:07:45 -07:00
Jez Ng 61ace8f78b [lld-macho][nfc] Change force-load.s test to actually test
I'd forgotten to change a copypasted line...
2022-07-12 17:57:09 -04:00
YongKang Zhu 2324c2e3c3 [LLD] Two tweaks to symbol ordering scheme
When `--symbol-ordering-file` is specified, the linker today always puts
hot contributions in the middle of cold ones when targeting a RISC machine, so
as to minimize the chance that branch thunks need to be generated for hot code
calling into cold code. This is not necessary when the user specifies an ordering
of read-only data (vs. function) symbols, or when the output section is so small
that no branch thunk would ever be required. The latter is common for mobile
apps. For example, among all the native ARM64 libraries in Facebook Instagram
App for Android, 80% of them have text section smaller than 64KB and the
largest text section seen is less than 8MB, well below the distance that a
BRANCH26 can reach.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D128382
2022-07-12 11:34:17 -07:00
Alex Brachet 5176a7671f Fix build on Windows
It seems like the `sed` on Windows is not particularly
smart. It's not actually needed in this place, so I've
removed its usage and just created an invalid YAML file
another way.
2022-07-11 22:47:26 +00:00
Alex Brachet d27984a651 Fix build on Windows
Error message is not capitalized on Windows
2022-07-11 21:44:28 +00:00
Alex Brachet fd9962e75d [COFF] Add vfsoverlay flag
This patch adds a new flag, vfsoverlay, similar to clang's
ivfsoverlay flag. This is helpful when compiling on case-sensitive
file systems when cross-compiling to Windows,
particularly when compiling third-party code containing
#pragma comment("linker", "/defaultlib:...") which
can't be easily changed.

Differential Revision: https://reviews.llvm.org/D125800
2022-07-11 21:31:01 +00:00
Kaining Zhong 6c641d0de6 [lld-macho] Handle user-provided dtrace symbols to avoid linking failure
This fixes https://github.com/llvm/llvm-project/issues/56238. ld64.lld currently does not generate a __dof section in Mach-O, and the -no_dtrace_dof option is on by default. However, when there are user-defined dtrace symbols, ld64.lld treats them as undefined symbols, which causes the link to fail because lld cannot find their definitions. This patch allows ld64.lld to rewrite the instructions calling dtrace symbols to instructions like nop, as ld64 does; therefore, when encountering user-provided dtrace probes, the link can still succeed.

I'm not sure whether support for dtrace is expected in lld, so for now I didn't add code to make lld emit a __dof section like ld64, and only made it possible to link with the dtrace symbols provided. If this feature is needed, I can add that part in Dtrace.cpp & Dtrace.h.

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D129062
2022-07-11 15:32:26 -04:00
David Spickett 79942d32a6 [lld-macho] Fix compact unwind output for 32 bit builds
This test was failing on our 32 bit build bot:
https://lab.llvm.org/buildbot/#/builders/178/builds/2463

This happened because in UnwindInfoSectionImpl::finalize
a decision is made whether to write out regular or compressed
unwind info.

One check in this does:
```
if (cuPtr->functionAddress >= functionAddressMax) {
  break;
}
```

Where cuPtr->functionAddress was uint64_t and functionAddressMax
was uintptr_t, which is 4 bytes on a 32 bit system.

Using uint64_t for functionAddressMax fixes this problem,
presumably because at only 4 bytes, the max is much lower than
we expect. We're targeting 64 bit though, so the size of the max
should match the size of the addresses.
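
A small sketch of the truncation, assuming the maximum is derived from a 64-bit value. `reachesMax` and `limit` are hypothetical names, and `uint32_t` stands in for `uintptr_t` on a 32-bit host.

```cpp
#include <cassert>
#include <cstdint>

// MaxT models the type of functionAddressMax: uint32_t behaves like uintptr_t
// on a 32-bit host, uint64_t is the type used after the fix.
template <typename MaxT>
bool reachesMax(uint64_t functionAddress, uint64_t limit) {
  // Narrowing to a 32-bit MaxT silently drops the upper half of the limit.
  MaxT functionAddressMax = static_cast<MaxT>(limit);
  return functionAddress >= functionAddressMax;
}
```

With a limit of 0x100001000 and a function address of 0x100000010, the 32-bit variant compares against 0x1000 and reports that the maximum was reached, while the 64-bit variant does not.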

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D129363
2022-07-11 08:21:03 +00:00
Nico Weber 109d7fb4e6 fix comment typo to cycle bots 2022-07-09 22:41:58 +02:00
Fangrui Song dd74d3117d [ELF] Refactor ELFCOMPRESS_ZLIB handling and improve diagnostics
And add some tests.
2022-07-08 14:04:19 -07:00
Leonard Chan 474c873148 Revert "[llvm] cmake config groundwork to have ZSTD in LLVM"
This reverts commit f07caf20b9 which seems to break upstream https://lab.llvm.org/buildbot/#/builders/109/builds/42253.
2022-07-08 13:48:05 -07:00
Cole Kissane f07caf20b9 [llvm] cmake config groundwork to have ZSTD in LLVM
- added `FindZSTD.cmake`
- added a CMake option `LLVM_ENABLE_ZSTD` with behavior mirroring that of `LLVM_ENABLE_ZLIB`
- likewise added have_zstd to compiler-rt/test/lit.common.cfg.py, clang-tools-extra/clangd/test/lit.cfg.py, and several lit.site.cfg.py.in files mirroring have_zlib behavior

Reviewed By: leonardchan, MaskRay

Differential Revision: https://reviews.llvm.org/D128465
2022-07-08 11:46:52 -07:00
Cole Kissane ea61750c35 [NFC] Refactor llvm::zlib namespace
* Refactor compression namespaces across the project, making way for a possible
  introduction of alternatives to zlib compression.
  Changes are as follows:
  * Relocate the `llvm::zlib` namespace to `llvm::compression::zlib`.

Reviewed By: MaskRay, leonardchan, phosek

Differential Revision: https://reviews.llvm.org/D128953
2022-07-08 11:19:07 -07:00
Fangrui Song 75e551e5d8 [ELF] Relax R_RISCV_CALL and R_RISCV_CALL_PLT
A pair of auipc+jalr relocated by R_RISCV_CALL or R_RISCV_CALL_PLT can be
converted to c.j, c.jal, or jal.

* c.j: RVC and displacement is representable as an int12
* c.jal: RV32C and displacement is representable as an int12
* jal: displacement is representable as an int21

Use the D127581 relaxation framework to implement the relaxation. If a shorter
sequence is applicable, we record the new relocation type in `relocTypes` and
save the new instruction into `writes`. Finally let `riscvFinalizeRelax` rewrite the
instruction by setting `skip`.
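
The selection among the three shorter forms might look like the sketch below. It is not the code from the patch: `pickCallRelaxation`, `fitsSigned`, and the `CallRelax` enum are hypothetical, and the `rd` checks are added here from the ISA definitions (c.j is `jal x0`, c.jal is `jal ra`) rather than from the text above.

```cpp
#include <cassert>
#include <cstdint>

enum class CallRelax { None, CJ, CJal, Jal };

// True if value is representable as an N-bit two's complement integer.
template <unsigned N> bool fitsSigned(int64_t value) {
  static_assert(N > 0 && N < 64, "unsupported width");
  return value >= -(int64_t(1) << (N - 1)) && value < (int64_t(1) << (N - 1));
}

// displacement: target minus the address of the auipc.
// rvc: compressed extension available; rv32: 32-bit target.
// rd: destination register of the jalr (0 = x0, 1 = ra).
inline CallRelax pickCallRelaxation(int64_t displacement, bool rvc, bool rv32,
                                    unsigned rd) {
  assert(rd < 32 && "RISC-V has 32 integer registers");
  if (rvc && rd == 0 && fitsSigned<12>(displacement))
    return CallRelax::CJ;
  if (rvc && rv32 && rd == 1 && fitsSigned<12>(displacement))
    return CallRelax::CJal;
  if (fitsSigned<21>(displacement))
    return CallRelax::Jal;
  return CallRelax::None;
}
```

The shortest encoding is tried first, so a call only falls back to the 4-byte jal when neither 2-byte form applies.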

Differential Revision: https://reviews.llvm.org/D127611
2022-07-07 10:18:45 -07:00
Fangrui Song 6611d58f5b [ELF] Relax R_RISCV_ALIGN
Alternative to D125036. Implement R_RISCV_ALIGN relaxation so that we can handle
-mrelax object files (i.e. -mno-relax is no longer needed) and create a
framework for future relaxation.

`relaxAux` is placed in a union with InputSectionBase::jumpInstrMod, storing
auxiliary information for relaxation. In the first pass, `relaxAux` is allocated.
The main data structure is `relocDeltas`: when referencing `relocations[i]`, the
actual offset is `r_offset - (i ? relocDeltas[i-1] : 0)`.
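
As an illustration of that formula, the sketch below applies it to plain vectors. `adjustedOffset` and the container types are assumptions for the example; only the expression itself comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// relocDeltas[i] is the cumulative number of bytes removed up to and
// including relocation i, so relocation i itself has only moved by the
// bytes removed before it, i.e. relocDeltas[i - 1].
inline uint64_t adjustedOffset(const std::vector<uint64_t> &rOffsets,
                               const std::vector<uint32_t> &relocDeltas,
                               std::size_t i) {
  assert(rOffsets.size() == relocDeltas.size() && i < rOffsets.size());
  return rOffsets[i] - (i ? relocDeltas[i - 1] : 0);
}
```

For offsets {0, 8, 16} and deltas {2, 2, 6}, the adjusted offsets are 0, 6, and 14.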

`relaxOnce` performs one relaxation pass. It computes `relocDeltas` for all text
sections. Then, it adjusts st_value/st_size for symbols relative to this section
based on `SymbolAnchor`. `bytesDropped` is set so that `assignAddresses` knows
that the size has changed.

Run `relaxOnce` in the `finalizeAddressDependentContent` loop to wait for
convergence of text sections and other address dependent sections (e.g.
SHT_RELR). Note: extracting `relaxOnce` into a separate loop works for many cases
but has issues in some linker script edge cases.

After convergence, compute section contents: shrink the NOP sequence of each
R_RISCV_ALIGN as appropriate. Instead of deleting bytes, we run a sequence of
memcpy on the content delimited by relocation locations. For R_RISCV_ALIGN let
the next memcpy skip the desired number of bytes. Section content computation is
parallelizable, but let's ensure the implementation is mature before
optimizations. Technically we can save a copy if we interleave some code with
`OutputSection::writeTo`, but let's not pollute the generic code (we don't have
templated relocation resolving, so using conditions can impose overhead to
non-RISCV.)
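
The memcpy-based shrinking can be sketched as follows. This is a simplified stand-in, not the patch's implementation: `Removal` and `dropBytes` are hypothetical, and each removal is given directly as an offset plus a byte count rather than derived from relocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One span to drop, e.g. the surplus NOP bytes at an R_RISCV_ALIGN.
struct Removal {
  std::size_t offset; // start of the dropped bytes in the input
  std::size_t bytes;  // number of bytes to drop
};

// Copies the input into a smaller buffer, skipping each removal. The
// removals must be sorted by offset and must not overlap.
inline std::vector<uint8_t> dropBytes(const std::vector<uint8_t> &in,
                                      const std::vector<Removal> &removals) {
  std::size_t removed = 0;
  for (const Removal &r : removals)
    removed += r.bytes;
  assert(removed <= in.size());

  std::vector<uint8_t> out(in.size() - removed);
  std::size_t src = 0, dst = 0;
  for (const Removal &r : removals) {
    assert(r.offset >= src && r.offset + r.bytes <= in.size());
    std::size_t n = r.offset - src; // bytes kept before this removal
    if (n)
      std::memcpy(out.data() + dst, in.data() + src, n);
    dst += n;
    src = r.offset + r.bytes; // the next memcpy starts past the dropped bytes
  }
  if (std::size_t tail = in.size() - src)
    std::memcpy(out.data() + dst, in.data() + src, tail);
  return out;
}
```

Each chunk is copied exactly once, which is why no in-place byte deletion is needed.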

Tested:
`make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 defconfig all` built Linux kernel using -mrelax is bootable.
FreeBSD RISCV64 system using -mrelax is bootable.
bash/curl/firefox/libevent/vim/tmux using -mrelax works.

Differential Revision: https://reviews.llvm.org/D127581
2022-07-07 10:16:09 -07:00
Tim Northover 0f4339a835 lld test fix: don't check the precise hex emitted as a comment.
It can vary depending on the platform, so as with the NO-FMA test just check
for "0x".
2022-07-07 13:25:24 +01:00
Tim Northover fe62019387 lld: fix test after x86 instruction comments now end in newline 2022-07-07 13:01:32 +01:00
Jin Xin Ng 65001f5777 [LTO][ELF] Add selective --save-temps= option
Allows specific “temps” to be saved, instead of the current all-or-nothing nature of --save-temps. Multiple of these “temps” can be saved by specifying the argument multiple times.

Differential Revision: https://reviews.llvm.org/D127778
2022-07-06 10:06:18 -07:00
Fangrui Song e0612c91cd [ELF] Optimize getInputSections. NFC
In the majority of cases (e.g. orphan sections), an OutputSection has at most
one InputSectionDescription (isd). By changing the return type to
ArrayRef<InputSection *> we can just reference the isd->sections. For
OutputSections with more than one InputSectionDescription we use a caller
provided SmallVector to copy the elements as before.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D129111
2022-07-05 23:31:09 -07:00
Ben Dunbobbin c35a6454b1 [BUILD] Add missed CMakeLists.txt change from dfb77f2
See: https://reviews.llvm.org/D128195
2022-07-05 16:04:58 +01:00
Ben Dunbobbin dfb77f2e99 [LLD][ELF] Add FORCE_LLD_DIAGNOSTICS_CRASH to force LLD to crash
Add FORCE_LLD_DIAGNOSTICS_CRASH inspired by the existing
FORCE_CLANG_DIAGNOSTICS_CRASH.

This is particularly useful for people customizing LLD as they may
want to modify the crash reporting behavior.

Differential Revision: https://reviews.llvm.org/D128195
2022-07-05 09:43:09 +01:00
Daniel Bertalan 2028fe6fbc [lld-macho] Handle LOH_ARM64_ADRP_LDR_GOT optimization hints
This hint instructs the linker to perform the AdrpLdr or AdrpAdd
transformation depending on whether the GOT load has been relaxed to
load a local symbol's address.

Differential Revision: https://reviews.llvm.org/D129059
2022-07-05 07:33:13 +02:00
Pengxuan Zheng b5e49cdea9 [LLD][COFF] Ignore /kernel flag
Microsoft provides some description of the flag, but it's unclear whether
there's more to it. We ignore the flag for now until we find out more about it.

https://docs.microsoft.com/en-us/cpp/build/reference/kernel-create-kernel-mode-binary?view=msvc-170

Reviewed By: thieta, hans

Differential Revision: https://reviews.llvm.org/D128238
2022-07-01 10:03:02 -07:00
Daniel Bertalan 73b659ff55 [lld-macho] Fix left shift of negative value UB
I introduced this mistake in 573c7e6b3c.

Fixes the failure on this UBSan bot:
https://lab.llvm.org/buildbot/#/builders/5/builds/25537
2022-07-01 12:00:16 +02:00
Daniel Bertalan 573c7e6b3c [lld-macho] Handle LOH_ARM64_ADRP_LDR linker optimization hints
This linker optimization hint transforms a pair of adrp+ldr (immediate)
instructions into an ldr (literal) load from a PC-relative address if
it is 4-byte aligned and within +/- 1 MiB, as ldr can encode a signed
19-bit offset that gets multiplied by 4.

In the wild, only a small number of these hints are applicable because
not many loads end up close enough to the data segment. However, the
added helper functions will be useful in implementing the rest of the
LOH types.
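
The eligibility condition described above could be checked roughly like this. `fitsLdrLiteral` is a hypothetical name and the sketch ignores everything else the linker has to verify about the instruction pair.

```cpp
#include <cassert>
#include <cstdint>

// ldr (literal) encodes a signed 19-bit immediate that is multiplied by 4,
// so the target must be 4-byte aligned and within [-1 MiB, 1 MiB) of the
// instruction.
inline bool fitsLdrLiteral(uint64_t pc, uint64_t target) {
  assert(pc % 4 == 0 && "AArch64 instructions are 4-byte aligned");
  if (target % 4 != 0)
    return false;
  int64_t imm19 = static_cast<int64_t>(target - pc) / 4;
  return imm19 >= -(int64_t(1) << 18) && imm19 < (int64_t(1) << 18);
}
```

The largest forward distance that still fits is 1 MiB minus 4 bytes, because the positive half of a signed 19-bit range stops one short of 2^18.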

Differential Revision: https://reviews.llvm.org/D128942
2022-07-01 09:44:24 +02:00
Daniel Bertalan a3f67f0920 [lld-macho] Initial support for Linker Optimization Hints
Linker optimization hints mark a sequence of instructions used for
synthesizing an address, like ADRP+ADD. If the referenced symbol ends up
close enough, it can be replaced by a faster sequence of instructions
like ADR+NOP.

This commit adds support for 2 of the 7 defined ARM64 optimization
hints:
- LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP
  if the referenced address is within +/- 1 MiB
- LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into
  ADR+NOP if they reference the same page
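
The two range conditions can be written down as a short sketch. `fitsAdr` and `samePage` are hypothetical helpers; the 4 KiB page size is the granule that ADRP operates on.

```cpp
#include <cassert>
#include <cstdint>

// ADR encodes a signed 21-bit byte offset, i.e. a reach of +/- 1 MiB.
inline bool fitsAdr(uint64_t pc, uint64_t target) {
  assert(pc % 4 == 0 && "AArch64 instructions are 4-byte aligned");
  int64_t delta = static_cast<int64_t>(target - pc);
  return delta >= -(int64_t(1) << 20) && delta < (int64_t(1) << 20);
}

// ADRP materializes the address of a 4 KiB page, so two ADRPs are redundant
// when their targets agree in everything but the low 12 bits.
inline bool samePage(uint64_t a, uint64_t b) { return (a >> 12) == (b >> 12); }
```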

These two kinds already cover more than 50% of all LOHs in
chromium_framework.

Differential Revision: https://reviews.llvm.org/D128093
2022-06-30 06:28:42 +02:00
Fangrui Song 9a572164d5 [ELF] Move InputFiles global variables (memoryBuffers, objectFiles, etc) into Ctx. NFC 2022-06-29 18:53:38 -07:00
Fangrui Song e980f16d52 [ELF] Move whyExtract/backwardReferences from LinkerDriver to Ctx. NFC
Ctx was recently added as a more suitable place for such singletons.
2022-06-29 17:34:31 -07:00
Daniel Bertalan 8d29f0fdb9 [lld-macho] Emit REBASE_OPCODE_ADD_ADDR_IMM_SCALED if possible
An ADD_ADDR rebase opcode's argument can be encoded as an immediate if
the offset is less than 15 * word size. This change reduces the size of
chromium_framework by 100+ KiB.
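
A sketch of the encoding decision, using the bound stated above. `scaledImmediate` is a hypothetical helper, and the requirement that the offset be a multiple of the word size is an assumption inferred from the "scaled" immediate, not something the text above states.

```cpp
#include <cassert>
#include <cstdint>

// Returns the scaled immediate for REBASE_OPCODE_ADD_ADDR_IMM_SCALED, or -1
// if the offset has to be emitted with the longer ADD_ADDR form instead.
inline int scaledImmediate(uint64_t offset, uint64_t wordSize) {
  assert(wordSize != 0);
  if (offset % wordSize != 0) // assumption: only whole words can be scaled
    return -1;
  if (offset >= 15 * wordSize) // bound taken from the description above
    return -1;
  return static_cast<int>(offset / wordSize);
}
```

With an 8-byte word, an offset of 16 yields immediate 2, while 120 and 12 both fall back to the longer form.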

Differential Revision: https://reviews.llvm.org/D128798
2022-06-29 22:28:39 +02:00
Brad Smith 84b2e04aea [docs] Remove outdated status update for FreeBSD
Reviewed By: emaste, MaskRay

Differential Revision: https://reviews.llvm.org/D128592
2022-06-27 19:41:53 -04:00
Sam Clegg 53217ecb88 [lld][WebAssembly] Don't apply data relocations at static constructor time
Instead, export `__wasm_apply_data_relocs` and `__wasm_call_ctors`
separately.

This is required since user code in a shared library (such as static
constructors) should not be run until relocations have been applied to
all loaded libraries.

See: https://github.com/emscripten-core/emscripten/issues/17295

Differential Revision: https://reviews.llvm.org/D128515
2022-06-27 15:50:02 -07:00
Kazu Hirata 586fb81eee [lld] Don't use Optional::hasValue (NFC)
This patch replaces x.hasValue() with x where x is contextually
convertible to bool.
2022-06-26 19:37:14 -07:00
Fangrui Song 0688b00fc3 [ELF] Remove deprecated -dc
-dc was deprecated in release/14.x. Remove it for 15.0.
The only usage I know of was FreeBSD crunchgen, which was removed by https://reviews.freebsd.org/D34215

glibc just dropped -Wl,-d today. Keep -d for now.
2022-06-26 17:26:44 -07:00
Fangrui Song b95cca03cd [ELF] Improve compound assignment tests
Also use strchr instead of is_contained.
2022-06-25 22:30:52 -07:00
Fangrui Song 0a0effdd5b [ELF] Support -= *= /= <<= >>= &= |= in symbol assignments 2022-06-25 22:22:59 -07:00
Fangrui Song 77295c5486 [ELF] Allow ? without adjacent space
GNU ld allows 1 ? 2?3:4 : 5?6 :7
2022-06-25 21:16:59 -07:00
Fangrui Song e3f3d2abf0 [ELF][test] Improve expression test 2022-06-25 21:11:32 -07:00