llvm-project/lld/test/MachO/arm64-thunk-starvation.s

# REQUIRES: aarch64
# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %s -o %t.o
# RUN: %lld -arch arm64 -lSystem -o %t.out %t.o

## Regression test for PR51578.

.subsections_via_symbols

.globl _f1, _f2, _f3, _f4, _f5, _f6
.p2align 2
_f1: b _fn1
_f2: b _fn2
_f3: b _fn3
_f4: b _fn4
_f5: b _fn5
_f6: b _fn6
## 6 * 4 = 24 bytes for branches
## Currently leaves 12 bytes for one thunk, so 36 bytes.
## Uses < instead of <=, so 40 bytes.

.global _spacer1, _spacer2
## 0x8000000 is 128 MiB, 4 bytes (one instruction) more than the forward
## branch limit, distributed over two functions since our thunk insertion
## algorithm can't deal with a single function that's 128 MiB.
## We leave just enough room so that the old thunking algorithm finalized
## both spacers when processing _f1 (24 bytes for the 4 bytes of code in each
## of the 6 _f functions, 12 bytes for one thunk, 4 bytes because the forward
## branch range is 128 MiB - 4 bytes, and another 4 bytes because the algorithm
## uses `<` instead of `<=`, for a total of 44 bytes of slop.) Of the slop, 20
## bytes are actually room for thunks.
## _fn1-_fn6 aren't finalized because then there wouldn't be room for a thunk.
## But when a thunk is inserted to jump from _f1 to _fn1, that needs 12 bytes
## but _f2 is only 4 bytes later, so after _f1 there are only
## 20-(12-4) = 12 bytes left, after _f2 only 12-(12-4) = 4 bytes, and after
## _f3 there's no more room for thunks and we can't make progress.
## The fix is to leave room for many more thunks.
## The same construction as this test case can defeat that too with enough
## consecutive jumps, but in practice there aren't hundreds of consecutive
## jump instructions.

_spacer1:
.space 0x4000000
_spacer2:
.space 0x4000000 - 44

.globl _fn1, _fn2, _fn3, _fn4, _fn5, _fn6
.p2align 2
_fn1: ret
_fn2: ret
_fn3: ret
_fn4: ret
_fn5: ret
_fn6: ret

.globl _main
_main:
  ret

[lld/mac] Leave more room for thunks in thunk placement code

Fixes PR51578 in practice.

Currently there's only enough room for a single thunk, which for real-life code isn't enough. The error case only happens when there are many branch statements very close to each other (0 or 1 instructions apart), with the function at the finalization barrier small.

There's a FIXME on what to do if we hit this case, but that suggestion sounds complicated to me (see end of PR51578 comment 5 for why).

Instead, just leave more room for thunks. Chromium's unit_tests links fine with room for 3 thunks. Leave room for 100, which should fix this for most cases in practice.

There's little cost for leaving lots of room: This slop value only determines when we finalize sections, and we insert thunks for forward jumps into unfinalized sections. So leaving room means we'll need a few more thunks, but the thunk jump range is 128 MiB while a single thunk is just 12 bytes.

For Chromium's unit_tests:
With a slop of 3: thunk calls = 355418, thunks = 10903
With a slop of 100: thunk calls = 355426, thunks = 10904

Chances are 100 is enough for all use cases we'll hit in practice, but even bumping it to 1000 would probably be fine.

Differential Revision: https://reviews.llvm.org/D108930

2021-08-31 02:32:29 +08:00
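
The byte counts quoted in the test's comments can be checked with a small model. The sketch below is not lld's placement code. It is a simplified Python reading of those comments, and the names `slop_breakdown` and `first_starved_branch` are hypothetical helpers introduced only for illustration. It assumes, as the comments state, 4 bytes per branch and 12 bytes per thunk, and that each inserted thunk costs a net 12 - 4 = 8 bytes of room because the next branch sits 4 bytes later.

```python
BRANCH_SIZE = 4    # one AArch64 instruction, per the test comments
THUNK_SIZE = 12    # size of one thunk, per the test comments
FORWARD_BRANCH_LIMIT = 0x8000000 - 4  # "128 MiB - 4 bytes"


def slop_breakdown(num_branches):
    """Return (total_slop, thunk_room) as tallied in the test comments."""
    branches = num_branches * BRANCH_SIZE
    range_adjust = 4   # forward range is 128 MiB - 4, not 128 MiB
    strict_less = 4    # the old check used `<` rather than `<=`
    total = branches + THUNK_SIZE + range_adjust + strict_less
    return total, total - branches


def first_starved_branch(thunk_room, num_branches):
    """Return the 1-based index of the first branch that cannot get a
    thunk in this simplified model, or None if every branch fits."""
    room = thunk_room
    for index in range(1, num_branches + 1):
        if room < THUNK_SIZE:
            return index
        # The thunk takes 12 bytes, but the next branch is 4 bytes later.
        room -= THUNK_SIZE - BRANCH_SIZE
    return None


total, room = slop_breakdown(6)
print(total, room)                    # 44 20
print(first_starved_branch(room, 6))  # 3
```

Under these assumptions the six branches of the test run out of room at `_f3`, which matches the 20, 12, 4 sequence described in the comments. Passing a much larger `thunk_room` makes the function return `None` for six branches, which is the intuition behind leaving room for many more thunks.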

[lld/mac] Give range extension thunks for local symbols local visibility

When two local symbols (think: file-scope static functions, or functions in unnamed namespaces) with the same name in two different translation units both needed thunks, ld64.lld previously created external thunks for both of them. These thunks ended up with the same name, leading to a duplicate symbol error for the thunk symbols.

Instead, give thunks for local symbols local visibility.

(Hitting this requires a jump to a local symbol from over 128 MiB away. It's unlikely that a single .o file is 128 MiB large, but with ICF you can end up with a situation where the local symbol is ICF'd with a symbol in a separate translation unit. And that can introduce a large enough jump to require a thunk.)

Fixes PR54599.

Differential Revision: https://reviews.llvm.org/D122624

2022-03-29 08:17:08 +08:00