Similar to D69607 but for archive member extraction unrelated to GC. This patch adds --why-extract=.
Prior art:
GNU ld -M prints
```
Archive member included to satisfy reference by file (symbol)
a.a(a.o) main.o (a)
b.a(b.o) (b())
```
-M is mainly for input section/symbol assignment <-> output section mapping
(often huge output) and the information may appear ad-hoc.
Apple ld64
```
__Z1bv forced load of b.a(b.o)
_a forced load of a.a(a.o)
```
It doesn't say the reference file.
Arm's proprietary linker
```
Selecting member vsnprintf.o(c_wfu.l) to define vsnprintf.
...
Loading member vsnprintf.o from c_wfu.l.
definition: vsnprintf
reference : _printf_a
```
---
--why-extract= gives the user the full data (which is much shorter than GNU ld
-Map). It is easy to track a chain of references to one archive member with a
one-liner, e.g.
```
% ld.lld main.o a_b.a b_c.a c.a -o /dev/null --why-extract=- | tee stdout
reference extracted symbol
main.o a_b.a(a_b.o) a
a_b.a(a_b.o) b_c.a(b_c.o) b()
b_c.a(b_c.o) c.a(c.o) c()
% ruby -ane 'BEGIN{p={}}; p[$F[1]]=[$F[0],$F[2]] if $.>1; END{x="c.a(c.o)"; while y=p[x]; puts "#{y[0]} extracts #{x} to resolve #{y[1]}"; x=y[0] end}' stdout
b_c.a(b_c.o) extracts c.a(c.o) to resolve c()
a_b.a(a_b.o) extracts b_c.a(b_c.o) to resolve b()
main.o extracts a_b.a(a_b.o) to resolve a
```
Archive member extraction happens before --gc-sections, so this may not be a live path
under --gc-sections, but I think it is a good approximation in practice.
* Specifying a file avoids output interleaving with --verbose.
* Required `=` prevents accidental overwrite of an input if the user forgets `=`. (Most of compiler drivers' long options accept `=` but not ` `)
Differential Revision: https://reviews.llvm.org/D109572
Original commit description:
[LLD] Remove global state in lld/COFF
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.
See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.
I also haven't moved the `driver` or `config` variables yet.
Differential Revision: https://reviews.llvm.org/D109634
This reverts commit a2fd05ada9.
Original commits were b4fa71eed3
and e03c7e367a.
... instead of constructing a new one each time. This allows us
to take advantage of {D105305}.
I didn't see a substantial difference when linking chromium_framework,
but this paves the way for reusing similar logic for splitting compact
unwind entries into sections. There are a lot more of those, so the
performance impact is significant.
Differential Revision: https://reviews.llvm.org/D109895
Sometimes people intentionally re-define a dylib personlity symbol as a local defined symbol as a workaround to a ld -r bug.
As a result, we could see "too many personalities" to encode. This patch tries to handle this case by ignoring the local symbols entirely.
Differential Revision: https://reviews.llvm.org/D107533
Add a test to ensure that MachO files including
a LC_CODE_SIGNATURE load command produced by lld
are signed correctly.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D109840
Move the functionality in lld that handles writing of the LC_CODE_SIGNATURE load command and associated data section to a central reusable location.
This change is in preparation for another change that modifies llvm-objcopy to reproduce the LC_CODE_SIGNATURE load command and corresponding
data section to maintain the validity of signed macho object files passed through llvm-objcopy.
Reviewed By: #lld-macho, int3, oontvoo
Differential Revision: https://reviews.llvm.org/D109803
This test checks that timers are working and printing as expected.
I also seem to have changed the order of the timers in my globals refactoring
patch, so I fixed it here.
Differential Revision: https://reviews.llvm.org/D109904
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.
See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.
I also haven't moved the `driver` or `config` variables yet.
Differential Revision: https://reviews.llvm.org/D109634
This way, we do not need to set LLVM_CMAKE_PATH to LLVM_CMAKE_DIR when (NOT LLVM_CONFIG_FOUND)
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D107717
Rather than depending on the hex dump from obj2yaml. Now the test shows the
expected function body in a human readable format.
Differential Revision: https://reviews.llvm.org/D109730
This matters for example for the iPhoneSimulator14.0.sdk, which has
a System/Library/Frameworks/UIKit.framework/UIKit that has
LC_BUILD_VERSION with minos of 14.0, so linking against that file
will produce warnings like:
.../iPhoneSimulator14.0.sdk/System/Library/Frameworks/UIKit.framework/UIKit
has version 14.0.0, which is newer than target minimum of 12.0.0
when targeting x86_64-apple-ios12.0-simulator. That doens't happen when
linking against UIKit.tbd instead, obviously.
Linking with RC_TRACE_DYLIB_SEARCHING=1 shows that ld64 also searches
the tbd file first, and we already get that right for non-framework
dylibs.
Fixes crbug.com/1249456.
Differential Revision: https://reviews.llvm.org/D109768
We previously had a limitation that TLS variables could not
be exported (and therefore could also not be imported). This
change removed that limitation.
Differential Revision: https://reviews.llvm.org/D108877
For multithreaded modules (i.e. modules with a shared memory), lld injects a
synthetic Wasm start function that is automatically called during instantiation
to initialize memory from passive data segments. Even though the module will be
instantiated separately on each thread, memory initialization should happen only
once. Furthermore, memory initialization should be finished by the time each
thread finishes instantiation. Since multiple threads may be instantiating their
modules at the same time, the synthetic function must synchronize them.
The current synchronization tries to atomically increment a flag from 0 to 1 in
memory then enters one of two cases. First, if the increment was successful, the
current thread is responsible for initializing memory. It does so, increments
the flag to 2 to signify that memory has been initialized, then notifies all
threads waiting on the flag. Otherwise, the thread atomically waits on the flag
with an expected value of 1 until memory has been initialized. Either the
initializer thread finishes initializing memory (i.e. sets the flag to 2) first
and the waiter threads do not end up blocking, or the waiter threads succesfully
start waiting before memory is initialized so they will be woken by the
initializer thread once it has finished.
One complication with this scheme is that there are various contexts on the Web,
most notably on the main browser thread, that cannot successfully execute a
wait. Executing a wait in these contexts causes a trap, and in this case would
cause instantiation to fail. The embedder must therefore ensure that these
contexts win the race and become responsible for initializing memory, since that
is the only code path that does not execute a wait.
Unfortunately, since only one thread can win the race and initialize memory,
this scheme makes it impossible to have multiple threads in contexts that cannot
wait. For example, it is not currently possible to instantiate the module on
both the main browser thread as well as in an AudioWorklet. To loosen this
restriction, this commit inserts an extra check so that the wait will not be
executed at all when memory has already been initialized, i.e. when the flag
value is 2. After this change, the module can be instantiated on threads in
non-waiting contexts as long as the embedder can guarantee either that the
thread will win the race and initialize memory (as before) or that memory has
already been initialized when instantiation begins. Threads in contexts that can
wait can continue racing to initialize memory.
Fixes (or at least improves) PR51702.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D109722
Remove some unnecessary logging from wasm-ld when running under
`--verbose`. Unlike `-debug` this logging is available in release
builds. This change makes it little more minimal/readable.
Also, avoid compiling the `debugWrite` function in releaase builds
where it does nothing. This should remove a lot debug strings from
the binary, and avoid having to construct unused debug strings at
runtime.
Differential Revision: https://reviews.llvm.org/D109583
In the case that TLS is used in the single-threaded program, and
therefore effectively lowered away, we still optionally create a
`__tls_base` symbols, but the code for setting it was assuming it was
always created.
Differential Revision: https://reviews.llvm.org/D109518
llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).
For an invocation of ld.lld with 62000+ lines of
`ld.lld: warning: symbol ordering file: no such symbol: ` warnings (D87121),
the buffering decreases the write time from 1s to 0.4s (for /dev/tty) and
from 0.4s to 0.1s (for a tmpfs file). This can speed up
`relocation R_X86_64_PC32 out of range` diagnostic printing as well
with `--noinhibit-exec --no-fatal-warnings`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D87272
Failing to do so results in `std::bad_function_call` being
thrown when a pass tries to emit a diagnostic.
I've copied the relevant test over from LLD-ELF's test suite.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D109274
We end up calling resolveBranchVA(), which asserts for Undefineds.
As fix, just return early in Writer::run() if there are any diagnostics
after processing relocations (which is where undefined symbol errors are
emitted). This matches what the ELF port does.
Differential Revision: https://reviews.llvm.org/D109079
Fixes PR51578 in practice.
Currently there's only enough room for a single thunk, which for real-life code
isn't enough. The error case only happens when there are many branch statements
very close to each other (0 or 1 instructions apart), with the function at the
finalization barrier small.
There's a FIXME on what to do if we hit this case, but that suggestion sounds
complicated to me (see end of PR51578 comment 5 for why).
Instead, just leave more room for thunks. Chromium's unit_tests links fine with
room for 3 thunks. Leave room for 100, which should fix this for most cases in
practice.
There's little cost for leaving lots of room: This slop value only determines
when we finalize sections, and we insert thunks for forward jumps into
unfinalized sections. So leaving room means we'll need a few more thunks, but
the thunk jump range is 128 MiB while a single thunk is just 12 bytes.
For Chromium's unit_tests:
With a slop of 3: thunk calls = 355418, thunks = 10903
With a slop of 100: thunk calls = 355426, thunks = 10904
Chances are 100 is enough for all use cases we'll hit in practice, but even
bumping it to 1000 would probably be fine.
Differential Revision: https://reviews.llvm.org/D108930
- Move a few variables closer to their uses, remove some completely
(no behavior change)
- Add some comments
- Make maxPotentialThunks include calls to stubs. It's possible that
an earlier call to a stub late in the stub table will need a thunk,
and that inserted thunk could push a stub earlier in the stub table
out of range. This is unlikely to happen, but usually there are
way fewer stub calls than non-stub calls, so if we're doing a
conservative approximation here we might as well do it correctly.
(For chromium's unit_tests target, 134421/242639 stub calls are
direct calls without this change, compared to 134408/242639 with
this change)
No real, meaningful behavior difference.
Differential Revision: https://reviews.llvm.org/D108924
- Don't subtract thunkSize from branchRange. Most places care about
the actual maximal branch range. Subtract thunkSize in the one place
that wants to leave room for a thunk.
- Set it to 0x800_0000 instead of 0xFF_FFFF
- Subtract 4 for the positive branch direction since it's a
two's complement 24bit number sign-extended mutiplied by 4,
so its range is -0x800_0000..+0x7FF_FFFC
- Make boundary checks include the boundary values
This doesn't make a huge difference in practice. It's preparation
for a "real" fix for PR51578 -- but it also lets the repro in comment 0
in that bug place one more thunk before hitting the TODO.
Differential Revision: https://reviews.llvm.org/D108897
The assert is harmless and thinks worked fine in builds with asserts enabled,
but it's still nice to fix the assert.
Differential Revision: https://reviews.llvm.org/D108853
We currently complain "could not open /LTCG: no such file or directory",
which isn't very useful. We could emit a warning when we see this flag, but
just ignoring it seems fine.
Final missing part of PR38799.
Differential Revision: https://reviews.llvm.org/D108799
This is what ld64 does. Deviating in behavior here can result
in some subtle duplicate symbol errors, as detailed in the objc.s test.
Differential Revision: https://reviews.llvm.org/D108781
The previous logic was duplicated between symbol-initiated
archive loads versus flag-initiated loads (i.e. `-force_load` and
`-ObjC`). This resulted in code duplication as well as redundant work --
we would create Archive instances twice whenever we had one of those
flags; once in `getArchiveMembers` and again when we constructed the
ArchiveFile.
This was motivated by an upcoming diff where we load archive members
containing ObjC-related symbols before loading those containing
ObjC-related sections, as well as before performing symbol resolution.
Without this refactor, it would be difficult to do that while avoiding
loading the same archive member twice.
Differential Revision: https://reviews.llvm.org/D108780
This was missed by {D107035}. This fix addresses the following warning:
loop variable 'personality' has type 'const uint32_t &' (aka 'const unsigned int &') but is initialized with type 'const unsigned long long' resulting in a copy [-Wrange-loop-analysis]
In addition to fixing the size, I also removed the const reference,
since there's no performance benefit to avoiding copies of integer-sized
values.