If linking directly against a DLL without an import library, the
DLL export symbols might not contain stdcall decorations.
If we have an undefined symbol with decoration, and we happen to have
a matching undecorated symbol (which either is lazy and can be loaded,
or already defined), then alias it against that instead.
This matches what's done in reverse, when we have a def file
declaring to export a symbol without decoration, but we only have
a defined decorated symbol. In that case we do a fuzzy match
(SymbolTable::findMangle). This case is more straightforward; if we
have a decorated undefined symbol, just strip the decoration and look
for the corresponding undecorated symbol name.
Add warnings and options for either silencing the warning or disabling
the whole feature, corresponding to how ld.bfd does it.
(This feature works for any symbol decoration mismatch, not only when
linking against a DLL directly; ld.bfd also tolerates it anywhere,
and also fixes up mismatches in the other direction, like
SymbolTable::findMangle, for any symbol, not only exports. But in
practice, at least for lld, it would primarily end up used for linking
against DLLs.)
Differential Revision: https://reviews.llvm.org/D104532
As the COFF linker is capable of linking directly against a DLL now
(after D104530, as long as it is running in mingw mode), don't error
out here but successfully load libraries specified with "-l" from DLLs
if that's what ld.bfd would have matched.
Differential Revision: https://reviews.llvm.org/D104531
GNU ld.bfd supports linking directly against DLLs without using an
import library, and some projects have picked up on this habit.
(There's no one single unsurmountable issue with using import
libraries, but this is a regularly surfacing missing feature.)
As long as one is linking by name (instead of by ordinal), the DLL
export table contains most of the information needed. (One can
inspect what section a symbol points at, to see if it's a function
or data symbol. The practical implementation of this loops over all
sections for each symbol, but as long as they're not very many, that
should hopefully be tolerable performance wise.)
One exception where the information in the DLL isn't entirely enough
is on i386 with stdcall functions; depending on how they're done,
the exported function name can be a plain undecorated name, while
the import library would contain the full decorated symbol name. This
issue is addressed separately in a different patch.
This is implemented mimicing the structure of a regular import library,
with one InputFile corresponding to the static archive that just adds
lazy symbols, which then are fetched when they are needed. When such
a symbol is fetched, we synthesize a coff_import_header structure
in memory and create a regular ImportFile out of it.
The implementation could be even smaller by just creating ImportFiles
for every symbol available immediately, but that would have the
drawback of actually ending up importing all symbols unless running
with GC enabled (and mingw mode defaults to having it disabled for
historical reasons).
Differential Revision: https://reviews.llvm.org/D104530
We have been creating many ConcatInputSections with identical values due
to .subsections_via_symbols. This diff factors out the identical values
into a Shared struct, to reduce memory consumption and make copying
cheaper.
I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an
extra word.
All in all, this takes InputSection from 120 to 72 bytes (and
ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in
ConcatInputSection.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.14 4.24 4.18 4.183 0.027548999
+ 20 4.04 4.11 4.075 4.0775 0.018027756
Difference at 95.0% confidence
-0.1055 +/- 0.0149005
-2.52211% +/- 0.356215%
(Student's t, pooled s = 0.0232803)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105305
`__cfstring` is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
`.subsections_via_symbols`.
With that in place, we then run the section through ICF.
This change is about perf-neutral when linking chromium_framework.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D105045
This is a pretty big refactoring diff, so here are the motivations:
Previously, ICF ran after scanRelocations(), where we emitting
bind/rebase opcodes etc. So we had a bunch of redundant leftovers after
ICF. Having ICF run before Writer seems like a better design, and is
what LLD-ELF does, so this diff refactors it accordingly.
However, ICF had two dependencies on things occurring in Writer: 1) it
needs literals to be deduplicated beforehand and 2) it needs to know
which functions have unwind info, which was being handled by
`UnwindInfoSection::prepareRelocations()`.
In order to do literal deduplication earlier, we need to add literal
input sections to their corresponding output sections. So instead of
putting all input sections into the big `inputSections` vector, and then
filtering them by type later on, I've changed things so that literal
sections get added directly to their output sections during the 'gather'
phase. Likewise for compact unwind sections -- they get added directly
to the UnwindInfoSection now. This latter change is not strictly
necessary, but makes it easier for ICF to determine which functions have
unwind info.
Adding literal sections directly to their output sections means that we
can no longer determine `inputOrder` from iterating over
`inputSections`. Instead, we store that order explicitly on
InputSection. Bloating the size of InputSection for this purpose would
be unfortunate -- but LLD-ELF has already solved this problem: it reuses
`outSecOff` to store this order value.
One downside of this refactor is that we now make an additional pass
over the unwind info relocations to figure out which functions have
unwind info, since want to know that before `processRelocations()`. I've
made sure to run that extra loop only if ICF is enabled, so there should
be no overhead in non-optimizing runs of the linker.
The upside of all this is that the `inputSections` vector now contains
only ConcatInputSections that are destined for ConcatOutputSections, so
we can clean up a bunch of code that just existed to filter out other
elements from that vector.
I will test for the lack of redundant binds/rebases in the upcoming
cfstring deduplication diff. While binds/rebases can also happen in the
regular `.text` section, they're more common in `.data` sections, so it
seems more natural to test it that way.
This change is perf-neutral when linking chromium_framework.
Reviewed By: oontvoo
Differential Revision: https://reviews.llvm.org/D105044
Everything (including test) modified from ELF/COFF. Using the same syntax
(--lto-O3, etc) as ELF.
Differential Revision: https://reviews.llvm.org/D105223
Previously, we only applied the renames to
ConcatOutputSections.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105079
For
```
SECTIONS {
text.0 : {}
text.1 : {}
text.2 : {}
} INSERT AFTER .data;
```
the current order is `.data text.2 text.1 text.0`. It makes more sense to
preserve the specified order and thus improve compatibility with GNU ld.
For
```
SECTIONS { text.0 : {} } INSERT AFTER .data;
SECTIONS { text.3 : {} } INSERT AFTER .data;
```
GNU ld somehow collects sections with `INSERT AFTER .data` together (IMO
inconsistent) but I think it makes more sense to execute the commands in order
and get `.data text.3 text.0` instead.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D105158
See the comment for my understanding of -no-pie and -shared expectation.
-no-pie has freedom on choices. We choose dynamic relocations to be consistent
with the handling of GOT-generating relocations.
Note: GNU ld has arch-varying behaviors and its x86 -pie has a very
complex rule:
if there is at least one GOT-generating or PLT-generating relocation and
-z dynamic-undefined-weak (enabled by default) is in effect, generate a
dynamic relocation.
We don't emulate its rule.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D105164
A couple of filecheck patterns had not been hooked up with
the patterns suffering from some drift. As this test is old
and llvm-objdump has improved a lot, take this opportunity to
hide the instruction encoding. I've also taken out a lot of
the explanatory comments that llvm-objdump improvements make
redundant, as these comments oftern don't get updated when addresses
change.
Differential Revision: https://reviews.llvm.org/D104907
There are a couple of problems with the code to patch
unrelocated BLX instructions:
1. The calculation of the PC needs to take into account
the alignment of the instruction. The Thumb BLX
uses alignDown(PC, 4) for the source address.
2. The calculation of the PC bias is hard-coded to 4
which works for Thumb, but when there is a BLX the
branch will be in Arm state so it needs an 8 byte
PC bias.
No asssembler generates an unrelocated BLX instruction
so these problems do not affect real world programs.
However we should still fix them.
Differential Revision: https://reviews.llvm.org/D104905
SymtabSection::emitStabs() writes the symbol table in the order
of externalSymbols, which has the order of symtab->getSymbols(),
which is just the order symbols are added to the symbol table.
In practice, symbols in the symbol files of input .o files are
sorted, but since that's not guaranteed we sort them in
ObjFile::parseSymbols(). To make sure several symbols with the same
address keep the order they're in the input file, we have to use
stable_sort().
In practice, std::sort() on already-sorted inputs won't change the order
of just adjacent elements, and while in theory std::sort() could use a
random pivot, in practice the code should be deterministic as it was
previously too.
But now lld/test/MachO/stabs.s passes with LLVM_ENABLE_EXPENSIVE_CHECKS=ON
(the last test that was failing with that set).
Fixes a regression from D99972.
While here, remove an empty section in stabs.s and move
.subsections_via_symbols to the end where it usually is (this part no
behavior change).
Differential Revision: https://reviews.llvm.org/D105071
Fixes PR50637.
Downstream bug: https://crbug.com/1218958
Currently, we split __cstring along symbol boundaries with .subsections_via_symbols
when not deduplicating, and along null bytes when deduplicating. This change splits
along null bytes unconditionally, and preserves original alignment in the non-
deduplicated case.
Removing subsections-section-relocs.s because with this change, __cstring
is never reordered based on the order file.
Differential Revision: https://reviews.llvm.org/D104919
The two different thread_local_regular sections (__thread_data and
more_thread_data) had nondeterminstic ordering for two reasons:
1. https://reviews.llvm.org/D102972 changed concatOutputSections
from MapVector to DenseMap, so when we iterate it to make
output segments, we would add the two sections to the __DATA
output segment in nondeterministic order.
2. The same change also moved the two stable_sort()s for segments
and sections to sort(). Since sections with assigned priority
(such as TLV data) have the same priority for all sections,
this is incorrect -- we must use stable_sort() so that the
initial (input-order-based) order remains.
As a side effect, we now (deterministically) put the __common
section in front of __bss (while previously we happened to
put it after it). (__common and __bss are both zerofill so
both have order INT_MAX, but common symbols are added to
inputSections before normal sections are collected.)
Makes lld/test/MachO/tlv.s and lld/test/MachO/tlv-dylib.s pass with
LLVM_ENABLE_EXPENSIVE_CHECKS=ON.
Differential Revision: https://reviews.llvm.org/D105054
Make sure we don't wrongly fold two sections that refer to
symbols with the same value if they are not both absolute /
non-absolute.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104876
Literal sections can be deduplicated before running ICF. That makes it
easy to compare them during ICF: we can tell if two literals are
constant-equal by comparing their offsets in their OutputSection.
LLD-ELF takes a similar approach.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104671
This test has always failed on 32 bit armv8 bots:
https://lab.llvm.org/buildbot/#/builders/178/builds/42
Due to the output order of some symbols changing.
I don't think this is an Arm specific issue so disabling
on 32 bit while it's investigated.
The patch reuses the common code to print memory operand addresses as
instruction comments. This helps to align the comments and enables using
target-specific comment markers when `evaluateMemoryOperandAddress()` is
implemented for them.
Differential Revision: https://reviews.llvm.org/D104861
libunwind uses unwind info to find the function address belonging
to the current instruction pointer. libunwind/src/CompactUnwinder.hpp's
step functions read functionStart for UNWIND_X86_64_MODE_STACK_IND
(and for nothing else), so these encodings need a dedicated entry
per function, so that the runtime can get the stacksize off the
`subq` instrunction in the function's prologue.
This matches ld64.
(CompactUnwinder.hpp from https://opensource.apple.com/source/libunwind/
also reads functionStart in a few more cases if `SUPPORT_OLD_BINARIES` is set,
but it defaults to 0, and ld64 seems to not worry about these additional
cases.)
Related upstream bug: https://crbug.com/1220175
Differential Revision: https://reviews.llvm.org/D104978
Modify the D13209 logic: for a script inside the sysroot, if an absolute path
does not exist, report an error instead of falling back to the path without the
sysroot prefix.
This matches GNU ld, which makes sense to me: we don't want to find an arbitrary
file in the host.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D104894
Commit 728cc0075e made comdat symbols
from LTO objects be treated as any regular comdat symbol. This works
great for symbols that actually are IMAGE_COMDAT_SELECT_ANY, but
if the symbols have a less trivial selection type that require comparing
either the section chunk size or contents, we can't check that before
actually doing the LTO compilation.
Therefore bring back one aspect of handling from before; that comdat
resolution with a leader from an LTO symbol is essentially skipped,
like it was before 728cc0075e.
Differential Revision: https://reviews.llvm.org/D104605
... even on targets preferring RELA. The section is only consumed by ld.lld
which can handle REL.
Follow-up to D104080 as I explained in the review. There are two advantages:
* The D104080 code only handles RELA, so arm/i386/mips32 etc may warn for -fprofile-use=/-fprofile-sample-use= usage.
* Decrease object file size for RELA targets
While here, change the relocation to relocate weights, instead of 0,1,2,3,..
I failed to catch the issue during review.
`icfEqClass` only makes sense on ConcatInputSections since (in contrast
to literal sections) they are deduplicated as an atomic unit.
Similarly, `hasPersonality` and `replacement` don't make sense on
literal sections.
This mirrors LLD-ELF, which stores `icfEqClass` only on non-mergeable
sections.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104670
We previously did this only for x86_64, but it turns out that
arm64 needs this too -- see PR50791.
Ultimately this is a hack, and we should avoid over-aligning strings
that don't need it. I'm just having a hard time figuring out how ld64 is
determining the right alignment.
No new test for this since we were already testing this behavior for
x86_64, and extending it to arm64 seems too trivial.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104835
Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not. In non-visible behavior indices point to wrong symbols. The visible behavior indices point outside of Symbol table: "invalid symbol index".
This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.
With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.
Doing a quick experiment with clang-13.
The size went up from 107KB to 322KB, aggregate of all the input sections. Size of clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly, it will not impact final binary size.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D104080
Add tests for pending TODOs, plus some global cleanups:
* No fold: func has personality/LSDA
* Fold: reference to absolute symbol with different name but identical value
* No fold: reloc references to absolute symbols with different values
* No fold: N_ALT_ENTRY symbols
Differential Revision: https://reviews.llvm.org/D104721
"""Bitcode symbols only exist before LTO runs, and only serve the purpose of
resolving visibility so LTO can better optimize. Running LTO creates ObjFiles
from BitcodeFiles, and those ObjFiles contain regular Defined symbols (with
isec set and all) that will replace the bitcode symbols. So things should
(hopefully) work as-is :)"""
-- https://reviews.llvm.org/rGdbbc8d8333f29cf4ad6f4793da1adf71bbfdac69#inline-6081
This particular linker invocation is only run to check that we accept
options, but we don't inspect the generated command line. As all other
commands in the file have their output piped to FileCheck, the lit test
doesn't print any other output; therefore silence this one for consistency
as well.
This is consistent with how clang prints its internal commands with
-### and -v.
When linking with -verbose, we get log messages from the actual
linking written to stderr. By printing the command to the same stream,
we make sure they appear in a sensible chronological order.
Differential Revision: https://reviews.llvm.org/D104527
getLineNumber() was counting the number of line feeds from the start of
the buffer to the current token. For large linker scripts this became a
performance bottleneck. For one 4MB linker script over 4 minutes was
spent in getLineNumber's StringRef::count.
Store the line number from the last token, and only count the additional
line feeds since the last token.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D104137
This reverts commit e1adf90826.
This appears to affect the way that C++ mangled symbols appear in the
import library when using a .def file that names a C++ free function
with no name decoration. I will follow up with a reduced test case
shortly.
Fixes PR50529. With this, lld-linked Chromium base_unittests passes on arm macs.
Surprisingly, no measurable impact on link time.
Differential Revision: https://reviews.llvm.org/D104681
The variable used to need the wider scope, but doesn't after the
reland. See LC_LINKER_OPTIONS-related discussion on
https://reviews.llvm.org/D104353 for background.
Real zerofill sections go after __thread_bss, since zerofill sections
must all be at the end of their segment and __thread_bss must be right
after __thread_data.
Works fine already, but wasn't tested as far as I can tell.
Also tweak comment about zerofill sections a bit.
No behavior change.
Differential Revision: https://reviews.llvm.org/D104609
We make it less than INT_MAX in order not to conflict with the ordering
of zerofill sections, which must always be placed at the end of their
segment.
This is the more structural fix for the issue addressed in {D104596}.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104607