We were fetching archive symbols too eagerly, bloating binary size as well as
just screwing up binaries that expected to look up certain symbols only at
runtime.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D115092
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
Mach-O LLD uses the buffer identifier of the memory buffer backing an object
file to generate stabs which are used by `dsymutil` to find the object file for
dSYM generation.
When using thinLTO, these buffers are provided by the cache which initially
saves them to disk as temporary files beginning with "Thin-" but renames them
to persistent files beginning with "llvmcache-" before the buffer is provided
to the cache user.
However, the buffer is created before the file is renamed and is given the temp
file's name as an identifier. This causes the generated stabs to point to
nonexistent files.
This change names the buffer with the eventual persistent filename. I think
this is safe because failing to rename the temp file is a fatal error.
Differential Revision: https://reviews.llvm.org/D115055
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
PLT usage needs the first 12 bytes of the .got section. We need to keep .got and
DT_GOT_PPC even if .got/_GLOBAL_OFFSET_TABLE_ are not referenced (large PIC code
may only reference .got2), which is the case in OpenBSD's ld.so, leading
to a misleading error, "unsupported insecure BSS PLT object".
Fix this by adding R_PPC32_PLTREL to the list of hasGotOffRel.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114982
During the llvm round table it was generally agreed that the newer macho
lld implementation is feature complete enough to replace the old
implementation entirely. This will reduce confusion for new users who
aren't aware of the history.
Differential Revision: https://reviews.llvm.org/D114842
This is very similar to https://reviews.llvm.org/D103557 but applies to
symbols which are undefined at link time rather than compile time.
We already have code that handles symbols which were defined at link
time but dead stripped by `--gc-sections` (See
`test/wasm/debug-removed-fn.ll`). In that case the symbols are not live
(!isLive()). However, we can also have live symbols (which are
references by the program) but which are undefined at link time and are
imported by the linker.
In the test case here the symbol `undef` is used but is not defined
in the program but is imported by the linker due to the
`--import-undefined` flag.
Fixes: https://github.com/emscripten-core/emscripten/issues/15528
Differential Revision: https://reviews.llvm.org/D114921
When a comdat symbol is defined in both bitcode and regular object
files, which are contained in the same archive, the linker could lose
the flag that the symbol is used in the regular object file and allow
LTO to internalize it, which led to "error: undefined symbol".
The issue was introduced in D79300.
Differential Revision: https://reviews.llvm.org/D114801
This reverts the PPC64PCRelLongBranchThunk part from D86706.
PPC64PCRelLongBranchThunk is the same as PPC64R12SetupStub.
Use `__gep_setup_` instead of `__long_branch_pcrel_` for the stub symbol name
as it more closely indicates the operation.
(Note: GNU ld uses `*.long_branch.*` and `*.plt_branch.*`).
Reviewed By: NeHuang, nemanjai
Differential Revision: https://reviews.llvm.org/D114656
There is a trend of having more optional options (usually security
hardening related) like -z cet-report=, -z bti-report=, -z force-bti.
If ld.lld 14.0.0 uses a warning, in 15/16/17/... timeframe when people
add new options to software, they can worry less about linker errors on ld.lld 14.0.0.
In some cases `-z foo` does essential work where a silent ignore can be
problematic, but the user has received a warning. From my observation, the
doing-essential-work `-z foo` is much fewer than the converse. In addition,
the user who cares can use `--fatal-warnings` (Note: GNU ld doesn't upgrade warnings to errors).
It is unclear whether we need something like `clang -Wunknown-warning-option`.
If we ever run into unfortunate transition like `-z start-stop-gc`, the
affected software (e.g. ldc is a compiler which passes linker options to the underlying ld)
can blindly add the `-z` option, without worrying it may cause a linker error to LLD 14.0.0.
Reviewed By: jrtc27, peter.smith
Differential Revision: https://reviews.llvm.org/D114748
Make one change: when the OutputSection is nullptr (due to /DISCARD/ or garbage
collected BssSection (replaceCommonSymbols)), discard the SyntheticSection as well.
I attempted to remove it 1 or 2 year ago but kept it just to have a good
diagnostic in case the output section is nullptr (should be impossible).
It is long enough that we haven't seen such a case.
Fix r285764: there is no guarantee that Out::first is placed before other
static data members of `struct Out`. After `bufferStart` was introduced, this
out-of-bounds write is destined in many compilers. It is likely benign, though.
And move `Out::elfHeader->size` assignment beside `Out::elfHeader->sectionIndex`
For -z separate-code and -z separate-loadable-segments:
When RW is present, the RX to RW transition is aligned with max-page-size.
When RW is absent, the RX to non-SHF_ALLOC transition should use max-page-size as well.
Currently, LLD does not support the complete set of ARM group relocations.
Given that I intend to start using these in the Linux kernel [0], let's add
support for these.
This implements the group processing as documented in the ELF psABI. Notably,
this means support is dropped for very far symbol references that also carry a
small component, where the immediate is rotated in such a way that only part of
it wraps to the other end of the 32-bit word. To me, it seems unlikely that
this is something anyone could be relying on, but of course I could be wrong.
[0] https://lore.kernel.org/r/20211122092816.2865873-8-ardb@kernel.org/
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D114172
This allows --power10-stubs= and --[no-]power10-stubs to override each other
(they are position dependent in GNU ld).
Also improve --help messages and the manpage.
Note: GNU ld's default "auto" mode uses heuristics to decide whether Power10
instructions are used. Arguably it is a design mistake of R_PPC64_REL24_NOTOC
(acked by the relevant folks on a libc-alpha discussion). We don't implement
"auto", so the default --power10-stubs is the same as "yes".
The canonical term is "extract" (GNU ld documentation, Solaris's `-z *extract`
options). Avoid inventing a term and match --why-extract. (ld64 prefers "load"
but the word is overloaded too much)
Mostly MFC, except for --help messages and the header row in
--print-archive-stats output.
BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).
Also, improve naming of some BaseCommand variables (base -> cmd).
This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.
The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.
The attribute 'r' allows (or disallows for the negative case) read-only
sections, i.e. ones without the SHF_WRITE flag, to be assigned to the
memory region. Before the patch, lld could put a section in the wrong
region or fail with "error: no memory region specified for section".
Differential Revision: https://reviews.llvm.org/D113771
The current TLSDESC optimization code assumes:
```
leaq x@tlsdesc(%rip), %rax
call *x@tlscall(%rax) # adjacent
```
From https://gitlab.freedesktop.org/mesa/mesa/-/issues/5665 , it seems that the
two instructions may not be adjacent in GCC 10's output:
```
leaq x@tlsdesc(%rip), %rax
something else
call *x@tlscall(%rax)
```
This patch supports the case. While here, support non-RAX registers for
R_X86_64_GOTPC32_TLSDESC, in case the compiler generates inefficient:
```
leaq x@tlsdesc(%rip), %rcx # or %rdx, %rbx, %rdi, ...
movq %rcx, %rax
call *x@tlscall(%rax) # GNU ld/gold error for non-RAX
```
Differential Revision: https://reviews.llvm.org/D114416
The section symbols aren't of much practical use when looking at
a linked image. This shrinks one observed mingw style unstripped
binary by 14%.
IMAGE_SYM_CLASS_LABEL is in spirit the same as a temporary assembler
label that isn't emitted on the object file level at all.
Differential Revision: https://reviews.llvm.org/D113866
std::vector can have different sizes depending on the STL's debug level,
so account for its size separately. (You could argue that we should be
accounting for all the other members separately as well, but that would
be very unergonomic, and std::vector is the only one that's caused
problems so far.)
Follup-up to D107533, where we replaced local syms with non-local.
It doesn't make sense to replace local symbol with lazy.
Differential Revision: https://reviews.llvm.org/D110040
Fix a null pointer dereference when .got.plt is discarded.
This also adds a test for discarding `.plt`.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D114180
Follow-up to https://reviews.llvm.org/D112643. Even after that change, we were
still asserting if two separate functions that are eligible for ICF (same size,
same data, same number of relocs, same reloc types, ...) referred to
Undefineds. This fixes that oversight.
Differential Revision: https://reviews.llvm.org/D114195
When aligning the start address of an output section introduces a gap between the current dot pointer
and the new aligned address, we were already properly expanding the memory region, if available.
D74286 introduced a new behavior to also align the LMA address if an LMA region is specified.
However, this did not expand the corresponding LMA region.
Now, we also expand the LMA region if it is set.
This fixes PR52510.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114166
ld64 doesn't warn on builds using `-install_name` if it's a bundle. But, the
current warning is nice to have because `install_name` only works with dylib.
To prevent an overflow of warnings in build logs and have parity with ld64,
create a `--warn-dylib-install-name` and `--warn-no-dylib-install-name` flag
that enables this LLD specific warning.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D113534
In order to keep signal:noise high for the `__eh_frame` diff, I have teased-out the NFC changes and put them here.
Differential Revision: https://reviews.llvm.org/D114017
As discussed in https://reviews.llvm.org/D113809#3128636. It's a bit
unfortunate to move the asserts away from the structs whose sizes
they're checking, but it's a far better developer experience when one of
the asserts is violated, because you get a single error instead of every
single source file including the header erroring out.
The `r_address` field of `relocation_info` is only 4 bytes, so our
offset field (which is the `r_address` field adjusted for subsection
splitting) also only needs to be 4 bytes. This reduces the structure
size from 32 bytes to 24 bytes.
Combined with https://reviews.llvm.org/D113813, this is a minor perf
improvement for linking an internal app, tested on two machines:
```
smol-relocs baseline difference (95% CI)
sys_time 7.367 ± 0.138 7.543 ± 0.157 [ +0.9% .. +3.8%]
user_time 21.843 ± 0.351 21.861 ± 0.450 [ -1.3% .. +1.4%]
wall_time 20.301 ± 0.307 20.556 ± 0.324 [ +0.1% .. +2.4%]
samples 16 16
smol-relocs baseline difference (95% CI)
sys_time 2.923 ± 0.050 2.992 ± 0.018 [ +1.4% .. +3.4%]
user_time 10.345 ± 0.039 10.448 ± 0.023 [ +0.8% .. +1.2%]
wall_time 12.068 ± 0.071 12.229 ± 0.021 [ +1.0% .. +1.7%]
samples 15 12
```
More importantly though, this change by itself reduces our maximum
resident set size by 220 MB (2.75%, from 7.85 GB to 7.64 GB) on the
first machine. On the second machine, it reduces it by 125 MB (1.94%,
from 6.31 GB to 6.19 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113818
We can lay out Symbol more optimally to reduce its size from 56 bytes to
48 bytes by eliminating unnecessary padding, and we can lay out Defined
such that its bitfield members are placed in the tail padding of Symbol
(on ABIs which support this), to reduce it from 96 bytes to 80 bytes (8
bytes from the Symbol reduction, and 8 bytes from the tail padding
reuse).
This is perf-neutral for an internal app (results from two different
machines):
```
smol-syms baseline difference (95% CI)
sys_time 7.430 ± 0.202 7.440 ± 0.193 [ -2.6% .. +2.9%]
user_time 21.443 ± 0.513 21.206 ± 0.396 [ -3.3% .. +1.1%]
wall_time 20.453 ± 0.534 20.222 ± 0.488 [ -3.7% .. +1.5%]
samples 9 8
smol-syms baseline difference (95% CI)
sys_time 3.011 ± 0.050 3.040 ± 0.052 [ -0.4% .. +2.3%]
user_time 10.416 ± 0.075 10.496 ± 0.091 [ +0.1% .. +1.4%]
wall_time 12.229 ± 0.144 12.354 ± 0.192 [ -0.1% .. +2.1%]
samples 14 13
```
However, on the first machine, it reduces maximum resident set size by
65.9 MB (0.8%, from 7.92 GB to 7.85 GB). On the second machine, it
reduces it by 92 MB (1.4%, from 6.40 GB to 6.31 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113813
It was checking for 64-bit builds incorrectly. Unfortunately,
ConcatInputSection has grown a bit in the meantime, and I don't see any
obvious way to shrink it. Perhaps icfEqClass could use 32-bit hashes
instead of 64-bit ones, but xxHash64 is supposed to be much faster than
xxHash32 (https://github.com/Cyan4973/xxHash#benchmarks), so that sounds
like a loss. (Unrelatedly, we should really look at using XXH3 instead
of xxHash64 now.)
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113809
This is an NFC diff that prepares for pruning & relocating `__eh_frame`.
Along the way, I made the following changes to ...
* clarify usage of `section` vs. `subsection`
* remove `map` & `vec` from type names
* disambiguate class `Section` from template parameter `SectionHeader`.
Differential Revision: https://reviews.llvm.org/D113241
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with merged in these comments.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D113903
Non-allocatable sections are not part of the memory image of the
program, so there is no need to find memory regions for them either
matching properties or handling explicit assignments. The early test
and return help to simplify LinkerScript::findMemoryRegion() a bit.
Differential Revision: https://reviews.llvm.org/D113768
```
/Users/ksmiley/dev/llvm-project/lld/MachO/Symbols.cpp:43:27: warning: field 'external' will be initialized after field 'weakDefCanBeHidden' [-Wreorder-ctor]
weakDef(isWeakDef), external(isExternal),
^
1 warning generated.
```
Differential Revision: https://reviews.llvm.org/D113823
autohide symbols behaves similarly to private_extern symbols.
However, LD64 allows exporting autohide symbols. LLD currently does not.
This patch allows LLD to export them.
Differential Revision: https://reviews.llvm.org/D113167
(Split from D113167)
Benchmarking on one of our large apps which exports a few thousands symbols,
this showed an improvement of ~17%.
x ./LLD_no_parallel.txt
+ ./LLD_with_parallel.txt
N Min Max Median Avg Stddev
x 10 84.01 89.41 88.64 87.693 1.7424061
+ 10 71.9 74.29 72.63 72.753 0.77734663
Difference at 95.0% confidence
-14.94 +/- 1.26763
-17.0367% +/- 1.44553%
(Student's t, pooled s = 1.34912)
(wallclock)
Differential Revision: https://reviews.llvm.org/D113820
Similar to D113702, but for the LSDAs. Clang seems to emit all LSDA
relocs as section relocs, but ld -r can turn those relocs into symbol
ones.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113721
Dedup'ing unwind info is tricky because each CUE contains a different
function address, if ICF operated naively and compared the entire
contents of each CUE, entries with identical unwind info but belonging
to different functions would never be considered identical. To work
around this problem, we slice away the function address before
performing ICF. We rely on `relocateCompactUnwind()` to correctly handle
these truncated input sections.
Here are the numbers before and after D109944, D109945, and this diff
were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W:
Without any optimizations:
base diff difference (95% CI)
sys_time 0.849 ± 0.015 0.896 ± 0.012 [ +4.8% .. +6.2%]
user_time 3.357 ± 0.030 3.512 ± 0.023 [ +4.3% .. +5.0%]
wall_time 3.944 ± 0.039 4.032 ± 0.031 [ +1.8% .. +2.6%]
samples 40 38
With `-dead_strip`:
base diff difference (95% CI)
sys_time 0.847 ± 0.010 0.896 ± 0.012 [ +5.2% .. +6.5%]
user_time 3.377 ± 0.014 3.532 ± 0.015 [ +4.4% .. +4.8%]
wall_time 3.962 ± 0.024 4.060 ± 0.030 [ +2.1% .. +2.8%]
samples 47 30
With `-dead_strip` and `--icf=all`:
base diff difference (95% CI)
sys_time 0.935 ± 0.013 0.957 ± 0.018 [ +1.5% .. +3.2%]
user_time 3.472 ± 0.022 6.531 ± 0.046 [ +87.6% .. +88.7%]
wall_time 4.080 ± 0.040 5.329 ± 0.060 [ +30.0% .. +31.2%]
samples 37 30
Unsurprisingly, ICF is now a lot slower, likely due to the much larger
number of input sections it needs to process. But the rest of the
linker only suffers a mild slowdown.
Note that the compact-unwind-bad-reloc.s test was expanded because we
now handle the relocation for CUE's function address in a separate code
path from the rest of the CUE relocations. The extended test covers both
code paths.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D109946
Clang seems to emit all functionAddress relocs as section relocs, but
`ld -r` can turn those relocs into symbol ones. It turns out that we
weren't handling that case correctly when the symbol was a weak def
whose definition did not prevail.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113702
Previously if you passed `-oso_prefix path/to/foo/` with a trailing
slash at the end, using `real_path` would remove that slash, but that
slash is necessary to make sure OSO prefix paths end up as valid
relative paths instead of starting with `/`.
Differential Revision: https://reviews.llvm.org/D113541
This brings back the original version of D81359.
I have found several use cases now.
* Unlike GNU ld, LLD's relocation processing is one pass. If we decide to
optimize(relax) R_X86_64_{,REX_}GOTPCRELX, we will suppress GOT generation and
cannot undo the decision later. Optimizing R_X86_64_REX_GOTPCRELX can usually
make it easy to hit `relocation R_X86_64_REX_GOTPCRELX out of range` because
the distance to GOT is usually shorter. Without --no-relax, the user has to
recompile with `-Wa,-mrelax-relocations=no`.
* The option would help during my investigationg of the root cause of https://git.kernel.org/linus/09e43968db40c33a73e9ddbfd937f46d5c334924
* There is need for relaxation for AArch64 & RISC-V. Implementing this for
x86-64 improves consistency with little target-specific cost (two-line
X86_64.cpp change).
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D113615
Clang seems to emit all functionAddress relocs as section relocs, but
`ld -r` can turn those relocs into symbol ones. It turns out that we
weren't handling that case correctly when the symbol was a weak def
whose definition did not prevail.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113702
This change implements support for R_ARM_THM_JUMP8 relocation in
addition to R_ARM_THM_JUMP11 which is already supported by LLD.
Differential Revision: https://reviews.llvm.org/D21225
An orphan section should be placed in the same memory region as its
anchor section if the latter specifies the memory region explicitly.
If there is no explicit assignment for the anchor section in the linker
script, its memory region is selected by matching attributes, and the
same should be done for the orphan section.
Before the patch, some scripts that were handled smoothly in GNU ld
caused an "error: no memory region specified for section" in lld.
Differential Revision: https://reviews.llvm.org/D112925
Previously, our unwind info finalization logic assumed that the LSDA
section referenced by `__compact_unwind` was already finalized before
`__TEXT,__unwind_info` itself. However, that assumption could be broken
by the use of `-rename_section` -- it could be (and is) used to move
`__gcc_except_tab` it into a different segment later in the file.
(__TEXT is always the first non-zerofill segment, so any rename
basically guarantees that the section will be ordered after
`__unwind_info`.)
To handle this case, we compare LSDA relocations instead of their final
values in `UnwindInfoSection::finalize()`, and we actually relocate
those LSDAs in `UnwindInfoSection::writeTo()`. In order to do this, we
need an easy way to track which Symbol a given CUE corresponds to. My
solution was to change our `cuPtrVector` into a vector of indices, with
each index used for both the symbols vector (`symbolsVec`) as well as
the CUE vector (`cuVector`).
This change seems perf neutral. Numbers for linking chromium_framework
on my 16 core Mac Pro:
base diff difference (95% CI)
sys_time 1.248 ± 0.025 1.245 ± 0.026 [ -1.3% .. +0.8%]
user_time 3.588 ± 0.045 3.587 ± 0.037 [ -0.6% .. +0.5%]
wall_time 4.605 ± 0.069 4.595 ± 0.069 [ -1.0% .. +0.5%]
samples 42 26
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113582
PR52408 reported an sh_info=0 instance. I have seen sh_info=0
independently before.
sh_info>=num_sections is probably very rare. Just use one diagnostic for
the two types of errors.
Delete invalid-relocations.test which is covered by invalid/bad-reloc-target.test
Differential Revision: https://reviews.llvm.org/D113466
PR/52372
Differential Revision: https://reviews.llvm.org/D112977
New changes:
- use llvm-otool instead of `otool` which doesn't in exist on non-OSX platforms
- add llvm-otool to the set of tools used by test so that the bot will use the <build_dir>/bin/llvm-otool instead of the unqualified `llvm-otool` (which may not exist)
- update tests since the latest (TOT) llvm-otool prints a space between two bytes and the old one doesn't.
The outdated documentation diverges a lot from the current state of
COFF/Mach-O/ELF/wasm ports and may just confuse users. It is better rewriting
some if useful.
Tested with `ninja docs-lld-html`
Reviewed By: #lld-macho, lhames, Jez Ng
Differential Revision: https://reviews.llvm.org/D113432
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.
Reviewed By: #libc, ldionne, mehdi_amini
Differential Revision: https://reviews.llvm.org/D113186
This removes the tablegen based parsing of LC_LINKER_OPTION since it can
only actually contain a very small number of potential arguments. In our
project with tablegen this took 5 seconds before.
This replaces https://reviews.llvm.org/D113075
Differential Revision: https://reviews.llvm.org/D113235
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D113080
This undocumented ld64 flag, based on the most recent ld64 source dump
from Xcode 12, only applies to i386. It seems like on all newer
architectures this behavior is the default.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113070
In one of our links lld was reading 760k files, but the unique number of
files was only 1500. This takes that link from 30 seconds to 8.
This seems like a heavy hammer, especially since some things don't need
to be cached, like the filelist arguments and the passed static
archives (the latter is already cached as a one off), but it seems ld64
does something similar here to short circuit these duplicate reads:
82e429e186/src/ld/InputFiles.cpp (L644-L665)
Of the types of files being read for our iOS app, the biggest problem
was constantly re-reading small tbd files:
```
% wc -l /tmp/read.txt
761414 /tmp/read.txt
% cat /tmp/read.txt | sort -u | wc -l
1503
% cat /tmp/read.txt | grep "\.a$" | wc -l
43721
% cat /tmp/read.txt | grep "\.tbd$" | wc -l
717656
```
We could likely hoist this logic up to not cache at this level, but it
would be a more invasive change to make sure all callers that needed it
cached the results.
I could see this being an issue with OOMs, and I'm not a linker expert so
maybe there's another way we should solve this problem? Feedback welcome!
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D113153
By default with ld64, architecture mismatches are just warnings, then
this flag can be passed to make these fail. This matches that behavior.
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D113082
D101513 means that we no longer need to specify `-pie` in most of our
test RUN commands. Let's clean up the unused flags so as not to confuse
future test writers.
Reviewed By: #lld-macho, oontvoo, MaskRay
Differential Revision: https://reviews.llvm.org/D113114
I'm not sure what the history is here but this test passes on macOS
today. It seems like we should unify these tests if they need to run
cross platform.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113085
On our large iOS project this took a link from 1 minute 45 seconds to 45
seconds. For reference ld64 does the same link in ~20 seconds.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113063
This reverts commit 5cbec88cbf.
Vitaly said that 2faac77f26 actually works.
Sanitizer's armv7-linux-androideabi24 configuration has other issues which haven't been identified yet, but that's unrelated to the empty symbol name issue.
Symbol's subclasses all have an additional bitfield of type uint8_t (RefState enum).
For the bitfields in the same block tomerge, they should be of the same type. (clang/gcc will work, but others like MSVC does not)
Differential Revision: https://reviews.llvm.org/D113040
This matches ld64, and it's conceivable that projects try to read
this information off stderr for that reason.
--version keeps writing to stdout.
Differential Revision: https://reviews.llvm.org/D113020
One fewer warning.
In practice, lld already "implements" it. (ie., it does not do dtrace-dof processing ever).
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D112934
LLD_IN_TEST determines how many times each port's `main` function is
run in each LLD process, and setting LLD_IN_TEST=2 (or higher) is useful
for checking if we're cleaning up and resetting global state correctly.
Add a test suite parameter to enable this easily. There's work in
progress to remove global state (e.g. D108850), but this seems useful in
the interim.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D112898
`fatal` should only be used for malformed inputs according to
ErrorHandler.h; `error` is more appropriate for missing arguments,
accompanied by a check to bail out early in case of the error. Some
tests need to be adjusted accordingly.
Makes `lld/test/MachO/arch.s` pass with `LLD_IN_TEST=2`.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D112879
We need to reset global state between runs, similar to the other ports.
There's some file-static state which needs to be reset as well and we
need to add some new helpers for that.
With this change, most LLD Mach-O tests pass with `LLD_IN_TEST=2` (which
runs the linker twice on each test). Some tests will be fixed by the
remainder of this stack, and the rest are fundamentally incompatible
with that mode (e.g. they intentionally throw fatal errors).
Fixes PR52070.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D112878
It's not used for anything yet, but we now accept `/pdbpagesize:4096`
(the default behavior) and we give arguably more useful diagnostics
for other values.
It's plumbed through to the MSF layer, so just uncommenting out
the bit in DriverUtils.cpp that rejects args other than 4096 is enough
to try other values.
Differential Revision: https://reviews.llvm.org/D112871
The "symbol 'foo' has no type" diagnostic tries to inform that copy
relocation/canonical PLT entry cannot be used, but the diagnostic is often
incorrect and confusing.
The hint does not pull its weight:
* adding -Wl,-z,notext often won't work (relocation types other than `symbolRel`, e.g. `R_AARCH64_LDST32_ABS_LO12_NC`)
* for pure (no assembly) C/C++ projects, the "-fPIC" hint is sufficient
When comparing relocations against two symbols, ICF's equalsConstant() did not
look at the value of the two symbols. With subsections_via_symbols, the value
is usually 0 but not always: In particular, it isn't 0 for constants in string
and literal sections. Since we ignored the value, comparing two constant string
symbols or two literal symbols always compared the 0th's element, so functions
in the same TU always compared as equal.
This can cause mislinks, and, with -dead_strip, crashes.
Fixes PR52349, see that bug for lots of details and examples of mislinks.
While here, make the existing assembly in icf-literals.s a bit more realistic
(use leaq instead of movq with strings, and use foo(%rip) instead of
foo@gotpcrel(%rip)). This has no interesting effect, it just maybe makes the
test look a bit less surprising.
Differential Revision: https://reviews.llvm.org/D112862
Previously relocations were only generated for PIC output, but
relocations for TLS GOT entries are always needed when shared
memory is enabled, not just in PIC mode.
This means that the `__wasm_apply_global_tls_relocs` is now
generated even for statically linked (non-PIC) output. Without
this the globals that hold the addresses of TLS symbols are
not set correctly.
Differential Revision: https://reviews.llvm.org/D112833
In the shared memory case we can always assume that TLS addresses
are relative to __tls_base. In the non-shared memory case TLS
variables are absolute, just like normal data addresses.
This simplifies the code in calcNewValue so that TLS relocations
no longer need special handling.
Differential Revision: https://reviews.llvm.org/D112831
In particular, they should not cause archives to be eagerly loaded. This
matches ld64's behavior.
Fixes PR52246.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D112756
Having to remember to call `canonical()` all over the place is
error-prone; let's do it in a centralized location instead. It also
appears to improve performance slightly.
base diff difference (95% CI)
sys_time 0.984 ± 0.009 0.983 ± 0.014 [ -0.8% .. +0.6%]
user_time 6.508 ± 0.035 6.475 ± 0.036 [ -0.8% .. -0.2%]
wall_time 5.321 ± 0.034 5.300 ± 0.033 [ -0.7% .. -0.1%]
samples 36 23
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D112687
Previously we were relying on the dynamic loader to take care of this
but it simple and correct for us to do it here instead.
Now we initialize bss segments as part of `__wasm_init_memory` at the
same time we initialize passive segments.
In addition we extent the us of `__wasm_init_memory` outside of shared
memory situations. Specifically it is now used to initialize bss
segments when the memory is imported.
Differential Revision: https://reviews.llvm.org/D112667
Many diagnostics use `getErrorPlace` or `getErrorLocation` to report a location.
In the presence of line table debug information, `getErrorPlace` uses a source
file location and ignores the object file location. However, the object file
location is sometimes more useful.
This patch changes "undefined symbol" and "out of range" diagnostics to report
both object/source file locations. Other diagnostics can use similar format if
needed.
The key idea is to let `InputSectionBase::getLocation` report the object file
location and use `getSrcMsg` for source file/line information. `getSrcMsg`
doesn't leverage `STT_FILE` information yet, but I think the temporary lack of
the functionality is ok.
For the ARM "branch and link relocation" diagnostic, I arbitrarily place the
source file location at the end of the line. The diagnostic is not very common
so its formatting doesn't need to be pretty.
Differential Revision: https://reviews.llvm.org/D112518
There are a couple internal builds that require the use of this flag.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D112594
ICF runs before relocation processing, but undefined symbol errors
are only emitted during relocation processing.
So just ignore Undefineds during ICF (instead of crashing) -- lld
will emit an error once ICF is done.
Fixes PR52330.
Differential Revision: https://reviews.llvm.org/D112643
Otherwise tools like codesign_allocate will choke. We were already
handling this correctly for the other DYLD_INFO sections.
Doing this correctly is a bit subtle: we don't know if export_size will
be zero until we have run `ExportSection::finalizeContents()`. However,
we must still add the ExportSection to the `__LINKEDIT` segment in order
that it gets sorted during `sortSectionsAndSegments()`.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D112589
WordLiteralSection dedupes literals by content.
WordLiteralInputSection::getOffset() used to read a literal at the passed-in
offset and look up this value in the deduping map to find the offset of the
deduped value.
But it's possible that (e.g.) a 16-byte literal's value is accessed 4 bytes in.
To get the offset at that address, we have to get the deduped value at offset 0
and then apply the offset 4 to the result.
(See also WordLiteralSection::finalizeContents() which fills in those maps.)
Only a problem on arm64 because in x86_64 the offset is part of the instruction
instead of a separate ARM64_RELOC_ADDEND relocation. (See bug for more details.)
Fixes PR51999.
Differential Revision: https://reviews.llvm.org/D112584
For `InputSection` `.foo`, its `InputBaseSection::{areRelocsRela,firstRelocation,numRelocation}` basically
encode the information of `.rel[a].foo`. However, one uint32_t (the relocation section index)
suffices. See the implementation of `relsOrRelas`.
This change decreases sizeof(InputSection) from 184 to 176 on 64-bit Linux.
The maximum resident set size linking a large application (1.2G output) decreases by 0.39%.
Differential Revision: https://reviews.llvm.org/D112513
Broken by a9353dbe51.
Now that the functions point to the compact unwind entries, instead of
the other way around, we need to perform the "invalid reference" check
in a different place.
This change was originally part of the stacked diff D109946, but should
have been included as part of D109945.
**Context:**
This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test
the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.
**Summary**
Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If rest of the binary was modified at all, this results
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.
The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.
Reviewed By: alexander-shaposhnikov, #lld-macho
Differential Revision: https://reviews.llvm.org/D111164
This diff does away with `addEntriesForFunctionsWithoutUnwindInfo()`,
because `addSymbol()` can now determine which functions need those
entries.
While overhauling UnwindInfoSection, I also parallelized the relocation
of the contents of the CUEs. This somewhat offsets the time regression
from creating one InputSection per CUE (which was done in D109944).
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D109945
Compact unwind entries (CUEs) contain pointers to their respective
function symbols. However, during the link process, it's far more useful
to have pointers from the function symbol to the CUE than vice versa.
This diff adds that pointer in the form of `Defined::compactUnwind`.
In particular, when doing dead-stripping, we want to mark CUEs live when
their function symbol is live; and when doing ICF, we want to dedup
sections iff the symbols in that section have identical CUEs. In both
cases, we want to be able to locate the symbols within a given section,
as well as locate the CUEs belonging to those symbols. So this diff also
adds `InputSection::symbols`.
The ultimate goal of this refactor is to have ICF support dedup'ing
functions with unwind info, but that will be handled in subsequent
diffs. This diff focuses on simplifying `-dead_strip` --
`findFunctionsWithUnwindInfo` is no longer necessary, and
`Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no
longer has to check for dead CUEs -- we simply avoid adding them in the
first place.
Additionally, we now support stripping of dead LSDAs, which follows
quite naturally since `markLive()` can now reach them via the CUEs.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D109944
We were previously always emitting the GOT into `__DATA_CONST`, even for
target platforms where it should end up in `__DATA`.
I stumbled onto this while trying to use the `class-dump` tool -- with
the wrong segment names, it fails to locate the ObjC runtime info and
therefore fails to dump any classes.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D112500
This is what ld64 does too, so we have parity here (though I think ld64
still removes dead code more effectively than we do...)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D112485
The hack is irrelevant for two reasons:
* binutils 2.24 is quite old and cannot handle R_X86_64_REX_GOTPCRELX from 2016 onwards anyway
* `canMergeToProgbits` allows combining SHT_INIT_ARRAY/SHT_FINI_ARRAY into SHT_PROGBITS
For a function call (using the default `-fplt`), GCC `-mcmodel=large` generates an assembly modifier which
leads to an R_X86_64_PLTOFF64 relocation. In real world,
http://git.ageinghacker.net/jitter (used by GNU poke) uses `-mcmodel=large`.
R_X86_64_PLTOFF64's formula is (if preemptible) `L - GOT + A` or (if non-preemptible) `S - GOT + A`
where `GOT` is (confusingly) the address of `.got.plt`
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D112386
Taken from Chih-Mao Chen's D100835.
RelExpr has 64 bits now and needs the extension to support new members
(`R_PLT_GOTPLT` for `R_X86_64_PLTOFF64` support).
Note: RelExpr needs to have at least a member >=64 to prevent
-Wtautological-constant-out-of-range-compare for `if (expr >= 64)`.
Reviewed By: arichardson, peter.smith
Differential Revision: https://reviews.llvm.org/D112385
GNU ld doesn't support `--no-pic-executable`.
`-p` has been removed from likely the only use case (Linux kernel) for over 2.5 years: https://git.kernel.org/linus/091bb549f7722723b284f63ac665e2aedcf9dec9
`--no-add-needed` was the pre-binutils-2.23 spelling for `--no-copy-dt-needed-entries`.
The legacy alias is irrelevant in 2021.
While attempting to simplify it, I discovered a concerning discrepancy
between our handling of LC_LINKER_OPTION vs ld64's. In particular, ld64
does not appear to check for `-all_load` nor `-ObjC` when processing
those options. Thus, if/when we fix this behavior, no duplicate symbol
error will be expected regardless of the use-after-free. As such, I've
removed the test logic that tries to induce the duplicate symbol error.
We can just rely on ASAN to do the verification.
In order to make the test run on Windows, I've removed the symlink
logic. Both ld64 and LLD handle this un-symlinked framework just fine.
I also capitalized the framework name, since that's the typical
convention.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D112195
If segments are defined in a linker script, placing an orphan section
before the found closest-rank section can result in adding it in a
previous segment and changing flags of that segment. This happens if
the orphan section has a lower sort rank than the found section. To
avoid that, the patch forces orphan sections to be moved after the
found section if segments are explicitly defined.
Differential Revision: https://reviews.llvm.org/D111717
In Driver.cpp, addFramework used std::string instance to represent the path of a framework, which will be freed after the function returns. However, this string is stored in loadedArchive, which will be used later to compare with path of newly added frameworks. This caused https://bugs.llvm.org/show_bug.cgi?id=52133. A test is included in this commit to reproduce this bug.
Now resolveDylibPath returns a StringRef instance, and it uses StringSaver to save its data, then returns it to functions on the top. This ensures the resolved framework path is still valid after LC_LINKER_OPTION is parsed.
Reviewed By: int3, #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D111706
This change implements new DAG nodes TABLE_GET/TABLE_SET, and lowering
methods for load and stores of reference types from IR arrays. These
global LLVM IR arrays represent tables at the Wasm level.
Differential Revision: https://reviews.llvm.org/D111154
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
This change is derived from a test case we have locally but I could not
see an equivalent in LLD's testing.
Differential Revision: https://reviews.llvm.org/D111803
prepareSymbolRelocation() in Writer.cpp adds both symbols that need binding and
symbols relocated with a pointer relocation to the got.
Pointer relocations are emitted for non-movq GOTPCREL(%rip) loads. (movqs
become GOT_LOADs so that the linker knows they can be relaxed to leaqs, while
others, such as addq, become just GOT -- a pointer relocation -- since they
can't be relaxed in that way).
For example, this C file produces a private_extern GOT relocation when
compiled with -O2 with clang:
extern const char kString[];
const char* g(int a) { return kString + a; }
Linkers need to put pointer-relocated symbols into the GOT, but ld64 marks them
as LOCAL in the indirect symbol table. This matters, since `strip -x` looks at
the indirect symbol table when deciding what to strip.
The indirect symtab emitting code was assuming that only symbols that need
binding are in the GOT, but pointer relocations where there too. Hence, the
code needs to explicitly check if a symbol is a private extern.
Fixes https://crbug.com/1242638, which has some more information in comments 14
and 15. With this patch, the output of `nm -U` on Chromium Framework after
stripping now contains just two symbols when using lld, just like with ld64.
Differential Revision: https://reviews.llvm.org/D111852
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.
1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
which points to the address of per-function LSDA info. It is more
convenient to use than `MCSymbol` because it can take additional
target flags.
2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
symbol relative to `__memory_base` and generate the `add` node. If
PIC is disabled, continue to use the absolute address.
3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
the backend, because it is hard to make it work with dynamic
linking's loading order. Instead, we make all tag symbols undefined
in the LLVM backend and import it from JS.
4. Add support for undefined tags to the linker.
Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D111388
I think D79300 has fixed the D51892 (`__i686.get_pc_thunk.bx`) issue, so
we can bring back rL330869.
D79300 says `would error undefined symbol instead of the more relevant discarded section`
but it doesn't reproduce now.
This avoids a quirk in `isUndefWeak()`.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D111365
I noticed that we had this case in our internal testsuite but couldn't find it in LLD's tests.
This adds that case.
Differential Revision: https://reviews.llvm.org/D110716
This field only exists if the directory exists on the machine running
the test. It likely exists for most Intel macOS users because of
homebrew, but doesn't exist on some of the CI machines. This
unfortunately makes this test a bit less strict.
Differential Revision: https://reviews.llvm.org/D111361
Some subprojects like compiler-rt define the `darwin` feature in their
lit config, but lld does not do that, so we need to use the global
system-darwin here instead. This test seems to have drifted from the
actual behavior so I also had to add `/usr/local/lib` here to make it
pass.
Differential Revision: https://reviews.llvm.org/D111268
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
uint8_t Attribute;
uint32_t SigIndex;
};
```
Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.
In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.
I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.
Also a drive-by fix: the reserved 'attirubute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and its encoding is the same for the both
encoding.
This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.
Reviewed By: sbc100, tlively
Differential Revision: https://reviews.llvm.org/D111086
PT_LOAD segments in the program header must be sorted by their virtual
addresses, so they should be defined in a similar order as the
associated sections.
Differential Revision: https://reviews.llvm.org/D111068
This simplifies the code in a number of ways and avoids
having to track functions and their types separately.
Differential Revision: https://reviews.llvm.org/D111104
A number of the ICF tests were not updated to use --print-icf-sections
instead of --verbose and various '-NOT' checks were not updated to the
latest output format of --print-icf-sections. Because these are all
'negative' tests, these issues have gone unnoticed.
Differential Revision: https://reviews.llvm.org/D110353
Try to address Windows flakes from d87bdc272b
by adding "|| true" as suggested in D110276 so the whole test doesn't
fail when Windows thinks it can't remove the binary.
Instead, just make the later flag win, like usual.
Implement this by making -no_deduplicate an actual alias for --icf=none
at the Options.td level.
Differential Revision: https://reviews.llvm.org/D110672
In looking at the disk space used by a ninja check-all, I found that a
few of the largest files were copies of clang and lld made into temp
directories by a couple of tests. These tests were added in D53021 and
D74811. Clean up these copies after usage.
Differential Revision: https://reviews.llvm.org/D110276
The ARM backend was explicitly setting global binding on the personality
symbol. This was added without any comment in a7ec2dcefd, which
introduced EHABI support (back in 2011). None of the other backends do
anything equivalent, as far as I can tell.
This causes problems when attempting to wrap the personality symbol.
Wrapped symbols are marked as weak inside LTO to inhibit IPO (see
https://reviews.llvm.org/D33621). When we wrap the personality symbol,
it initially gets weak binding, and then the ARM backend attempts to
change the binding to global, which causes an error in MC because of
attempting to change the binding of a symbol from non-global to global
(the error was added in https://reviews.llvm.org/D90108).
Simply drop the ARM backend's explicit global binding setting to fix
this. This matches all the other backends, and a large internal
application successfully linked and ran with this change, so it
shouldn't cause any problems. Test via LLD, since wrapping is required
to exhibit the issue.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D110609
* Add a newline before `DYNAMIC RELOCATION RECORDS` (see D101796)
* Add the missing `OFFSET TYPE VALUE` line
* Align columns
Note: llvm-readobj/ELFDumper.cpp `loadDynamicTable` has sophisticated PT_DYNAMIC
code which is unavailable in llvm-objdump.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D110595
Most architectures use .got instead of .got.plt, so switching the default can
minimize customization.
This fixes an issue for SPARC V9 which uses .got .
AVR, AMDGPU, and MSP430 don't seem to use _GLOBAL_OFFSET_TABLE_.
Without such wrapping, linking lld fails with missing symbols because of
C++ symbol mangling with older versions of the MacOSX SDK, in which
xar.h doesn't have an extern "C" block itself.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D110224
(As I mentioned in https://reviews.llvm.org/D62609#1534158 ,
the condition for using bti c for executable can be loosened.)
In two cases the address of a PLT may escape:
* canonical PLT entry for a STT_FUNC
* non-preemptible STT_GNU_IFUNC which is converted to STT_FUNC
The first case can be detected with `needsPltAddr`.
The second case is not straightforward to detect because for the Relocations.cpp
created `directSym`, it's difficult to know whether the associated `sym` has
exercised the `!needsPlt(expr)` code path. Just use the conservative `isInIplt`
condition. A non-preemptible ifunc not referenced by non-GOT-generating
non-PLT-generating relocations will have an unneeded `bti c`, but the cost is acceptable.
The second case fixes a bug as well: a -shared link may have non-preemptible ifunc.
Before the patch we did not emit `bti c` and could be wrong if the PLT address escaped.
GNU ld doesn't handle the case: `relocation R_AARCH64_ADR_PREL_PG_HI21 against STT_GNU_IFUNC symbol 'ifunc2' isn't handled by elf64_aarch64_final_link_relocate` (https://sourceware.org/bugzilla/show_bug.cgi?id=28370)
For -shared, if BTI is enabled but PAC is disabled, the PLT entry size increases
from 16 to 24 because we have to select the PLT scheme early, but the cost is
acceptable.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D110217
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110209
Restore the checking of addresses in ICF test which was testing the
behaviour of ICF with regards to different alignments of otherwise
identical sections. Also make the test more robust to layout changes.
Differential Revision: https://reviews.llvm.org/D110090
Similar to D69607 but for archive member extraction unrelated to GC. This patch adds --why-extract=.
Prior art:
GNU ld -M prints
```
Archive member included to satisfy reference by file (symbol)
a.a(a.o) main.o (a)
b.a(b.o) (b())
```
-M is mainly for input section/symbol assignment <-> output section mapping
(often huge output) and the information may appear ad-hoc.
Apple ld64
```
__Z1bv forced load of b.a(b.o)
_a forced load of a.a(a.o)
```
It doesn't say the reference file.
Arm's proprietary linker
```
Selecting member vsnprintf.o(c_wfu.l) to define vsnprintf.
...
Loading member vsnprintf.o from c_wfu.l.
definition: vsnprintf
reference : _printf_a
```
---
--why-extract= gives the user the full data (which is much shorter than GNU ld
-Map). It is easy to track a chain of references to one archive member with a
one-liner, e.g.
```
% ld.lld main.o a_b.a b_c.a c.a -o /dev/null --why-extract=- | tee stdout
reference extracted symbol
main.o a_b.a(a_b.o) a
a_b.a(a_b.o) b_c.a(b_c.o) b()
b_c.a(b_c.o) c.a(c.o) c()
% ruby -ane 'BEGIN{p={}}; p[$F[1]]=[$F[0],$F[2]] if $.>1; END{x="c.a(c.o)"; while y=p[x]; puts "#{y[0]} extracts #{x} to resolve #{y[1]}"; x=y[0] end}' stdout
b_c.a(b_c.o) extracts c.a(c.o) to resolve c()
a_b.a(a_b.o) extracts b_c.a(b_c.o) to resolve b()
main.o extracts a_b.a(a_b.o) to resolve a
```
Archive member extraction happens before --gc-sections, so this may not be a live path
under --gc-sections, but I think it is a good approximation in practice.
* Specifying a file avoids output interleaving with --verbose.
* Required `=` prevents accidental overwrite of an input if the user forgets `=`. (Most of compiler drivers' long options accept `=` but not ` `)
Differential Revision: https://reviews.llvm.org/D109572
Original commit description:
[LLD] Remove global state in lld/COFF
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.
See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.
I also haven't moved the `driver` or `config` variables yet.
Differential Revision: https://reviews.llvm.org/D109634
This reverts commit a2fd05ada9.
Original commits were b4fa71eed3
and e03c7e367a.
... instead of constructing a new one each time. This allows us
to take advantage of {D105305}.
I didn't see a substantial difference when linking chromium_framework,
but this paves the way for reusing similar logic for splitting compact
unwind entries into sections. There are a lot more of those, so the
performance impact is significant.
Differential Revision: https://reviews.llvm.org/D109895
Sometimes people intentionally re-define a dylib personlity symbol as a local defined symbol as a workaround to a ld -r bug.
As a result, we could see "too many personalities" to encode. This patch tries to handle this case by ignoring the local symbols entirely.
Differential Revision: https://reviews.llvm.org/D107533
Add a test to ensure that MachO files including
a LC_CODE_SIGNATURE load command produced by lld
are signed correctly.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D109840
Move the functionality in lld that handles writing of the LC_CODE_SIGNATURE load command and associated data section to a central reusable location.
This change is in preparation for another change that modifies llvm-objcopy to reproduce the LC_CODE_SIGNATURE load command and corresponding
data section to maintain the validity of signed macho object files passed through llvm-objcopy.
Reviewed By: #lld-macho, int3, oontvoo
Differential Revision: https://reviews.llvm.org/D109803
This test checks that timers are working and printing as expected.
I also seem to have changed the order of the timers in my globals refactoring
patch, so I fixed it here.
Differential Revision: https://reviews.llvm.org/D109904
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.
See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.
I also haven't moved the `driver` or `config` variables yet.
Differential Revision: https://reviews.llvm.org/D109634
This way, we do not need to set LLVM_CMAKE_PATH to LLVM_CMAKE_DIR when (NOT LLVM_CONFIG_FOUND)
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D107717
Rather than depending on the hex dump from obj2yaml. Now the test shows the
expected function body in a human readable format.
Differential Revision: https://reviews.llvm.org/D109730
This matters for example for the iPhoneSimulator14.0.sdk, which has
a System/Library/Frameworks/UIKit.framework/UIKit that has
LC_BUILD_VERSION with minos of 14.0, so linking against that file
will produce warnings like:
.../iPhoneSimulator14.0.sdk/System/Library/Frameworks/UIKit.framework/UIKit
has version 14.0.0, which is newer than target minimum of 12.0.0
when targeting x86_64-apple-ios12.0-simulator. That doens't happen when
linking against UIKit.tbd instead, obviously.
Linking with RC_TRACE_DYLIB_SEARCHING=1 shows that ld64 also searches
the tbd file first, and we already get that right for non-framework
dylibs.
Fixes crbug.com/1249456.
Differential Revision: https://reviews.llvm.org/D109768
We previously had a limitation that TLS variables could not
be exported (and therefore could also not be imported). This
change removed that limitation.
Differential Revision: https://reviews.llvm.org/D108877
For multithreaded modules (i.e. modules with a shared memory), lld injects a
synthetic Wasm start function that is automatically called during instantiation
to initialize memory from passive data segments. Even though the module will be
instantiated separately on each thread, memory initialization should happen only
once. Furthermore, memory initialization should be finished by the time each
thread finishes instantiation. Since multiple threads may be instantiating their
modules at the same time, the synthetic function must synchronize them.
The current synchronization tries to atomically increment a flag from 0 to 1 in
memory then enters one of two cases. First, if the increment was successful, the
current thread is responsible for initializing memory. It does so, increments
the flag to 2 to signify that memory has been initialized, then notifies all
threads waiting on the flag. Otherwise, the thread atomically waits on the flag
with an expected value of 1 until memory has been initialized. Either the
initializer thread finishes initializing memory (i.e. sets the flag to 2) first
and the waiter threads do not end up blocking, or the waiter threads succesfully
start waiting before memory is initialized so they will be woken by the
initializer thread once it has finished.
One complication with this scheme is that there are various contexts on the Web,
most notably on the main browser thread, that cannot successfully execute a
wait. Executing a wait in these contexts causes a trap, and in this case would
cause instantiation to fail. The embedder must therefore ensure that these
contexts win the race and become responsible for initializing memory, since that
is the only code path that does not execute a wait.
Unfortunately, since only one thread can win the race and initialize memory,
this scheme makes it impossible to have multiple threads in contexts that cannot
wait. For example, it is not currently possible to instantiate the module on
both the main browser thread as well as in an AudioWorklet. To loosen this
restriction, this commit inserts an extra check so that the wait will not be
executed at all when memory has already been initialized, i.e. when the flag
value is 2. After this change, the module can be instantiated on threads in
non-waiting contexts as long as the embedder can guarantee either that the
thread will win the race and initialize memory (as before) or that memory has
already been initialized when instantiation begins. Threads in contexts that can
wait can continue racing to initialize memory.
Fixes (or at least improves) PR51702.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D109722
Remove some unnecessary logging from wasm-ld when running under
`--verbose`. Unlike `-debug` this logging is available in release
builds. This change makes it little more minimal/readable.
Also, avoid compiling the `debugWrite` function in releaase builds
where it does nothing. This should remove a lot debug strings from
the binary, and avoid having to construct unused debug strings at
runtime.
Differential Revision: https://reviews.llvm.org/D109583
In the case that TLS is used in the single-threaded program, and
therefore effectively lowered away, we still optionally create a
`__tls_base` symbols, but the code for setting it was assuming it was
always created.
Differential Revision: https://reviews.llvm.org/D109518
llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).
For an invocation of ld.lld with 62000+ lines of
`ld.lld: warning: symbol ordering file: no such symbol: ` warnings (D87121),
the buffering decreases the write time from 1s to 0.4s (for /dev/tty) and
from 0.4s to 0.1s (for a tmpfs file). This can speed up
`relocation R_X86_64_PC32 out of range` diagnostic printing as well
with `--noinhibit-exec --no-fatal-warnings`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D87272
Failing to do so results in `std::bad_function_call` being
thrown when a pass tries to emit a diagnostic.
I've copied the relevant test over from LLD-ELF's test suite.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D109274
We end up calling resolveBranchVA(), which asserts for Undefineds.
As fix, just return early in Writer::run() if there are any diagnostics
after processing relocations (which is where undefined symbol errors are
emitted). This matches what the ELF port does.
Differential Revision: https://reviews.llvm.org/D109079
Fixes PR51578 in practice.
Currently there's only enough room for a single thunk, which for real-life code
isn't enough. The error case only happens when there are many branch statements
very close to each other (0 or 1 instructions apart), with the function at the
finalization barrier small.
There's a FIXME on what to do if we hit this case, but that suggestion sounds
complicated to me (see end of PR51578 comment 5 for why).
Instead, just leave more room for thunks. Chromium's unit_tests links fine with
room for 3 thunks. Leave room for 100, which should fix this for most cases in
practice.
There's little cost for leaving lots of room: This slop value only determines
when we finalize sections, and we insert thunks for forward jumps into
unfinalized sections. So leaving room means we'll need a few more thunks, but
the thunk jump range is 128 MiB while a single thunk is just 12 bytes.
For Chromium's unit_tests:
With a slop of 3: thunk calls = 355418, thunks = 10903
With a slop of 100: thunk calls = 355426, thunks = 10904
Chances are 100 is enough for all use cases we'll hit in practice, but even
bumping it to 1000 would probably be fine.
Differential Revision: https://reviews.llvm.org/D108930
- Move a few variables closer to their uses, remove some completely
(no behavior change)
- Add some comments
- Make maxPotentialThunks include calls to stubs. It's possible that
an earlier call to a stub late in the stub table will need a thunk,
and that inserted thunk could push a stub earlier in the stub table
out of range. This is unlikely to happen, but usually there are
way fewer stub calls than non-stub calls, so if we're doing a
conservative approximation here we might as well do it correctly.
(For chromium's unit_tests target, 134421/242639 stub calls are
direct calls without this change, compared to 134408/242639 with
this change)
No real, meaningful behavior difference.
Differential Revision: https://reviews.llvm.org/D108924
- Don't subtract thunkSize from branchRange. Most places care about
the actual maximal branch range. Subtract thunkSize in the one place
that wants to leave room for a thunk.
- Set it to 0x800_0000 instead of 0xFF_FFFF
- Subtract 4 for the positive branch direction since it's a
two's complement 24bit number sign-extended mutiplied by 4,
so its range is -0x800_0000..+0x7FF_FFFC
- Make boundary checks include the boundary values
This doesn't make a huge difference in practice. It's preparation
for a "real" fix for PR51578 -- but it also lets the repro in comment 0
in that bug place one more thunk before hitting the TODO.
Differential Revision: https://reviews.llvm.org/D108897
The assert is harmless and thinks worked fine in builds with asserts enabled,
but it's still nice to fix the assert.
Differential Revision: https://reviews.llvm.org/D108853
We currently complain "could not open /LTCG: no such file or directory",
which isn't very useful. We could emit a warning when we see this flag, but
just ignoring it seems fine.
Final missing part of PR38799.
Differential Revision: https://reviews.llvm.org/D108799
This is what ld64 does. Deviating in behavior here can result
in some subtle duplicate symbol errors, as detailed in the objc.s test.
Differential Revision: https://reviews.llvm.org/D108781
The previous logic was duplicated between symbol-initiated
archive loads versus flag-initiated loads (i.e. `-force_load` and
`-ObjC`). This resulted in code duplication as well as redundant work --
we would create Archive instances twice whenever we had one of those
flags; once in `getArchiveMembers` and again when we constructed the
ArchiveFile.
This was motivated by an upcoming diff where we load archive members
containing ObjC-related symbols before loading those containing
ObjC-related sections, as well as before performing symbol resolution.
Without this refactor, it would be difficult to do that while avoiding
loading the same archive member twice.
Differential Revision: https://reviews.llvm.org/D108780
This was missed by {D107035}. This fix addresses the following warning:
loop variable 'personality' has type 'const uint32_t &' (aka 'const unsigned int &') but is initialized with type 'const unsigned long long' resulting in a copy [-Wrange-loop-analysis]
In addition to fixing the size, I also removed the const reference,
since there's no performance benefit to avoiding copies of integer-sized
values.
If multiple /manifestdependency: flags are passed, they are
naively deduped, but after that each of them should have an
effect, instead of just the last one.
Also, /manifestdependency: flags are allowed in .drectve sections
(from `#pragma comment(linker, ...`). To make the interaction between
/manifestdependency: flags enabling manifest by default but
/manifest:no overriding this work, add an explict ManifestKind::Default
state to represent no explicit /manifest flag being passed.
To make /manifestdependency: flags from input file .drectve sections
work with /manifest:embed, delay embedded manifest emission until
after input files have been read.
Differential Revision: https://reviews.llvm.org/D108628
This CL is small, but the description can be a little long because I'm
trying to sum up the status quo for Emscripten/Wasm EH/SjLj options.
First, this CL adds an option for Wasm SjLj (`-wasm-enable-sjlj`), which
handles SjLj using Wasm EH. The implementation for this will be added as
a followup CL, but this adds the option first to do error checking.
This also adds an option for Wasm EH (`-wasm-enable-eh`), which has been
already implemented. Before we used `-exception-model=wasm` as the same
meaning as enabling Wasm EH, but after we add Wasm SjLj, it will be
possible to use Wasm EH instructions for Wasm SjLj while not enabling
EH, so going forward, to use Wasm EH, `opt` and `llc` will need this
option. This only affects `opt` and `llc` command lines and does not
affect Emscripten user interface.
Now we have two modes of EH (Emscripten/Wasm) and also two modes of SjLj
(also Emscripten/Wasm). The options corresponding to each of are:
- Emscripten EH: `-enable-emscripten-cxx-exceptions`
- Emscripten SjLj: `-enable-emscripten-sjlj`
- Wasm EH: `-wasm-enable-eh -exception-model=wasm`
`-mattr=+exception-handling`
- Wasm SjLj: `-wasm-enable-sjlj -exception-model=wasm`
`-mattr=+exception-handling`
The reason Wasm EH/SjLj's options are a little complicated are
`-exception-model` and `-mattr` are common LLVM options ane not under
our control. (`-mattr` can be omitted if it is embedded within the
bitcode file.)
And we have the following rules of the option composition:
- Emscripten EH and Wasm EH cannot be turned on at the same itme
- Emscripten SjLj and Wasm SjLj cannot be turned on at the same time
- Wasm SjLj should be used with Wasm EH
Which means we now allow these combinations:
- Emscripten EH + Emscripten SjLj: the current default in `emcc`
- Wasm EH + Emscripten SjLj:
This is allowed, but only as an interim step in which we are testing
Wasm EH but not yet have a working implementation of Wasm SjLj. This
will error out (D107687) in compile time if `setjmp` is called in a
function in which Wasm exception is used.
- Wasm EH + Wasm SjLj:
This will be the default mode later when using Wasm EH. Currently Wasm
SjLj implementation doesn't exist, so it doesn't work.
- Emscripten EH + Wasm SjLj will not work.
This CL moves these error checking routines to
`WebAssemblyPassConfig::addIRPasses`. Not sure if this is an ideal place
to do this, but I couldn't find elsewhere. Currently some checking is
done within LowerEmscriptenEHSjLj, but these checks only run if
LowerEmscriptenEHSjLj runs so it may not run when Wasm EH is used. This
moves that to `addIRPasses` and adds some more checks.
Currently LowerEmscriptenEHSjLj pass is responsible for Emscripten EH
and Emscripten SjLj. Wasm EH transformations are done in multiple
places, including WasmEHPrepare, LateEHPrepare, and CFGStackify. But in
the followup CL, LowerEmscriptenEHSjLj pass will be also responsible for
a part of Wasm SjLj transformation, because WasmSjLj will also be using
several Emscripten library functions, and we will be sharing more than
half of the transformation to do that between Emscripten SjLj and Wasm
SjLj.
Currently we have `-enable-emscripten-cxx-exceptions` and
`-enable-emscripten-sjlj` but these only work for `llc`, because for
`llc` we feed these options to the pass but when we run the pass using
`opt` the pass will be created with no options and the default options
will be used, which turns both Emscripten EH and Emscripten SjLj on.
Now we have one more SjLj option to care for, LowerEmscriptenEHSjLj pass
needs a finer way to control these options. This CL removes those
default parameters and make LowerEmscriptenEHSjLj pass read directly
from command line options specified. So if we only run
`opt -wasm-lower-em-ehsjlj`, currently both Emscripten EH and Emscripten
SjLj will run, but with this CL, none will run unless we additionally
pass `-enable-emscripten-cxx-exceptions` or `-enable-emscripten-sjlj`,
or both. This does not affect users; this only affects our `opt` tests
because `emcc` will not call either `opt` or `llc`. As a result of this,
our existing Emscripten EH/SjLj tests gained one or both of those
options in their `RUN` lines.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107685
In the case of weakly defined symbols in shared libraries we now
generate both an import and an export. The dynamic linker can then
choose how a winner from among all the shared libraries that define a
given symbol.
Previously any direct usage of a weakly defined symbol would use the
DSO-local definition (For example, even through there would be single
address for a weakly defined function, each DSO could end up directly
calling its local version).
Fixes: https://github.com/emscripten-core/emscripten/issues/13773
Differential Revision: https://reviews.llvm.org/D108413
In PIC mode we import function address via `GOT.mem` imports but for
direct function calls we still import the first class function.
However, if the function is never directly called we can avoid the first
class import completely.
Differential Revision: https://reviews.llvm.org/D108345
The convention is not to check the prefix before `error: `.
This gives flexibility if we need to rename ld64.lld to something else,
(e.g. a while ago we used ld64.lld.darwinnew).
Address post follow up comment in D108016. Avoid creating isec for
LLVM segments since we are skipping over it.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D108167
There was an instance of a third-party archive containing multiple
_llvm symbols from different files that clashed with each other
producing duplicate symbols. Symbols under the LLVM segment
don't seem to be producing any meaningful value, so just ignore them.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D108016
In e72403f96d, we added the flag
"--no-dynamicbase" for disabling the dynamicbase flag which we set
by default. At the time, ld.bfd didn't have any corresponding
option (as ld.bfd defaulted to not setting the flag). Almost at
the same time, corresponding options were added to ld.bfd for
disabling it (while it was being enabled by default), with a
different name, "--disable-dynamicbase".
Thus add the "--disable-dynamicbase" option. Make this default
one advertised in the help listing, but keep the "--no-dynamicbase"
form as an alias. Also improve checking for the last option set
if there are multiple ones on the same command line.
Also add corresponding disable options for a lot of other flags
that we set by default, also added in ld.bfd in the same commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=514b4e191d5f46de8e142fe216e677a35fa9c4bb
Differential Revision: https://reviews.llvm.org/D107930
When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX)
due to source changes (e.g. `#if` code runs for profile generation but not for profile use)
To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option.
Add "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO.
Differential Revision: https://reviews.llvm.org/D104431
When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX).
To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option.
Add this "lto-pgo-warn-mismatch" option to lld to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104431
Now that we have https://reviews.llvm.org/D105539 we can
use objdump -d to actually check for instruction sequences
rather than binary blobs.
This is just an example of how to do that we should followup
with a wider ranging conversion of existing tests.
Differential Revision: https://reviews.llvm.org/D106897
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:
```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
^
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D107791
This adds thin archives to the map file test.
I noticed that we had this test-case in our downstream
testsuite but it wasn't in the upstream testing.
Differential revision: https://reviews.llvm.org/D107555
This patch enables compressed input sections on big-endian targets by
checking the target endianness and selecting an appropriate `Chdr`
structure.
Fixes PR51369
Differential Revision: https://reviews.llvm.org/D107635
See: http://45.33.8.238/macm1/15677/step_10.txt
This is a test that has `REQUIRES: x86` which means it never ran
before; I don't have a MachO environment but based on the FileCheck
output it looks like it should be sufficient to remove one CHECK line.
Copy relocation on a non-default version symbol is unsupported and can crash at
runtime. Fortunately there is a one-line fix which works for most cases:
ensure `getSymbolsAt` unconditionally returns `ss`.
If two non-default version symbols are defined at the same place and both
are copy relocated, our implementation will copy relocated them into different
addresses. The pointer inequality is very unlikely an issue. In GNU ld, copy
relocating version aliases seems to create more pointer inequality problems than
us.
(
In glibc, sys_errlist@GLIBC_2.2.5 sys_errlist@GLIBC_2.3 sys_errlist@GLIBC_2.4
are defined at the same place, but it is unlikely they are all copy relocated in
one executable. Even if so, the variables are read-only and pointer inequality
should not be a problem.
)
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107535
Currently version script patterns are ignored for .symver produced
non-default version (single @) symbols. This makes such symbols
not localizable by `local:`, e.g.
```
.symver foo3_v1,foo3@v1
.globl foo_v1
foo3_v1:
ld.lld --version-script=a.ver -shared a.o
```
This patch adds the support:
* Move `config->versionDefinitions[VER_NDX_LOCAL].patterns` to `config->versionDefinitions[versionId].localPatterns`
* Rename `config->versionDefinitions[versionId].patterns` to `config->versionDefinitions[versionId].nonLocalPatterns`
* Allow `findAllByVersion` to find non-default version symbols when `includeNonDefault` is true. (Note: `symtab` keys do not have `@@`)
* Make each pattern check both the unversioned `pat.name` and the versioned `${pat.name}@${v.name}`
* `localPatterns` can localize `${pat.name}@${v.name}`. `nonLocalPatterns` can prevent localization by assigning `verdefIndex` (before `parseSymbolVersion`).
---
If a user notices new `undefined symbol` errors with a version script containing
`local: *;`, the issue is likely due to a missing `global:` pattern.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107234
Now that D95204 switched default to new Darwin backend, rename some CMake
targets to match.
Reviewed By: #lld-macho, smeenai, int3
Differential Revision: https://reviews.llvm.org/D107516
Remnant after D72803.
Distributions who want to customize the string can customize
LLD_VERSION_STRING instead.
Reviewed By: #lld-macho, mstorsjo, thakis
Differential Revision: https://reviews.llvm.org/D107416
Due to an assembler design flaw (IMO), `.symver foo,foo@v1` produces two symbols `foo` and `foo@v1` if `foo` is defined.
* `v1 {};` produces both `foo` and `foo@v1`, but GNU ld only produces `foo@v1`
* `v1 { foo; };` produces both `foo@@v1` and `foo@v1`, but GNU ld only produces `foo@v1`
* `v2 { foo; };` produces both `foo@@v2` and `foo@v1`, matching GNU ld. (Tested by symver.s)
This patch implements the GNU ld behavior by reusing the symbol redirection mechanism
in D92259. The new test symver-non-default.s checks the first two cases.
Without the patch, the second case will produce `foo@v1` and `foo@@v1` which
looks weird and makes foo unnecessarily default versioned.
Note: `.symver foo,foo@v1,remove` exists but the unfortunate `foo` will not go
away anytime soon.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107235