This reverts commit 3c21166a94.
The build is broken (clang-8 host compiler):
lld/MachO/DriverUtils.cpp:271:8: error: use of overloaded operator '<<' is ambiguous (with operand types 'llvm::raw_fd_ostream' and 'lld::macho::DependencyTracker::DepOpCode')
os << opcode;
~~ ^ ~~~~~~
This reverts commit 9670d2e4af.
Second attemp to reland D98559. New changes:
- inline functions removed from cpp file.
- updated tests to use CHECK-DAG instead of CHECK-NEXT
- fixed ambiguous "<<" operator by switching `char` to uint8_t
The only known reason why ICF should not merge otherwise identical
sections with differing associated sections has to do with exception
handling tables. It's not clear what ICF should do when there are other
kinds of associated sections. In every other case when this has come up,
debug info and CF guard metadata, we have opted to make ICF ignore the
associated sections.
For comparison, ELF doesn't do anything for comdat groups. Instead,
.eh_frame is parsed to figure out if a section has an LSDA, and if so,
ICF is disabled.
Another issue is that the order of associated sections is not defined.
We have had issues in the past (crbug.com/1144476) where changing the
order of the .xdata/.pdata sections in the object file lead to large ICF
slowdowns.
To address these issues, I decided it would be best to explicitly
consider only .pdata and .xdata sections during ICF. This makes it easy
to ignore the object file order, and I think it makes the intention of
the code clearer.
I've also made the children() accessor return an empty list for
associated sections. This mostly only affects ICF and GC. This was the
behavior before I made this a linked list, so the behavior change should
be good. This had positive effects on chrome.dll: more .xdata sections
were merged that previously could not be merged because they were
associated with distinct .pdata sections.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D98993
This reverts commit 2554b95db5.
Relanding [lld-macho] Implement -dependency_info (D98559) with changes:
- inline functions removed from cpp file.
- updated tests to not check libSystem.tbd with other input files (because of possible indeterministic ordering)
There is a bug when initial exec is relaxed to local exec.
In the following situation:
InitExec.c
```
extern __thread unsigned TGlobal;
unsigned getConst(unsigned*);
unsigned addVal(unsigned, unsigned*);
unsigned GetAddrT() {
return addVal(getConst(&TGlobal), &TGlobal);
}
```
Def.c
```
__thread unsigned TGlobal;
unsigned getConst(unsigned* A) {
return *A + 3;
}
unsigned addVal(unsigned A, unsigned* B) {
return A + *B;
}
```
The problem is in InitExec.c but Def.c is required if you want to link the example and see the problem.
To compile everything:
```
clang -O3 -mcpu=pwr10 -c InitExec.c
clang -O3 -mcpu=pwr10 -c Def.c
ld.lld InitExec.o Def.o -o IeToLe
```
If you objdump the problem object file:
```
$ llvm-objdump -dr --mcpu=pwr10 InitExec.o
```
you will get the following assembly:
```
0000000000000000 <GetAddrT>:
0: a6 02 08 7c mflr 0
4: f0 ff c1 fb std 30, -16(1)
8: 10 00 01 f8 std 0, 16(1)
c: d1 ff 21 f8 stdu 1, -48(1)
10: 00 00 10 04 00 00 60 e4 pld 3, 0(0), 1
0000000000000010: R_PPC64_GOT_TPREL_PCREL34 TGlobal
18: 14 6a c3 7f add 30, 3, 13
0000000000000019: R_PPC64_TLS TGlobal
1c: 78 f3 c3 7f mr 3, 30
20: 01 00 00 48 bl 0x20
0000000000000020: R_PPC64_REL24_NOTOC getConst
24: 78 f3 c4 7f mr 4, 30
28: 30 00 21 38 addi 1, 1, 48
2c: 10 00 01 e8 ld 0, 16(1)
30: f0 ff c1 eb ld 30, -16(1)
34: a6 03 08 7c mtlr 0
38: 00 00 00 48 b 0x38
0000000000000038: R_PPC64_REL24_NOTOC addVal
```
The lines of interest are:
```
10: 00 00 10 04 00 00 60 e4 pld 3, 0(0), 1
0000000000000010: R_PPC64_GOT_TPREL_PCREL34 TGlobal
18: 14 6a c3 7f add 30, 3, 13
0000000000000019: R_PPC64_TLS TGlobal
1c: 78 f3 c3 7f mr 3, 30
```
Which once linked gets turned into:
```
10010210: ff ff 03 06 00 90 6d 38 paddi 3, 13, -28672, 0
10010218: 00 00 00 60 nop
1001021c: 78 f3 c3 7f mr 3, 30
```
The problem is that register 30 is never set after the optimization.
Therefore it is not correct to relax the above instructions by replacing
the add instruction with a nop.
Instead the add instruction should be replaced with a copy (mr) instruction.
If the add uses the same resgiter as input and as ouput then it is safe to
continue to replace the add with a nop.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D95262
This reverts commit c53a1322f3.
Test only passes depending on build dir having a lexicographically later name
than the source dir, and doesn't link on mac/win. See
https://reviews.llvm.org/D98559#2640265 onward.
Bug: https://bugs.llvm.org/show_bug.cgi?id=49278
The flag is not well documented, so this implementation is based on observed behaviour.
When specified, `-dependency_info <path>` produced a text file containing information pertaining to the current linkage, such as input files, output file, linker version, etc.
This file's layout is also not documented, but it seems to be a series of null ('\0') terminated strings in the form `<op code><path>`
`<op code>` could be:
`0x00` : linker version
`0x10` : input
`0x11` : files not found(??)
`0x40` : output
`<path>` : is the file path, except for the linker-version case.
(??) This part is a bit unclear. I think it means all the files the linker attempted to look at, but could not find.
Differential Revision: https://reviews.llvm.org/D98559
`--shuffle-sections=<seed>` applies to all sections. The new
`--shuffle-sections=<section-glob>=<seed>` makes shuffling selective. To the
best of my knowledge, the option is only used as debugging, so just drop the
original form.
`--shuffle-sections '.init_array*=-1'` `--shuffle-sections '.fini_array*=-1'`.
reverses static constructors/destructors of the same priority.
Useful to detect some static initialization order fiasco.
`--shuffle-sections '.data*=-1'`
reverses `.data*` sections. Useful to detect unfunded pointer comparison results
of two unrelated objects.
If certain sections have an intrinsic order, the old form cannot be used.
Differential Revision: https://reviews.llvm.org/D98679
Move some functions closer to their uses. Move detailed address-assignment logic out of the otherwise abstract `Writer::run()`. This prepares the ground for a diff to implement branch range extension thunks.
* `SyntheticSections.cpp`
** move `needsBinding()` and `prepareBranchTarget()` into `Writer.cpp`
** move `addNonLazyBindingEntries()` adjacent to its use.
* `Writer.cpp`
** move address-assignment logic from `Writer::run()` into new function `Writer::assignAddresses()`
** move `needsBinding()` and `prepareBranchTarget()` from `SyntheticSections.cpp`
* `Target.h`
** remove orphaned decls of `prepareSymbolRelocation()` and `validateRelocationInfo()` which were moved to other files in earlier diffs.
Differential Revision: https://reviews.llvm.org/D98795
If the number of sections changes, which is common for re-links after
incremental updates, the section order may change drastically.
Special case -1 to reverse input sections. This is a stable transform.
The section order is more resilient to incremental updates. Usually the
code issue (e.g. Static Initialization Order Fiasco, assuming pointer
comparison result of two unrelated objects) is due to the relative order
between two problematic input files A and B. Checking the regular order
and the reversed order is sufficient.
Differential Revision: https://reviews.llvm.org/D98445
This fixes defects in D98223 [lld-macho] implement options -(un)exported_symbol(s_list):
* disallow export of hidden symbols
* verify that whitelisted literal names are defined in the symbol table
* reflect export-status overrides in `nlist` attribute of `N_EXT` or `N_PEXT`
Thanks to @thakis for raising these issues
Differential Revision: https://reviews.llvm.org/D98381
Visual Studios implementation of the C++ Standard Library does not use strerror to produce a message for std::error_code unlike other standard libraries such as libstdc++ or libc++ that might be used.
This patch adds a cmake script that through running a C++ program gets the error messages for the POSIX error codes and passes them onto lit through an optional config parameter.
If the config parameter is not set, or getting the messages failed, due to say a cross compiling configuration without an emulator, it will fall back to using pythons strerror functions.
Differential Revision: https://reviews.llvm.org/D98278
This pleases the codesign
(Otherwise it complains about "function starts data out of place")
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D98648
The lit test suite uses python 3.6 features. Rather than a strange
python syntax error upon running the lit tests, we will require the
correct version in CMake.
Reviewed By: serge-sans-paille, yln
Differential Revision: https://reviews.llvm.org/D95635
They were previously in SyntheticSections.h, but now there are
a bunch of non-synthetic section names in the list.
Also renamed `__functionStarts` to `__func_starts` for uniformity with
other section names + keeps the name under 16 characters (in case we ever
want to write it out as a real section).
Reviewed By: #lld-macho, compnerd
Differential Revision: https://reviews.llvm.org/D98586
Summary: The exact out-of-range value seems to differ by 8 bytes on the
buildbots compared to my local machine. I'm guessing it has something to
do with what inputs we are getting from llvm-mc. Not sure why, but I
don't think it's super important -- let's just ignore the number, the
out-of-range message is the important thing here
Previously, it was difficult to write code that handled both synthetic
and regular sections generically. We solve this problem by creating a
fake InputSection at the start of every SyntheticSection.
This refactor allows us to handle DSOHandle like a regular Defined
symbol (since Defined symbols must be attached to an InputSection), and
paves the way for supporting `__mh_*header` symbols. Additionally, it
simplifies our binding/rebase code.
I did have to extend Defined a little -- it now has a `linkerInternal`
flag, to indicate that `___dso_handle` should not be in the final symbol
table.
I've also added some additional testing for `___dso_handle`.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D98545
This diff required fixing `getEmbeddedAddend` to apply sign
extension to 32-bit values. We were previously passing around wrong
64-bit addend values that became "right" after being truncated back to
32-bit.
I've also made `getEmbeddedAddend` return a signed int, which is similar
to what LLD-ELF does for its `getImplicitAddend`.
`reportRangeError`, `checkUInt`, and `checkInt` are counterparts of similar
functions in LLD-ELF.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98387
The S_[GL]PROC32_ID symbol records are supposed to point to function ID
records. If they don't, they are corrupt. The warning message here was
very technical, but a user has encountered it in the wild. Add some more
information and some more testing.
There are 3 remaining tests that still have `REQUIRE: shell`:
* color-diagnostics.test -- seems necessary for ANSI escape sequence support
* stabs.s -- the shell part could be removed, but I don't think we can support
the test on Windows anyway due to its reliance on `touch` to set the modtime
* framework.s -- uses symlinks, I'm not sure this works on Windows
Addresses PR49512.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D98395
SUBTRACTOR relocations are always paired with UNSIGNED
relocations to indicate a pair of symbols whose address difference we
want. Functionally they are like a single relocation: only one pointer
gets written / relocated. Previously, we would handle these pairs by
skipping over the SUBTRACTOR relocation and writing the pointer when
handling the UNSIGNED reloc. This diff reverses things, so we write
while handling SUBTRACTORs and skip over the UNSIGNED relocs instead.
Being able to distinguish between SUBTRACTOR and UNSIGNED relocs in the
write phase (i.e. inside `relocateOne`) is useful for the upcoming range
check diff: we want to check that SUBTRACTOR relocs write signed values,
but UNSIGNED relocs (naturally) write unsigned values.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98386
The previous implementation miscalculated the addend, resulting
in an underflow. This meant that every SIGNED_N section relocation would
be associated with the last subsection (since the addend would now be a
huge number). We were "lucky" that this mistake was typically cancelled
out -- 64-to-32-bit-truncation meant that the final value was correct,
as long as subsections were not rearranged.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98385
Previously, SyntheticSections.cpp did not have a top-level `using namespace
llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would
collide with `lld::macho::Symbol`.
`MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By
moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this
name collision in other files where we are only dealing with LLD's own symbols.
Along the way, I removed all unnecessary "MachO::" prefixes in our code.
Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other
header file in the future, we could run into this collision again.
Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different
namespace. Most of the benefit of `using namespace llvm::MachO` comes from being
able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a
different (and fully-qualified) namespace like `llvm::tapi` that would solve our
problems. Cons: lots of files across llvm-project will need to be updated, and
folks who own the TextAPI code need to agree to the name change.
Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is
ugly.
Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is
worthwhile, this diff's halfway solution seems good enough to me. Thoughts?
Reviewed By: #lld-macho, oontvoo, MaskRay
Differential Revision: https://reviews.llvm.org/D98149
GNU ld supports `.` and `$` in symbol names while LLD doesn't support them in
`readPrimary` expressions. Using `.` can result in such an error:
```
https://github.com/ClangBuiltLinux/linux/issues/1318
ld.lld: error: ./arch/powerpc/kernel/vmlinux.lds:255: malformed number: .TOC.
>>> __toc_ptr = (DEFINED (.TOC.) ? .TOC. : ADDR (.got)) + 0x8000;
```
Allow `.` (ppc64 special symbol `.TOC.`) and `$` (RISC-V special symbol `__global_pointer$`).
Change `diag[3-5].test` to use an invalid character `^`.
Note: GNU ld allows `~` in non-leading positions of a symbol name. `~`
is not used in practice, conflicts with the unary operator, and can
cause some parsing difficulty, so this patch does not add it.
Differential Revision: https://reviews.llvm.org/D98306
This is the minimal port from ELF. Any extension should easy from here
Test plan: ninja check-all-macho
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98419
This reverts commit bacf9cf2c5 and
reinstates commit 1a9bd5b813.
Reverting this commit did not appear to make the problem go away, so we
can go ahead and reland it.
Pointer and reference induction variables of range-based for loops are often const, and code authors often lax about qualifying them.
Differential Revision: https://reviews.llvm.org/D98317
The flag doesn't (and shouldn't) have an effect in that case.
ld64 doesn't warn on this, but it seems like a good thing to do.
If it causes problems in practice for some reason, we can revert it.
Also add a dedicated test for install_name.
Differential Revision: https://reviews.llvm.org/D98259
lld doesn't read MH_DEAD_STRIPPABLE_DYLIB to strip dead dylibs yet,
but now it can produce dylibs with it set.
While here, also switch an existing test that looks only at the main Mach-O
header from --all-headers to --private-header.
Differential Revision: https://reviews.llvm.org/D98262
A `local:` version node in a version script can change the effective symbol binding
to STB_LOCAL. The linker needs to communicate the fact to enable WPD
(otherwise LTO does not know that the `!vcall_visibility` metadata has
effectively changed from VCallVisibilityPublic to VCallVisibilityLinkageUnit).
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D98220
Top-level `using llvm::opt` has been present in `lld/MachO/Driver*.cpp` for some time, so remove lingering `opt::` prefixes.
Differential Revision: https://reviews.llvm.org/D98314
lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ...
* redundancy, as for decls such as `auto *t = mumble_cast<TYPE *>` or similar that specifies the result type on the RHS
* verbosity, as for iterators
* gratuitous suffering, as for lambdas
Along the way, add `const` when appropriate.
Note: a future diff will ...
* add more `const` qualifiers
* remove `opt::` when we are already `using llvm::opt`
Differential Revision: https://reviews.llvm.org/D98313
Implement command-line options to alter a dylib's exported-symbols list:
* `-exported_symbol*` options override the default export list. The export list is compiled according to the command-line option(s) only.
* `-unexported_symbol*` options hide otherwise public symbols.
* `-*exported_symbol PATTERN` options specify a single literal or glob pattern.
* `-*exported_symbols_list FILE` options specify a file containing a series of lines containing symbol literals or glob patterns. Whitespace and `#`-prefix comments are stripped.
Note: This is a simple implementation of the primary use case. ld64 has much more complexity surrounding interactions with other options, many of which are obscure and undocumented. We will start simple and complexity as necessary.
Differential Revision: https://reviews.llvm.org/D98223
Add first bits for emitting LC_FUNCTION_STARTS.
This is a recommit of f344dfeb with the adjusted test
which should address build bots breakages.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D97260