loadFile could load mulitple files just because yaml has a feature for
putting multiple documents in one file.
Designing a linker around what yaml can do seems like a bad idea to
me. This patch changes it to read a single file.
There are further improvements to be done to the api and they
will follow shortly.
llvm-svn: 235724
In the resolver, we maintain a list of undefined symbols, and when we
visit an archive file, we check that file if undefined symbols can be
resolved using files in the archive. The archive file class provides
find() function to lookup a symbol.
Previously, we call find() for each undefined symbols. Archive files
may be visited multiple times if they are in a --start-group and
--end-group. If we visit a file M times and if we have N undefined
symbols, find() is called M*N times. I found that that is one of the
most significant bottlenecks in LLD when linking a large executable.
find() is not a very cheap operation because it looks up a hash table
for a given string. And a string, or a symbol name, can be pretty long
if you are dealing with C++ symbols.
We can eliminate the bottleneck.
Calling find() with the same symbol multiple times is a waste. If a
result of looking up a symbol is "not found", it stays "not found"
forever because the symbol simply doesn't exist in the archive.
Thus, we should call find() only for newly-added undefined symbols.
This optimization makes O(M*N) O(N).
In this patch, all undefined symbols are added to a vector. For each
archive/shared library file, we maintain a start position P. All
symbols [0, P) are already searched. [P, end of the vector) are not
searched yet. For each file, we scan the vector only once.
This patch changes the order in which undefined symbols are looked for.
Previously, we iterated over the result of _symbolTable.undefines().
Now we iterate over the new vector. This is a benign change but caused
differences in output if remaining undefines exist. This is why some
tests are updated.
The performance improvement of this patch seems sometimes significant.
Previously, linking chrome.dll on my workstation (Xeon 2.4GHz 8 cores)
took about 70 seconds. Now it takes (only?) 30 seconds!
http://reviews.llvm.org/D8091
llvm-svn: 231434
strings don't mix so easily. This fixes the last remaining failure
I have in 'check-all' on a system with both Python3 and and Python2
installed.
llvm-svn: 224947
Summary:
Fix the binary file reader to properly read dyld version info.
Update the install_name test case to properly test the binary reader. We can't use '-print_atoms' as the output format is 'native' yaml and it does not contains the dyld current and compatibility versions.
Also change the timestamp value of LD_ID_DYLD to match the one generated by ld64.
The dynamic linker (dyld) used to expects different values for timestamp in LD_ID_DYLD and LD_LOAD_DYLD for prebound images. While prebinding is deprecated, we should probably keep it safe and match ld64.
Reviewers: kledzik
Subscribers: llvm-commits
Projects: #lld
Differential Revision: http://reviews.llvm.org/D6736
llvm-svn: 224681
Summary:
Work on adding -rpath support to the mach-o linker.
This patch is based on the ld64 behavior for the command line option validation.
It includes a basic test to check that the LC_RPATH load commands are properly generated when that option is used.
It also add LC_RPATH support to the binary reader, but I don't know how to test it though.
Reviewers: kledzik
Subscribers: llvm-commits
Projects: #lld
Differential Revision: http://reviews.llvm.org/D6724
llvm-svn: 224544
Mach-o does not use a simple SO_NEEDED to track dependent dylibs. Instead,
the linker copies four things from each dylib to each client: the runtime path
(aka "install name"), the build time, current version (dylib build number), and
compatibility version The build time is no longer used (it cause every rebuild
of a dylib to be different). The compatibility version is usually just 1.0
and never changes, or the dylib becomes incompatible.
This patch copies that information into the NormalizedMachO format and
propagates it to clients.
llvm-svn: 222300
When fixing up BL instructions, the linker has to compare the thumbness of the
target to decide if the instruction needs to be converted to BLX. But with B
instruction there is no BX, so the linker asserts if the target is not the
same thumbness. This assert was firing in -r mode when the target was undefined
which it interpreted as being non-thumb.
Test case change is to add a B (in both thumb and arm code) to an undefined
symbol and round trip through -r mode.
llvm-svn: 222266
The arm64 assembler almost always uses r_extern=1 relocations in which the
r_symbolnum field is the index of the symbol the relocation references. But
sometimes it will set r_extern=0 in which case the linker needs to read the
content of the reloction to determine the target.
Add test case that the r_extern=0 relocation round trips.
llvm-svn: 222198
The GOT slots were being laid out in a random order by the GOTPass which
caused randomness in the output file.
Note: With this change lld now bootstraps on darwin. That is:
1) link lld using system linker to make lld.1
2) link lld using lld.1 to make lld.2
3) link lld using lld.2 to make lld.3
Now lld.2 and lld.3 are identical.
llvm-svn: 221831
On darwin in final linked images, the __TEXT segment covers that start of the
file. That means in memory a process can see the mach_header (and load commands)
for every loaded image in a process. There are APIs that take and return the
mach_header addresses as a way to specify a particular loaded image.
For completeness, any code can get the address of the mach_header of the image
it is in by using &__dso_handle. In addition there are mach-o type specific
symbols like __mh_execute_header.
The linker needs to supply a definition for any of these symbols if used. But
the address the symbol it resolves to is not in any section. Instead it is the
address of the start of the __TEXT segment.
I needed to make a small change to SimpleFileNode to not override
resetNextIndex() because the Driver creates a SimpleFileNode to hold the
internal/implicit files that the context/writer can create. For some reason
SimpleFileNode overrode resetNextIndex() to do nothing instead of reseting
the index (which mach-o needs if the internal file is an archive).
llvm-svn: 221822
The way lazy binding works in mach-o is that the linker generates a helper
function and has the stub (PLT) initially jump to it. The helper function
pushes an extra parameter then jumps into dyld. The extra parameter is an
offset into the lazy binding info where dyld will find the information about
which symbol to bind and way lazy binding pointer to update.
llvm-svn: 221654
The darwin linker lets you rearrange functions and data for better locality
(less paging). You do this with the -order_file option which supplies a text
file containing one symbol per line.
Implementing this required a small change to LayoutPass to add a custom sorter
hook.
llvm-svn: 221545
The darwin linker has two ways to force all members of an archive to be loaded.
The -all_load option applies to all static libraries. The -force_load takes
a path to a library and just that library's members are force loaded.
llvm-svn: 221477
code. Same basic change that was done in r218429 for ARM code.
Where the ARM thumb symbolizer in llvm-objdump’s Mach-O disassembler is now
plumbed in with r221470 from the llvm trunk.
llvm-svn: 221473
Darwin uses two-level-namespace lookup for symbols which means the static
linker records where each symbol must be found at runtime. Thus defining a
symbol in a dylib loaded earlier will not effect where symbols needed by
later dylibs will be found. Instead overriding is done through a section
of type S_INTERPOSING which contains tuples of <interposer, interposee>.
llvm-svn: 221424
The job of the CompactUnwind pass is to turn __compact_unwind data (and
__eh_frame) into the compressed final form in __unwind_info. After it's done,
the original atoms are no longer relevant and should be deleted (they cause
problems during actual execution, quite apart from the fact that they're not
needed).
llvm-svn: 221301
lld was regenerating LC_DATA_IN_CODE in .o output files, but not into
final linked images.
Update test case to verify data-in-code info makes it into final linked images.
llvm-svn: 220827
Objective-C switched to a new ABI which uses a different mangling for class
names. But to keep projects building that use export lists that use the old
class name mangling, the linker recognizes the old names and transforms them
to the new mangling.
llvm-svn: 220598
In final linked shared images, the __TEXT segment contains both code and
the mach-o header/load-commands. In the case of a data-only dylib, there is
no code, so we need to force the addition of the __TEXT segment.
llvm-svn: 220597
All compiler generated mach-o object files are marked with MH_SUBSECTIONS_VIA_SYMBOLS.
But hand written assembly files need to opt-in if they are written correctly.
The flag means the linker can break up a sections at symbol addresses and
dead strip or re-order functions.
This change recognizes object files without the flag and marks its atoms as
not dead strippable and adds a layout-after chain of references so that the
atoms cannot be re-ordered.
llvm-svn: 220348
The darwin linker operates differently than the gnu linker with respect to
libraries. The darwin linker first links in all object files from the command
line, then to resolve any remaining undefines, it repeatedly iterates over
libraries on the command line until either all undefines are resolved or no
undefines were resolved in the last pass.
When Shankar made the InputGraph model, the plan for darwin was for the darwin
driver to place all libraries in a group at the end of the InputGraph. Thus
making the darwin model a subset of the gnu model. But it turns out that does
not work because the driver cannot tell if a file is an object or library until
it has been loaded, which happens later.
This solution is to subclass InputGraph for darwin and just iterate the graph
the way darwin linker needs.
llvm-svn: 220330
-all_load tells the darwin linker to immediately load all members of all
archives. The code do that used reinterpret_cast<> instead of dyn_cast<>.
If the file was a dylib, the reinterpret_cast<> turned a pointer to a dylib
into a pointer to an archive...boom.
Added test case to reproduce the crash, simplified the code and used dyn_cast<>.
llvm-svn: 219990
To deal with cycles in shared library dependencies, the darwin linker supports
marking specific link dependencies as "upward". An upward link is when a
lower level library links against a higher level library.
llvm-svn: 219949
Not all situations are representable in the compressed __unwind_info format,
and when this happens the entry needs to point to the more general __eh_frame
description.
Just x86_64 implementation for now.
rdar://problem/18208653
llvm-svn: 219836
We'll also need references back to the CIE eventually, but for now making sure
we can work out what an FDE is referring to is enough.
The actual kind of reference needs to be different between architectures,
probably because of MachO's chronic shortage of relocation types but I don't
really want to know in case I find out something that distresses me even more.
rdar://problem/18208653
llvm-svn: 219824
Arm code has two instruction encodings "thumb" and "arm". When branching from
one code encoding to another, you need to use an instruction that switches
the instruction mode. Usually the transition only happens at call sites, and
the linker can transform a BL instruction in BLX (or vice versa). But if the
compiler did a tail call optimization and a function ends with a branch (not
branch and link), there is no pc-rel BX instruction.
The ShimPass looks for pc-rel B instructions that will need to switch mode.
For those cases it synthesizes a shim which does the transition, then modifies
the original atom with the B instruction to target to the shim atom.
llvm-svn: 219655
mach-o supports "fat" files which are a header/table-of-contents followed by a
concatenation of mach-o files (or archives of mach-o files) built for
different architectures. Previously, the support for fat files was in the
MachOReader, but that only supported fat .o files and dylibs (not archives).
The fix is to put the fat handing into MachOFileNode. That way any input file
kind (including archives) can be fat. MachOFileNode selects the sub-range
of the fat file that matches the arch being linked and creates a MemoryBuffer
for just that subrange.
llvm-svn: 219268
This option is added by Xcode when it runs the linker. It produces a binary
file which contains the file the linker used. Xcode uses the info to
dynamically update it dependency tracking.
To check the content of the binary file, the test case uses a python script
to dump the binary file as text which FileCheck can check.
llvm-svn: 219039
The darwin linker has the -demangle option which directs it to demangle C++
(and soon Swift) mangled symbol names. Long term we need some Diagnostics object
for formatting errors and warnings. But for now we have the Core linker just
writing messages to llvm::errs(). So, to enable demangling, I changed the
Resolver to call a LinkingContext method on the symbol name.
To make this more interesting, the demangling code is done via __cxa_demangle()
which is part of the C++ ABI, which is only supported on some platforms, so I
had to conditionalize the code with the config generated HAVE_CXXABI_H.
llvm-svn: 218718