The initial segment protection was also being used to set the maximum
segment protection level. Instead, the maximum should be set according
to the architecture we are linking. For example on Mac OS it should be
RWX on most pages, but on iOS is often on R_X.
rdar://problem/24515136
llvm-svn: 259966
On Mac OS 10.5 and later, with X86_64 and outputting a dynamic executable,
ld64 set the CPU_SUBTYPE_LIB64 mask on the cpusubtype in the mach_header.
This adds the same functionality to lld.
rdar://problem/24507177
llvm-svn: 259826
ld64 sets both S_ATTR_PURE_INSTRUCTIONS and S_ATTR_SOME_INSTRUCTIONS
on __TEXT, __text. We only had the S_ATTR_PURE_INSTRUCTIONS attribute.
rdar://problem/24495801
llvm-svn: 259744
In the case where we are emitting to an object file, the platform is
possibly unknown, and the source object files contained load commands
for version min, we can take the maximum of those min versions and
emit in in the output object file.
This test also tests r259739.
llvm-svn: 259742
This option is emitted in the min_version load commands.
Note, there's currently a difference in behaviour compared to ld64 in
that we emit a warning if we generate a min_version load command and
didn't give an sdk_version. We need to decide what the correct behaviour
is here as its possible we want to emit an error and force clients to
provide the option.
llvm-svn: 259729
If the command line contains something like -macosx_version_min and we
don't explicitly disable generation with -no_version_load_command then
we generate the LC_VERSION_MIN command in the output file.
There's a couple of FIXME's in here. These will be handled soon with
more tests but I didn't want to grow this patch any more than it already was.
rdar://problem/24472630
llvm-svn: 259718
In r259574 I fixed some of the issues with the mach header symbols
and DSO handles.
This is the next issue whereby the __mh_execute_header has to not
be dead stripped, and (to match ld64) should be dynamically referenced.
The test here should also have been added in r259574 to make sure that
we emit this symbol. But checking that it is not only emitted but also
has the correct reference type is fine.
llvm-svn: 259589
The magic file which contained these symbols inherited from archive
which meant that the resolver didn't add the required atoms as archive
members only get added when referenced. Instead we now inherit from
SimpleFile which always links in the atoms needed.
The second issue was in the handling of these symbols when we emit
the MachO. The mach header symbol needs to be in the atom list as
it gets an offset (0), and being in the atom list makes sure it is
emitted to the symbol table. DSO handles are not emitted to the
symbol table.
rdar://problem/24450654
llvm-svn: 259574
When we parse a MachoFile, we set a number of members from the parsed
file, for example, subsectionsViaSymbols.
However, a number of passes, such as ObjCPass, create local copies of
MachoFile and don't get the benefit of setting flags and other fields in
the parser. Instead we can just give a more sensible default as the parser
will definitely get the correct value from the file anyway.
llvm-svn: 259426
__DATA, __objc_catlist contains a list of pointers to categories.
We want to atomize it so that the ObjC pass can later optimize and remove
categories. That will be a later patch.
llvm-svn: 259386
This option matches the behaviour of ld64, that is it prevents globals
from being dead stripped in executables and dylibs.
Reviewed by Lang Hames
Differential Revision: http://reviews.llvm.org/D16026
llvm-svn: 258554
This pass currently emits an objc image info section if one is required.
This section contains the aggregated version and flags for all of the input
files.
llvm-svn: 258197
Like arch, os, etc, when we know we are going to use a file, we check
that the file has compatible objc constraints to the context, throw
appropriate errors where that is not the case, and hopefully set the
objc constraints on the context for use later.
Added 2 tests to ensure that we don't have incompatibilities between
host and simulator code as both will get x86 based architectures.
llvm-svn: 258173
When generating a relocatable file, its only valid to set this flag if
all of the inputs also had the flag. Otherwise we may atomize incorrectly
when we link the relocatable file again.
Reviewed by Lang Hames.
Differential Revision: http://reviews.llvm.org/D16018
llvm-svn: 257976
The image info struct contains flags for what kind of GC/retain/release is required.
Give an error if we parse GC flags as these are unsupported.
llvm-svn: 257974
This file was failing to build with asan enabled. The reason being that
applyFixupFinal was writing 4-bytes worth of fixup in to an atom only
a single byte in length.
The test case didn't actually need this particular reloc so i've removed
it, although i'll follow up with future commits to actually verify that
relocs are to an address with enough space for the fixup to be applied.
llvm-svn: 257906
This patch makes use of the handleLoadedFile hook added in r257814.
That method is used to check the arch and the OS of the files we are linking
against the arch and OS on the context.
The first test to use this ensures that we do not try to combine i386 Mac OS code
with i386 simulator code.
llvm-svn: 257837
It wasn't actually pointing to the function at the start of the text section, and so the offset in the binary differed when we passed the file through a second time.
The __eh_frame section uses implicit relocations and when reducing this test case from explicit to implicit, I got
the offset wrong. This makes sure it is correct.
llvm-svn: 257101
The __eh_frame section contains relocations which can always be implicitly generated.
This patch tracks whether sections have only implicitly relocations and skips emitting them to the object file if that is the case.
The test case here ensures that this is the case for __eh_frame sections.
Reviewed by Lang Hames.
http://reviews.llvm.org/D15594
llvm-svn: 257099
The fixup content we encode here should be the offset from the
fixup location back to the last nonlocal label. We were only encoding
the address of the fixup, and not taking in to account the base address
of the atom we are in.
Updated the test case here to have a text section which will come before
the data section where the relocation lives. .data being at offset 0 had
previously been hiding this bug.
llvm-svn: 256974
The final section order in relocatable files was just a side effect
of the atom sorter. This meant that sections like __data were before
__text because __data has RW permissions and __text RX and RW was less
than RX in our enum.
Final linked images had an actual section/segment sorter. There was no
reason for the difference, so simplify a bunch of code and just use the
same sorted for everything.
Reviewed by Lang Hames.
http://reviews.llvm.org/D15868
llvm-svn: 256786
The assembly at the top of this file contained more relocations than
the YAML. I regenerated it so that we'd have complete relocation testing.
Also added detailed explanations of the relocations in the file so that
future people don't have to try decode them when something goes wrong.
llvm-svn: 256064
negDelta32 is only ever implicitly generated as the FDE->CIE reference.
We therefore don't emit a relocation for it in the object file in -r mode.
The value we write in to the FDE location therefore needs to point to the
final target address of the CIE, and not the inAtomAddress as it was currently
doing.
llvm-svn: 255835
The delta64 relocation is represented as the pair ARM64_RELOC_SUBTRACTOR and ARM64_RELOC_UNSIGNED.
Those should always have the same offset, so this adds a check and tests to ensure this is the case.
Also updated the error printing in this case to shows both relocs when erroring on pair.
llvm-svn: 255274
The gcc_except_tab was generating these references to point to the typeinfo in the data section.
gcc_except_tab also had the DW_EH_PE_indirect flag set which means that at runtime we are going
to dereference this entry as if it is in the GOT.
Reviewed by Nick Kledzik in http://reviews.llvm.org/D15360.
llvm-svn: 255085
This is a basic initial implementation of the -flat_namespace and
-undefined options for LLD-darwin. It ignores several subtlties,
but the result is close enough that we can now link LLVM (but not
clang) on Darwin and pass all regression tests.
llvm-svn: 248732
loadFile could load mulitple files just because yaml has a feature for
putting multiple documents in one file.
Designing a linker around what yaml can do seems like a bad idea to
me. This patch changes it to read a single file.
There are further improvements to be done to the api and they
will follow shortly.
llvm-svn: 235724
In the resolver, we maintain a list of undefined symbols, and when we
visit an archive file, we check that file if undefined symbols can be
resolved using files in the archive. The archive file class provides
find() function to lookup a symbol.
Previously, we call find() for each undefined symbols. Archive files
may be visited multiple times if they are in a --start-group and
--end-group. If we visit a file M times and if we have N undefined
symbols, find() is called M*N times. I found that that is one of the
most significant bottlenecks in LLD when linking a large executable.
find() is not a very cheap operation because it looks up a hash table
for a given string. And a string, or a symbol name, can be pretty long
if you are dealing with C++ symbols.
We can eliminate the bottleneck.
Calling find() with the same symbol multiple times is a waste. If a
result of looking up a symbol is "not found", it stays "not found"
forever because the symbol simply doesn't exist in the archive.
Thus, we should call find() only for newly-added undefined symbols.
This optimization makes O(M*N) O(N).
In this patch, all undefined symbols are added to a vector. For each
archive/shared library file, we maintain a start position P. All
symbols [0, P) are already searched. [P, end of the vector) are not
searched yet. For each file, we scan the vector only once.
This patch changes the order in which undefined symbols are looked for.
Previously, we iterated over the result of _symbolTable.undefines().
Now we iterate over the new vector. This is a benign change but caused
differences in output if remaining undefines exist. This is why some
tests are updated.
The performance improvement of this patch seems sometimes significant.
Previously, linking chrome.dll on my workstation (Xeon 2.4GHz 8 cores)
took about 70 seconds. Now it takes (only?) 30 seconds!
http://reviews.llvm.org/D8091
llvm-svn: 231434
strings don't mix so easily. This fixes the last remaining failure
I have in 'check-all' on a system with both Python3 and and Python2
installed.
llvm-svn: 224947
Summary:
Fix the binary file reader to properly read dyld version info.
Update the install_name test case to properly test the binary reader. We can't use '-print_atoms' as the output format is 'native' yaml and it does not contains the dyld current and compatibility versions.
Also change the timestamp value of LD_ID_DYLD to match the one generated by ld64.
The dynamic linker (dyld) used to expects different values for timestamp in LD_ID_DYLD and LD_LOAD_DYLD for prebound images. While prebinding is deprecated, we should probably keep it safe and match ld64.
Reviewers: kledzik
Subscribers: llvm-commits
Projects: #lld
Differential Revision: http://reviews.llvm.org/D6736
llvm-svn: 224681
Summary:
Work on adding -rpath support to the mach-o linker.
This patch is based on the ld64 behavior for the command line option validation.
It includes a basic test to check that the LC_RPATH load commands are properly generated when that option is used.
It also add LC_RPATH support to the binary reader, but I don't know how to test it though.
Reviewers: kledzik
Subscribers: llvm-commits
Projects: #lld
Differential Revision: http://reviews.llvm.org/D6724
llvm-svn: 224544
Mach-o does not use a simple SO_NEEDED to track dependent dylibs. Instead,
the linker copies four things from each dylib to each client: the runtime path
(aka "install name"), the build time, current version (dylib build number), and
compatibility version The build time is no longer used (it cause every rebuild
of a dylib to be different). The compatibility version is usually just 1.0
and never changes, or the dylib becomes incompatible.
This patch copies that information into the NormalizedMachO format and
propagates it to clients.
llvm-svn: 222300
When fixing up BL instructions, the linker has to compare the thumbness of the
target to decide if the instruction needs to be converted to BLX. But with B
instruction there is no BX, so the linker asserts if the target is not the
same thumbness. This assert was firing in -r mode when the target was undefined
which it interpreted as being non-thumb.
Test case change is to add a B (in both thumb and arm code) to an undefined
symbol and round trip through -r mode.
llvm-svn: 222266
The arm64 assembler almost always uses r_extern=1 relocations in which the
r_symbolnum field is the index of the symbol the relocation references. But
sometimes it will set r_extern=0 in which case the linker needs to read the
content of the reloction to determine the target.
Add test case that the r_extern=0 relocation round trips.
llvm-svn: 222198
The GOT slots were being laid out in a random order by the GOTPass which
caused randomness in the output file.
Note: With this change lld now bootstraps on darwin. That is:
1) link lld using system linker to make lld.1
2) link lld using lld.1 to make lld.2
3) link lld using lld.2 to make lld.3
Now lld.2 and lld.3 are identical.
llvm-svn: 221831
On darwin in final linked images, the __TEXT segment covers that start of the
file. That means in memory a process can see the mach_header (and load commands)
for every loaded image in a process. There are APIs that take and return the
mach_header addresses as a way to specify a particular loaded image.
For completeness, any code can get the address of the mach_header of the image
it is in by using &__dso_handle. In addition there are mach-o type specific
symbols like __mh_execute_header.
The linker needs to supply a definition for any of these symbols if used. But
the address the symbol it resolves to is not in any section. Instead it is the
address of the start of the __TEXT segment.
I needed to make a small change to SimpleFileNode to not override
resetNextIndex() because the Driver creates a SimpleFileNode to hold the
internal/implicit files that the context/writer can create. For some reason
SimpleFileNode overrode resetNextIndex() to do nothing instead of reseting
the index (which mach-o needs if the internal file is an archive).
llvm-svn: 221822
The way lazy binding works in mach-o is that the linker generates a helper
function and has the stub (PLT) initially jump to it. The helper function
pushes an extra parameter then jumps into dyld. The extra parameter is an
offset into the lazy binding info where dyld will find the information about
which symbol to bind and way lazy binding pointer to update.
llvm-svn: 221654
The darwin linker lets you rearrange functions and data for better locality
(less paging). You do this with the -order_file option which supplies a text
file containing one symbol per line.
Implementing this required a small change to LayoutPass to add a custom sorter
hook.
llvm-svn: 221545
The darwin linker has two ways to force all members of an archive to be loaded.
The -all_load option applies to all static libraries. The -force_load takes
a path to a library and just that library's members are force loaded.
llvm-svn: 221477
code. Same basic change that was done in r218429 for ARM code.
Where the ARM thumb symbolizer in llvm-objdump’s Mach-O disassembler is now
plumbed in with r221470 from the llvm trunk.
llvm-svn: 221473
Darwin uses two-level-namespace lookup for symbols which means the static
linker records where each symbol must be found at runtime. Thus defining a
symbol in a dylib loaded earlier will not effect where symbols needed by
later dylibs will be found. Instead overriding is done through a section
of type S_INTERPOSING which contains tuples of <interposer, interposee>.
llvm-svn: 221424
The job of the CompactUnwind pass is to turn __compact_unwind data (and
__eh_frame) into the compressed final form in __unwind_info. After it's done,
the original atoms are no longer relevant and should be deleted (they cause
problems during actual execution, quite apart from the fact that they're not
needed).
llvm-svn: 221301
lld was regenerating LC_DATA_IN_CODE in .o output files, but not into
final linked images.
Update test case to verify data-in-code info makes it into final linked images.
llvm-svn: 220827
Objective-C switched to a new ABI which uses a different mangling for class
names. But to keep projects building that use export lists that use the old
class name mangling, the linker recognizes the old names and transforms them
to the new mangling.
llvm-svn: 220598
In final linked shared images, the __TEXT segment contains both code and
the mach-o header/load-commands. In the case of a data-only dylib, there is
no code, so we need to force the addition of the __TEXT segment.
llvm-svn: 220597
All compiler generated mach-o object files are marked with MH_SUBSECTIONS_VIA_SYMBOLS.
But hand written assembly files need to opt-in if they are written correctly.
The flag means the linker can break up a sections at symbol addresses and
dead strip or re-order functions.
This change recognizes object files without the flag and marks its atoms as
not dead strippable and adds a layout-after chain of references so that the
atoms cannot be re-ordered.
llvm-svn: 220348
The darwin linker operates differently than the gnu linker with respect to
libraries. The darwin linker first links in all object files from the command
line, then to resolve any remaining undefines, it repeatedly iterates over
libraries on the command line until either all undefines are resolved or no
undefines were resolved in the last pass.
When Shankar made the InputGraph model, the plan for darwin was for the darwin
driver to place all libraries in a group at the end of the InputGraph. Thus
making the darwin model a subset of the gnu model. But it turns out that does
not work because the driver cannot tell if a file is an object or library until
it has been loaded, which happens later.
This solution is to subclass InputGraph for darwin and just iterate the graph
the way darwin linker needs.
llvm-svn: 220330
-all_load tells the darwin linker to immediately load all members of all
archives. The code do that used reinterpret_cast<> instead of dyn_cast<>.
If the file was a dylib, the reinterpret_cast<> turned a pointer to a dylib
into a pointer to an archive...boom.
Added test case to reproduce the crash, simplified the code and used dyn_cast<>.
llvm-svn: 219990
To deal with cycles in shared library dependencies, the darwin linker supports
marking specific link dependencies as "upward". An upward link is when a
lower level library links against a higher level library.
llvm-svn: 219949
Not all situations are representable in the compressed __unwind_info format,
and when this happens the entry needs to point to the more general __eh_frame
description.
Just x86_64 implementation for now.
rdar://problem/18208653
llvm-svn: 219836
We'll also need references back to the CIE eventually, but for now making sure
we can work out what an FDE is referring to is enough.
The actual kind of reference needs to be different between architectures,
probably because of MachO's chronic shortage of relocation types but I don't
really want to know in case I find out something that distresses me even more.
rdar://problem/18208653
llvm-svn: 219824
Arm code has two instruction encodings "thumb" and "arm". When branching from
one code encoding to another, you need to use an instruction that switches
the instruction mode. Usually the transition only happens at call sites, and
the linker can transform a BL instruction in BLX (or vice versa). But if the
compiler did a tail call optimization and a function ends with a branch (not
branch and link), there is no pc-rel BX instruction.
The ShimPass looks for pc-rel B instructions that will need to switch mode.
For those cases it synthesizes a shim which does the transition, then modifies
the original atom with the B instruction to target to the shim atom.
llvm-svn: 219655
mach-o supports "fat" files which are a header/table-of-contents followed by a
concatenation of mach-o files (or archives of mach-o files) built for
different architectures. Previously, the support for fat files was in the
MachOReader, but that only supported fat .o files and dylibs (not archives).
The fix is to put the fat handing into MachOFileNode. That way any input file
kind (including archives) can be fat. MachOFileNode selects the sub-range
of the fat file that matches the arch being linked and creates a MemoryBuffer
for just that subrange.
llvm-svn: 219268
This option is added by Xcode when it runs the linker. It produces a binary
file which contains the file the linker used. Xcode uses the info to
dynamically update it dependency tracking.
To check the content of the binary file, the test case uses a python script
to dump the binary file as text which FileCheck can check.
llvm-svn: 219039
The darwin linker has the -demangle option which directs it to demangle C++
(and soon Swift) mangled symbol names. Long term we need some Diagnostics object
for formatting errors and warnings. But for now we have the Core linker just
writing messages to llvm::errs(). So, to enable demangling, I changed the
Resolver to call a LinkingContext method on the symbol name.
To make this more interesting, the demangling code is done via __cxa_demangle()
which is part of the C++ ABI, which is only supported on some platforms, so I
had to conditionalize the code with the config generated HAVE_CXXABI_H.
llvm-svn: 218718
This is a minimally useful pass to construct the __unwind_info section in a
final object from the various __compact_unwind inputs. Currently it doesn't
produce any compressed pages, only works for x86_64 and will fail if any
function ends up without __compact_unwind.
rdar://problem/18208653
llvm-svn: 218703
On darwin, the linker tools records which dylib (DSO) each undefined was found
in, and then at runtime, the loader (dyld) only looks in that one specific
dylib for each undefined symbol. Now that llvm-objdump can display that info
I can write test cases.
llvm-svn: 217898
The provided base must also be a multiple of the system's page size, which is a
reasonable enough demand.
Also check the other diagnostics more thoroughly.
llvm-svn: 217577
The existing system linkers on Darwin and Linux are called "ld". We'd like to
eventually drop in lld as "ld" and have it just work. But lld is a universal
linker that requires the first option to be -flavor to know which command line
mode to emulate (gnu or darwin).
This change tests if argv[0] is "ld" and if so, if the tool was built on MacOSX
then assume the darwin flavor otherwise the gnu flavor. There are two test
cases which copy lld to "ld" and then run it. One for darwin and one for linux.
llvm-svn: 217566