mach-o supports "fat" files which are a header/table-of-contents followed by a
concatenation of mach-o files (or archives of mach-o files) built for
different architectures. Previously, the support for fat files was in the
MachOReader, but that only supported fat .o files and dylibs (not archives).
The fix is to put the fat handing into MachOFileNode. That way any input file
kind (including archives) can be fat. MachOFileNode selects the sub-range
of the fat file that matches the arch being linked and creates a MemoryBuffer
for just that subrange.
llvm-svn: 219268
This option is added by Xcode when it runs the linker. It produces a binary
file which contains the file the linker used. Xcode uses the info to
dynamically update it dependency tracking.
To check the content of the binary file, the test case uses a python script
to dump the binary file as text which FileCheck can check.
llvm-svn: 219039
No functionality change. This removes a down-cast from LinkingContext to
MachOLinkingContext.
Also, remove const from LinkingContext::createImplicitFiles() to remove
the need for another const cast. Seems reasonable for createImplicitFiles()
to need to modify the context (MachOLinkingContext does).
llvm-svn: 218796
The darwin linker has the -demangle option which directs it to demangle C++
(and soon Swift) mangled symbol names. Long term we need some Diagnostics object
for formatting errors and warnings. But for now we have the Core linker just
writing messages to llvm::errs(). So, to enable demangling, I changed the
Resolver to call a LinkingContext method on the symbol name.
To make this more interesting, the demangling code is done via __cxa_demangle()
which is part of the C++ ABI, which is only supported on some platforms, so I
had to conditionalize the code with the config generated HAVE_CXXABI_H.
llvm-svn: 218718
This is a minimally useful pass to construct the __unwind_info section in a
final object from the various __compact_unwind inputs. Currently it doesn't
produce any compressed pages, only works for x86_64 and will fail if any
function ends up without __compact_unwind.
rdar://problem/18208653
llvm-svn: 218703
llvm\tools\lld\lib\readerwriter\macho\macholinkingcontext.cpp(647):
warning C4715: 'lld::MachOLinkingContext::exportSymbolNamed' :
not all control paths return a value
llvm\tools\lld\lib\readerwriter\macho\machonormalizedfilefromatoms.cpp(723):
warning C4715: '`anonymous namespace'::Util::getSymbolTableRegion' :
not all control paths return a value
While all enum values do appear in the switch, an uninitialized or corrupted
enum variable would not be caught without the default: case in the switch.
llvm-svn: 218197
On darwin, the linker tools records which dylib (DSO) each undefined was found
in, and then at runtime, the loader (dyld) only looks in that one specific
dylib for each undefined symbol. Now that llvm-objdump can display that info
I can write test cases.
llvm-svn: 217898
Most of the changes are in the new file ArchHandler_arm64.cpp. But a few
things had to be fixed to support 16KB pages (instead of 4KB) which iOS arm64
requires. In addition the StubInfo struct had to be expanded because
arm64 uses two instruction (ADRP/LDR) to load a global which requires two
relocations. The other mach-o arches just needed one relocation.
llvm-svn: 217469
There is a bit (MH_PIE) in the flags field of the mach_header which tells
the kernel is a program was built position independent (for ASLR). The linker
automatically attempts to build programs PIE if they are built for a recent
OS version. But the -pie and -no_pie options override that default behavior.
llvm-svn: 217408
Mach-O has a "fat" (or "universal") variant where the same contents built for
different architectures are concatenated into one file with a table-of-contents
header at the start. But this leaves a dilemma for the linker - which
architecture to use.
Normally, the linker command line -arch is used to force which slice of any fat
files are used. The clang compiler always passes -arch to the linker when
invoking it. But some Makefiles invoke the linker directly and don’t specify
the -arch option. For those cases, the linker scans all input files in command
line order and finds the first non-fat object file. Whatever architecture it
is becomes the architecture for the link.
llvm-svn: 217189
The use of default: was disabling the warning about unused enumerators. Fix
that, then fix the one enumerator that was not handled. Add coverage for
it in test suite.
llvm-svn: 217078
On Darwin at runtime, dyld will prefer to use the export trie of a dylib instead
of the traditional symbol table (which is large and requires a binary search).
This change enables the linker to generate an export trie and to prefer it if
found in a dylib being linked against. This also simples the yaml for dylibs
because the yaml form of the trie can be reduced to just a sequence of names.
llvm-svn: 217066
Mach-O symbols can have an attribute on them means their content should never be
dead code stripped. This translates to deadStrip() == deadStripNever.
llvm-svn: 216234
Both options control the final scope of atoms.
When -exported_symbols_list <file> is used, the file is parsed into one
symbol per line in the file. Only those symbols will be exported (global)
in the final linked image.
The -keep_private_externs option is only used with -r mode. Normally, -r
mode reduces private extern (scopeLinkageUnit) symbols to non-external. But
add the -keep_private_externs option keeps them private external.
llvm-svn: 216146
The darwin linker has an option, heavily used by Xcode, in which, instead
of listing all input files on the command line, the input file paths are
written to a text file and the path of that text file is passed to the linker
with the -filelist option (similar to @file).
In order to make test cases for this, I generalized the -test_libresolution
option to become -test_file_usage.
llvm-svn: 215762
Darwin has a packaging mechanism for shared libraries and headers called
frameworks. A directory Foo.framework contains a shared library binary file
"Foo" and a subdirectory "Headers". Most OS frameworks are all in one
directory /System/Library/Frameworks/. As a linking convenience, the linker
option "-framework Foo" means search the framework directories specified
with -F (analogous to -L) looking for a shared library Foo.framework/Foo.
llvm-svn: 215680
In general two-level namespace means each program records exactly which dylib
each undefined (imported) symbol comes from. But, sometimes the implementor
wants to hide the implementation dylib. For instance libSytem.dylib is the base
dylib all Darwin programs must link with. A few years ago it was split up
into two dozen dylibs by all are hidden behind libSystem.dylib which re-exports
each sub-dylib. All clients still think libSystem.dylib is the implementor.
To support this, the linker must load "indirect" dylibs and not just the
"direct" dylibs specified on the command line. This is done in the
createImplicitFiles() method after all command line specified files are
loaded. Since an indirect dylib may have already been loaded as a direct dylib
(or indirectly via a previous direct dylib), the MachOLinkingContext keeps
a list of all loaded dylibs.
With this change hello world can now be linked against the real OS or SDK.
llvm-svn: 215605
Split up the CRuntimeFile into one part for output types that need an entry
point and another part for output types that use stubs.
Add file 'test/mach-o/Inputs/libSystem.yaml' for use by test cases that
use -dylib and therefore may now need the helper symbol in libSystem.dylib.
llvm-svn: 215602
Mach-o uses "two-level namespace" where each undefined symbols is associated
with a specific dylib. This means at runtime the loader (dyld) does need to
search all loaded dylibs for that symbol but rather just the one specified.
Now that llvm-nm -m prints out that info, properly set that info, and test
it in the hello world test cases.
llvm-svn: 215598
The tests assume the path separator is '/', but if you run
them on Windows it is '\'. As a result the tests are failing
on Windows. This should be the minimal change to make these
tests to pass on Windows platform.
Differential Revision: http://reviews.llvm.org/D4710
llvm-svn: 214990
In some cases the address of a function will be materialized with a movw/movt
pair. If the function is a thumb function, the low bit needs to be set on
the movw immediate value.
llvm-svn: 214277
The -sectalign option is used to increase the alignment required for a section.
It required some reworking of how the __TEXT segment is laid out because that
segment also contains the mach_header and load commands. And the size of load
commands depend on the number of segments, sections, and dependent dylibs used.
Using this option will simplify some future test cases because the final
address of code can be pinned down, making tests of its content easier.
llvm-svn: 214268
All iOS arm processor support switching between arm and thumb mode at call sites
by using the BLX instruction (instead of BL). But the compiler does not know
the implementation mode for extern functions, so the linker must update BL/BLX
instructions to match what is linked is actually linked together. In addition,
pointers to functions (such as vtables) must have the low bit set if the target
of the pointer is a thumb mode function.
llvm-svn: 214140
Sometimes compilers emit data into code sections (e.g. constant pools or
jump tables). These runs of data can throw off disassemblers. The solution
in mach-o is that ranges of data-in-code are encoded into a table pointed to
by the LC_DATA_IN_CODE load command.
The way the data-in-code information is encoded into lld's Atom model is that
that start and end of each data run is marked with a Reference whose offset
is the start/end of the data run. For arm, the switch back to code also marks
whether it is thumb or arm code.
llvm-svn: 213901
This patch just supports marking ranges that are thumb code (vs arm code).
Future patches will mark data and jump table ranges. The ranges are encoded
as References with offsetInAtom being the start of the range and the target
being the same atom.
llvm-svn: 213712
Over time the symbols and relocations have changed for dwarf unwind info
in the __eh_frame section. Add test cases for older and new style.
llvm-svn: 213585
Add support for adding section relocations in -r mode. Enhance the test
cases which validate the parsing of .o files to also round trip. They now
write out the .o file and then parse that, verifying all relocations survived
the round trip.
llvm-svn: 213333
All architecture specific handling is now done in the appropriate
ArchHandler subclass.
The StubsPass and GOTPass have been simplified. All architecture specific
variations in stubs are now encoded in a table which is vended by the
current ArchHandler.
llvm-svn: 213187
These behave slightly idiosyncratically in the best of cases, and have
additional hacks layered on top of that for compatibility with badly behaved
build systems (via ld64).
For -lXYZ:
+ If XYZ is actually XY.o then search all library paths for XY.o
+ Otherwise search all library paths, first for libXYZ.dylib, then libXYZ.a
+ By default the library paths are /usr/lib and /usr/local/lib in that order.
For -syslibroot:
+ -syslibroot options apply to absolute paths in the search order.
+ All -syslibroot prefixes that exist are added to the search path *instead*
of the original.
+ If no -syslibroot prefixed path exists, the original is kept.
+ Hacks^WExceptions:
+ If only 1 -syslibroot is given and doesn't contain /usr/lib or
/usr/local/lib, that path is dropped entirely. (rdar://problem/6438270).
+ If the last -syslibroot is "/", all of them are ignored entirely.
(rdar://problem/5829579).
At least, that's my best interpretation of what ld64 does in buildSearchPaths.
llvm-svn: 212706
This converts the very complicated mach-o arm
relocations into the simple Reference Kinds in lld.
The next patch will use the internal Reference kinds
to fix up arm/thumb code.
llvm-svn: 212306
When trying to map atom types to sections, we were iterating through an array
until we hit a sentinel value. There's no need for such dances when range-based
for loops are available.
llvm-svn: 212035
This isn't really the right place to put them in final object files (that would
be __TEXT,__unwind_info), but the format is different between relocatable and
final objects, which means we really need a pass to handle the translation.
For now, re-emitting in __LD,__compact_unwind is harmless (dyld ignores it and
moves straight on to inspecting __TEXT,__eh_frame), and sidesteps an assertion
failure when processing files containing compact-unwind info.
llvm-svn: 212032
Segments must occupy a multiple of the page size in memory (4096 currently). We
check for this when emitting files, but the placement algorithm broke down for
the second non-__TEXT segment encountered: the offset wasn't aligned up to 4096
before starting its layout.
llvm-svn: 212031
Because of how we were calculating fileOffset and fileSize for segments, most
ended up at a single offset in a finalised MachO file. This meant the data
often didn't even get written in the final object, let alone where it would be
useful.
llvm-svn: 212030
This is first step in reworking how mach-o relocations are processed.
The existing KindHandler is going to become a delgate/helper object for
processing architecture specific references. The KindHandler knows how
to convert mach-o relocations into References and back, as well, as fixing
up the content the relocation is on.
One of the messy things about mach-o relocations is that they sometime
come in pairs, but the pairs still convert to one lld::Reference. So, the
conversion has to detect pairs (arch specific) and change the stride.
llvm-svn: 211921
The previous function returned true for "s < s", which could completely mess up
the sorting of symbols within a section.
Unfortunately, I don't think there's a robust way to write a test for this.
Anything I come up with will be making assumptions about the particular
implementation of std::sort.
llvm-svn: 211704
When looking through sections with zero-terminated string-literals (__cstring
or __ustring) we were constantly rechecking the first few bytes of the string
for '\0' rather than advancing along. This obviously failed unless all strings
within the section had the same length as that first one.
llvm-svn: 211682
We were trying to examine the first symbol in a section that we wanted to
atomize by symbols, even when there wasn't one. Instead, we should make the
initial anonymous symbol cover the entire section in that situation.
llvm-svn: 211681
The previous commit uncovered a bug in the mach-o writer whereby two __text
sections were created. But the test case did not catch that. So I updated
the test case to run the linker a second time, reading the output of the
first pass.
llvm-svn: 210624
This provides support for the autoconfing & make build style.
The format, style and implementation follows that used within the llvm and clang projects.
TODO: implement out-of-source documentation builds.
llvm-svn: 210177
In sections that are broken into atoms at symbols, if the first symbol in the
section is not at the start of the section, then make an anonymous atom for
the section content that is before the first symbol.
llvm-svn: 210142
Previously each section kind had its own code to loop over the section and
parse it into atoms. This refactoring has two tables. The first maps sections
to ContentType. The second maps ContentType to information on how to find
the atom boundaries.
A few bugs in test cases were discovered as part of the refactoring.
No change in functionality intended.
llvm-svn: 210138
This is a short-term fix to allow lld Readers to return error messages
with dynamic content.
The long term fix will be to enhance ErrorOr<> to work with errors other
than error_code. Or to change the interface to Readers to pass down a
diagnostics object through which all error messages are written.
llvm-svn: 209681
This results in some simplifications to the code where an OwningPtr had to
be used with the previous api and then ownership moved to a unique_ptr for
the rest of lld.
llvm-svn: 203809
This restores the debug output to how it was before r197727 broke it. This
went undetected because the corresponding test was never run due to broken
feature detection.
llvm-svn: 202079
The main goal of this patch is to allow "mach-o encoded as yaml" and "native
encoded as yaml" documents to be intermixed. They are distinguished via
yaml tags at the start of the document. This will enable all mach-o test cases
to be written using yaml instead of checking in object files.
The Registry was extend to allow yaml tag handlers to be registered. The
mach-o Reader adds a yaml tag handler for the tag "!mach-o".
Additionally, this patch fixes some buffer ownership issues. When parsing
mach-o binaries, the mach-o atoms can have pointers back into the memory
mapped .o file. But with yaml encoded mach-o, name and content are ephemeral,
so a copyRefs parameter was added to cause the mach-o atoms to make their
own copy.
llvm-svn: 198986
It will configure resonable defaults for other settings in the
MachOLinkingContext object based on the parameters.
Patch by Joe Ranieri
llvm-svn: 197851
The main changes are in:
include/lld/Core/Reference.h
include/lld/ReaderWriter/Reader.h
Everything else is details to support the main change.
1) Registration based Readers
Previously, lld had a tangled interdependency with all the Readers. It would
have been impossible to make a streamlined linker (say for a JIT) which
just supported one file format and one architecture (no yaml, no archives, etc).
The old model also required a LinkingContext to read an object file, which
would have made .o inspection tools awkward.
The new model is that there is a global Registry object. You programmatically
register the Readers you want with the registry object. Whenever you need to
read/parse a file, you ask the registry to do it, and the registry tries each
registered reader.
For ease of use with the existing lld code base, there is one Registry
object inside the LinkingContext object.
2) Changing kind value to be a tuple
Beside Readers, the registry also keeps track of the mapping for Reference
Kind values to and from strings. Along with that, this patch also fixes
an ambiguity with the previous Reference::Kind values. The problem was that
we wanted to reuse existing relocation type values as Reference::Kind values.
But then how can the YAML write know how to convert a value to a string? The
fix is to change the 32-bit Reference::Kind into a tuple with an 8-bit namespace
(e.g. ELF, COFFF, etc), an 8-bit architecture (e.g. x86_64, PowerPC, etc), and
a 16-bit value. This tuple system allows conversion to and from strings with
no ambiguities.
llvm-svn: 197727
This patch adds support for converting normalized mach-o to and from binary
mach-o. It also changes WriterMachO (which previously directly wrote a
mach-o binary given a set of Atoms) to instead do it in two steps. The first
step uses normalizedFromAtoms() to convert Atoms to normalized mach-o, and the
second step uses writeBinary() which to generate the mach-o binary file.
llvm-svn: 194167
n_desc field in MachO string table was not initialized. On Unix,
test/darwin/hello-world.objtxt did not fail because I think an nlist object
is always allocated to a fresh heap initialized with zeros. On Windows,
uninitialized fields are filled with 0xCC when compiled with /GZ. Because
of that the test was failing on Windows.
llvm-svn: 193909
Enable this for the following flavors
a) core
b) gnu
c) darwin
Its disabled for the flavor PECOFF. Convenient markers are added with FIXME
comments in the Driver that would be removed and code removed from each flavor.
llvm-svn: 193585
On discussing this with Nick, it looks like the StubAtoms
that contain a lazyImmediate reference kind should be null
and the location needs to be fixed up later with some value
that is an offset into the __LINKEDIT segment.
The drawback is that it allows yaml files with references
that expect a target to be considered without one.
This results in bad yaml files that would need to be handled
in the YAML Reader.
Inorder to fix this, the Stub Atoms use a dummy target such
as itself.
llvm-svn: 193476
This is the first step in how I plan to get mach-o object files support into
lld. We need to be able to test the mach-o Reader and Write on systems without
a mach-o tools. Therefore, we want to support a textual way (YAML) to represent
mach-o files.
MachONormalizedFile.h defines an in-memory abstraction of the content of mach-o
files. The in-memory data structures are always native endianess and always
use 64-bit sizes. That internal data structure can then be converted to or
from three different formats: 1) yaml (text) encoded mach-o, 2) binary mach-o
files, 3) lld Atoms.
This patch defines the internal model and uses YAML I/O to implement the
conversion to and from the model to yaml. The next patch will implement
the conversion from normalized to binary mach-o.
This patch includes unit tests to validate the yaml conversion APIs.
llvm-svn: 192147
Changes :-
a) Functionality in InputGraph to insert Input elements at any position
b) Functionality in the Resolver to use nextFile
c) Move the functionality of assigning file ordinals to InputGraph
d) Changes all inputs to MemoryBuffers
e) Remove LinkerInput, InputFiles, ReaderArchive
llvm-svn: 192081
This patch inverts the return value of these functions, so that they return
"true" on success and "false" on failure. The meaning of boolean return value
was mixed in LLD; for example, InputGraph::validate() returns true on success.
With this patch they'll become consistent.
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1748
llvm-svn: 191341
Also change some local variable names: "ti" -> "context" and
"_targetInfo" -> "_context".
Differential Revision: http://llvm-reviews.chandlerc.com/D1301
llvm-svn: 187823
The yaml reader is not specific to any file format. This patch moves
it to TargetInfo and makes validate a non virtual interface so that it
can be constructed from a single location.
The same method will be used to create a reader for llvm bitcode
files.
llvm-svn: 183740
The major changes are:
1) LinkerOptions has been merged into TargetInfo
2) LinkerInvocation has been merged into Driver
3) Drivers no longer convert arguments into an intermediate (core) argument
list, but instead create a TargetInfo object and call setter methods on
it. This is only how in-process linking would work. That is, you can
programmatically set up a TargetInfo object which controls the linking.
4) Lots of tweaks to test suite to work with driver changes
5) Add the DarwinDriver
6) I heavily doxygen commented TargetInfo.h
Things to do after this patch is committed:
a) Consider renaming TargetInfo, given its new roll.
b) Consider pulling the list of input files out of TargetInfo. This will
enable in-process clients to create one TargetInfo the re-use it with
different input file lists.
c) Work out a way for Drivers to format the warnings and error done in
core linking.
llvm-svn: 178776
I really would have liked to split this patch up, but it would greatly
complicate the lld-core and lld drivers having to deal with both
{Reader,Writer}Option and TargetInfo.
llvm-svn: 173217
table header. Skeleton code for ReferenceKinds.
Credits:
Doxygen by Michael Spencer.
Origianl implementation from Macho by Sidney Manning.
Templatization, implementation of section header chunks, string table, ELF header by Hemant Kulkarni.
llvm-svn: 163906
Although the code is not valid to begin with. It is trying to do a raw memory
copy of a non standard-layout type. nameoffset is not guaranteed to directly
follow cmdsize.
This should be properly fixed.
llvm-svn: 158612
ivars in WriterOptionsMachO instead have its methods compute ivar interactions.
Refactor mach-o Reference Kinds and introduce abstract class KindHandler.
Split up StubAtoms.hpp by architecture. Add support for 32-bit x86 stubs.
llvm-svn: 158336
now Reader and Writer subclasses for each file format. Each Reader and
Writer subclass defines an "options" class which controls how that Reader
or Writer operates.
llvm-svn: 157774