Commit Graph

50 Commits

Author SHA1 Message Date
Rui Ueyama 7f8ca6e367 lld: Fix initial Mach-O load commands size calculation omitting LC_FUNCTION_STARTS
Patch by Nicholas Allegra.

The Mach-O writer calculates the size of load commands multiple times.

First, Util::assignAddressesToSections() (in MachONormalizedFileFromAtoms.cpp)
calculates the size using headerAndLoadCommandsSize() (in
MachONormalizedFileBinaryWriter.cpp), which creates a temporary
MachOFileLayout for the NormalizedFile, only to retrieve its
headerAndLoadCommandsSize.  Later, writeBinary() (in
MachONormalizedFileBinaryWriter.cpp) creates a new layout and uses the offsets
from that layout to actually write out everything in the NormalizedFile.

But the NormalizedFile changes between the first computation and the second.
When Util::assignAddressesToSections is called, file.functionStarts is always
empty because Util::addFunctionStarts has not yet been called. Yet
MachOFileLayout decides whether to include a LC_FUNCTION_STARTS command based
on whether file.functionStarts is nonempty. Therefore, the initial computation
always omits it.

Because padding for the __TEXT segment (to make its size a multiple of the
page size) is added between the load commands and the first section, LLD still
generates a valid binary as long as the amount of padding happens to be large
enough to fit LC_FUNCTION_STARTS command, which it usually is.

However, it's easy to reproduce the issue by adding a section of a precise
size. Given foo.c:

  __attribute__((section("__TEXT,__foo")))
  char foo[0xd78] = {0};

Run:

  clang -dynamiclib -o foo.dylib foo.c -fuse-ld=lld -install_name
  /usr/lib/foo.dylib
  otool -lvv foo.dylib

This should produce:

  truncated or malformed object (offset field of section 1 in LC_SEGMENT_64
  command 0 not past the headers of the file)

This commit:

 - Changes MachOFileLayout to always assume LC_FUNCTION_STARTS is present for
   the initial computation, as long as generating LC_FUNCTION_STARTS is
   enabled. It would be slightly better to check whether there are actually
   any functions, since no LC_FUNCTION_STARTS will be generated if not, but it
   doesn't cause a problem if the initial computation is too high.

 - Adds a test.

 - Adds an assert in MachOFileLayout::writeSectionContent() that we are not
   writing section content into the load commands region (which would happen
   if the offset was calculated too low due to the initial load commands size
   calculation being too low).  Adds an assert in
   MachOFileLayout::writeLoadCommands to validate a similar situation where
   two size-of-load-commands computations are expected to be equivalent.

llvm-svn: 358545
2019-04-17 01:47:16 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
David Blaikie fd872e9637 MachONormalizedFile.h: Remove unimplemented function
dump had no definition, so op<< was never usable anyway - remove the
definition of the latter and the declaration of the former.

llvm-svn: 318880
2017-11-22 21:10:19 +00:00
Rui Ueyama 3f851704c1 Move new lld's code to Common subdirectory.
New lld's files are spread under lib subdirectory, and it isn't easy
to find which files are actually maintained. This patch moves maintained
files to Common subdirectory.

Differential Revision: https://reviews.llvm.org/D37645

llvm-svn: 314719
2017-10-02 21:00:41 +00:00
Zachary Turner 264b5d9e88 Move Object format code to lib/BinaryFormat.
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.

Differential Revision: https://reviews.llvm.org/D33843

llvm-svn: 304864
2017-06-07 03:48:56 +00:00
Lang Hames 242fde1b36 [lld][MachO] Remove some debugging output code that was mistakenly left in in
r276935.

llvm-svn: 276944
2016-07-28 00:28:48 +00:00
Lang Hames 436f7d6606 [lld][MachO] Re-apply r276921 with fix - initialize strings for debug string
copies.

llvm-svn: 276935
2016-07-27 22:55:30 +00:00
Lang Hames f2260567ca [lld][MachO] Temporarily revert r276921 - it's causing bot-failures on Linux.
llvm-svn: 276928
2016-07-27 22:46:02 +00:00
Lang Hames 560333749f [lld][MachO] Add debug info support for MachO.
This patch causes LLD to build stabs debugging symbols for files containing
DWARF debug info, and to propagate existing stabs symbols for object files
built using '-r' mode. This enables debugging of binaries generated by LLD
from MachO objects.

llvm-svn: 276921
2016-07-27 21:31:25 +00:00
Pete Cooper 2f6216c181 Use Expected<T> instead of ErrorOr<T>in yaml reader. NFC
llvm-svn: 264981
2016-03-31 01:13:04 +00:00
Pete Cooper c6e7b8146a Convert readBinary to llvm::Error. NFC
llvm-svn: 264973
2016-03-30 23:58:24 +00:00
Pete Cooper ec4e166a5a Convert normalized file to atoms methods to new error handling. NFC.
This converts almost all of the error handling in atom creation
to llvm::Error instead of std::error_code.

llvm-svn: 264968
2016-03-30 23:43:27 +00:00
Pete Cooper fefbd22814 Convert lld file writing to llvm::Error. NFC.
This converts the writeFile method, as well as some of the ones it calls
in the normalized binary file writer and yaml writer.

llvm-svn: 264961
2016-03-30 23:10:39 +00:00
Pete Cooper 3f564a52d0 Parsed alignment should be a power of 2.
The .o path always makes sure to store a power of 2 value in the
Section alignment.  However, the YAML code didn't verify this.

Added verification and updated all the tests which had a 3 but meant
to have 2^3.

llvm-svn: 264228
2016-03-24 00:36:37 +00:00
Pete Cooper 9b28a4559e Add cmdline options for LC_DATA_IN_CODE load command.
Also added the defaults for whether to generate this load command, which
the cmdline options are able to override.

There was also a difference to ld64 which is fixed here in that ld64 will
generate an empty data in code command if requested.

rdar://problem/24472630

llvm-svn: 260191
2016-02-09 02:10:39 +00:00
Pete Cooper 41f3e8e408 Generate LC_FUNCTION_STARTS load command.
This load command generates data in the LINKEDIT section which
is a list of ULEB128 delta's to all of the functions in the __text section.

It is then 0 terminated and pointer aligned to pad.

ld64 exposes the -function-starts and no-function-starts cmdline options
to override behaviour from the defaults based on file types.

rdar://problem/24472630

llvm-svn: 260188
2016-02-09 01:38:13 +00:00
Pete Cooper d75b7181df Move includes inside guards. NFC.
These includes were before the #ifndef/#define guards.  Moving them to
where they should be.

llvm-svn: 260153
2016-02-08 21:50:45 +00:00
Pete Cooper b8fec3ea62 Set max segment protection level.
The initial segment protection was also being used to set the maximum
segment protection level.  Instead, the maximum should be set according
to the architecture we are linking.  For example on Mac OS it should be
RWX on most pages, but on iOS is often on R_X.

rdar://problem/24515136

llvm-svn: 259966
2016-02-06 00:51:16 +00:00
Pete Cooper ceee5de088 Generate version min load commands when the platform is unknown.
In the case where we are emitting to an object file, the platform is
possibly unknown, and the source object files contained load commands
for version min, we can take the maximum of those min versions and
emit in in the output object file.

This test also tests r259739.

llvm-svn: 259742
2016-02-04 02:16:08 +00:00
Pete Cooper 354809e139 Add generation of LC_VERSION_MIN load commands.
If the command line contains something like -macosx_version_min and we
don't explicitly disable generation with -no_version_load_command then
we generate the LC_VERSION_MIN command in the output file.

There's a couple of FIXME's in here.  These will be handled soon with
more tests but I didn't want to grow this patch any more than it already was.

rdar://problem/24472630

llvm-svn: 259718
2016-02-03 22:28:29 +00:00
Pete Cooper e5fa5a3c29 Add more debugging output to MachO lld. NFC.
In debug builds there's now a dump method on Section and improved
printing of atoms.

llvm-svn: 255826
2015-12-16 22:03:21 +00:00
Lang Hames ac2adce66b [lld][MachO] Recognize __thread_bss sections as zero-fill and set all the
appropriate bits.

This fixes the remaining clang regression test failures when linking clang with
lld on Darwin.

llvm-svn: 255390
2015-12-11 23:25:09 +00:00
Benjamin Kramer cfacc9d3e6 [MachO] Initialize all fields of NormalizedFile.
The ObjectFileYAML.roundTrip serializes a default-constructed
NormalizedFile to YAML, triggering uninitialized memory reads.

While there use in-class member initializers.

llvm-svn: 240446
2015-06-23 19:55:04 +00:00
Lang Hames 65a64c9c29 [LLD] Add support for the -stack_size option to Darwin ld.
llvm-svn: 237841
2015-05-20 22:10:50 +00:00
Rafael Espindola ed48e53d60 Use MemoryBufferRef instead of MemoryBuffer&. NFC.
This just reduces the noise from another patch.

llvm-svn: 235933
2015-04-27 22:48:51 +00:00
Rui Ueyama 629f964d50 Use arithmetic type to represent alignments (not in log2) everywhere.
This is the final step of conversion. Now log2 numbers are removed
from everywhere!

llvm-svn: 233246
2015-03-26 02:20:25 +00:00
Rui Ueyama f006f4d62c Define an implicit constructor which takes actual alignment value to PowerOf2.
The new constructor's type is the same, but this one takes not a log2
value but an alignment value itself, so the meaning is totally differnet.

llvm-svn: 233244
2015-03-26 01:44:01 +00:00
Rui Ueyama 48865ca64d Make PowerOf2's constructor private.
Ban conversion from integers to PowerOf2 even if explicit
to make all places we create PowerOf2 instances visible.

llvm-svn: 233243
2015-03-26 01:29:06 +00:00
Rui Ueyama c3d18f5120 Remove implicit constructor and operator int from PowerOf2.
This patch is to make instantiation and conversion to an integer explicit,
so that we can mechanically replace all occurrences of the class with
integer in the next step.

Now get() returns an alignment value rather than its log2 value.

llvm-svn: 233242
2015-03-26 01:12:32 +00:00
Rui Ueyama 1d510428e8 Separate file parsing from File's constructors.
This is a second patch for InputGraph cleanup.

Sorry about the size of the patch, but what I did in this
patch is basically moving code from constructor to a new
method, parse(), so the amount of new code is small.
This has no change in functionality.

We've discussed the issue that we have too many classes
to represent a concept of "file". We have File subclasses
that represent files read from disk. In addition to that,
we have bunch of InputElement subclasses (that are part
of InputGraph) that represent command line arguments for
input file names. InputElement is a wrapper for File.

InputElement has parseFile method. The method instantiates
a File. The File's constructor reads a file from disk and
parses that.

Because parseFile method is called from multiple worker
threads, file parsing is processed in parallel. In other
words, one reason why we needed the wrapper classes is
because a File would start reading a file as soon as it
is instantiated.

So, the reason why we have too many classes here is at
least partly because of the design flaw of File class.
Just like threads in a good threading library, we need
to separate instantiation from "start" method, so that
we can instantiate File objects when we need them (which
should be very fast because it involves only one mmap()
and no real file IO) and use them directly instead of
the wrapper classes. Later, we call parse() on each
file in parallel to let them do actual file IO.

In this design, we can eliminate a reason to have the
wrapper classes.

In order to minimize the size of the patch, I didn't go so
far as to replace the wrapper classes with File classes.
The wrapper classes are still there.

In this patch, we call parse() immediately after
instantiating a File, so this really has no change in
functionality. Eventually the call of parse() should be
moved to Driver::link(). That'll be done in another patch.

llvm-svn: 224102
2014-12-12 07:31:09 +00:00
Nick Kledzik 5b9e48b4ce [mach-o] propagate dylib version numbers
Mach-o does not use a simple SO_NEEDED to track dependent dylibs.  Instead,
the linker copies four things from each dylib to each client: the runtime path
(aka "install name"), the build time, current version (dylib build number), and
compatibility version  The build time is no longer used (it cause every rebuild
of a dylib to be different).  The compatibility version is usually just 1.0
and never changes, or the dylib becomes incompatible.

This patch copies that information into the NormalizedMachO format and
propagates it to clients.

llvm-svn: 222300
2014-11-19 02:21:53 +00:00
Shankar Easwaran 2b67fca033 Sort include files according to convention.
llvm-svn: 220131
2014-10-18 05:33:55 +00:00
Nick Kledzik 14b5d208cb [mach-o] Support fat archives
mach-o supports "fat" files which are a header/table-of-contents followed by a
concatenation of mach-o files (or archives of mach-o files) built for
different architectures.  Previously, the support for fat files was in the
MachOReader, but that only supported fat .o files and dylibs (not archives).

The fix is to put the fat handing into MachOFileNode.  That way any input file
kind (including archives) can be fat.  MachOFileNode selects the sub-range
of the fat file that matches the arch being linked and creates a MemoryBuffer
for just that subrange.

llvm-svn: 219268
2014-10-08 01:48:10 +00:00
Nick Kledzik 1bebb2832e [mach-o] Add support for arm64 (AAarch64)
Most of the changes are in the new file ArchHandler_arm64.cpp.  But a few
things had to be fixed to support 16KB pages (instead of 4KB) which iOS arm64
requires.  In addition the StubInfo struct had to be expanded because
arm64 uses two instruction (ADRP/LDR) to load a global which requires two
relocations.  The other mach-o arches just needed one relocation.

llvm-svn: 217469
2014-09-09 23:52:59 +00:00
Nick Kledzik 635f9c7158 [mach-o] Let darwin driver infer arch from .o files if -arch not used.
Mach-O has a "fat" (or "universal") variant where the same contents built for
different architectures are concatenated into one file with a table-of-contents
header at the start.  But this leaves a dilemma for the linker - which
architecture to use.

Normally, the linker command line -arch is used to force which slice of any fat
files are used.  The clang compiler always passes -arch to the linker when
invoking it.  But some Makefiles invoke the linker directly and don’t specify
the -arch option.  For those cases, the linker scans all input files in command
line order and finds the first non-fat object file.  Whatever architecture it
is becomes the architecture for the link.

llvm-svn: 217189
2014-09-04 20:08:30 +00:00
Nick Kledzik 21921375cc [mach-o] Add support for LC_DATA_IN_CODE
Sometimes compilers emit data into code sections (e.g. constant pools or
jump tables). These runs of data can throw off disassemblers.  The solution
in mach-o is that ranges of data-in-code are encoded into a table pointed to
by the LC_DATA_IN_CODE load command.

The way the data-in-code information is encoded into lld's Atom model is that
that start and end of each data run is marked with a Reference whose offset
is the start/end of the data run.  For arm, the switch back to code also marks
whether it is thumb or arm code.

llvm-svn: 213901
2014-07-24 23:06:56 +00:00
Nick Kledzik 378066c80e [mach-o] improve errors when mixing architectures
llvm-svn: 212072
2014-06-30 22:57:33 +00:00
Rafael Espindola b1a4d3a26c Don't import error_code into the lld namespace.
llvm-svn: 210785
2014-06-12 14:53:47 +00:00
Rui Ueyama bc69bce7de [MachO] Remove "virtual" and add "override".
llvm-svn: 205057
2014-03-28 21:36:33 +00:00
Alexey Samsonov 8e6829e436 Remove extra semicolon for -Wpedantic
llvm-svn: 204219
2014-03-19 09:38:31 +00:00
Shankar Easwaran 3d8de47f76 Fix trailing whitespace.
llvm-svn: 200182
2014-01-27 03:09:26 +00:00
Joey Gouly 010b37691d [MachO] Begin support for reading fat binaries.
llvm-svn: 199259
2014-01-14 22:32:38 +00:00
Nick Kledzik 6edd722a2c [mach-o] enable mach-o and native yaml to be intermixed
The main goal of this patch is to allow "mach-o encoded as yaml" and "native
encoded as yaml" documents to be intermixed.  They are distinguished via 
yaml tags at the start of the document.  This will enable all mach-o test cases
to be written using yaml instead of checking in object files.

The Registry was extend to allow yaml tag handlers to be registered.  The
mach-o Reader adds a yaml tag handler for the tag "!mach-o". 

Additionally, this patch fixes some buffer ownership issues.  When parsing
mach-o binaries, the mach-o atoms can have pointers back into the memory 
mapped .o file.  But with yaml encoded mach-o, name and content are ephemeral, 
so a copyRefs parameter was added to cause the mach-o atoms to make their
own copy.  

llvm-svn: 198986
2014-01-11 01:07:43 +00:00
Joey Gouly ceb16dedef [MachO] Begin to add some MachO specific File/Atoms, and add the start of
normalizedToAtoms.

llvm-svn: 198459
2014-01-03 23:12:02 +00:00
Rui Ueyama 170a1a892e Run clang-format on r197727.
llvm-svn: 197788
2013-12-20 07:48:29 +00:00
Nick Kledzik e555277780 [lld] Introduce registry and Reference kind tuple
The main changes are in:
  include/lld/Core/Reference.h
  include/lld/ReaderWriter/Reader.h
Everything else is details to support the main change.

1) Registration based Readers
Previously, lld had a tangled interdependency with all the Readers.  It would
have been impossible to make a streamlined linker (say for a JIT) which
just supported one file format and one architecture (no yaml, no archives, etc).
The old model also required a LinkingContext to read an object file, which
would have made .o inspection tools awkward.

The new model is that there is a global Registry object. You programmatically 
register the Readers you want with the registry object. Whenever you need to 
read/parse a file, you ask the registry to do it, and the registry tries each 
registered reader.

For ease of use with the existing lld code base, there is one Registry
object inside the LinkingContext object. 


2) Changing kind value to be a tuple
Beside Readers, the registry also keeps track of the mapping for Reference
Kind values to and from strings.  Along with that, this patch also fixes
an ambiguity with the previous Reference::Kind values.  The problem was that
we wanted to reuse existing relocation type values as Reference::Kind values.
But then how can the YAML write know how to convert a value to a string? The
fix is to change the 32-bit Reference::Kind into a tuple with an 8-bit namespace
(e.g. ELF, COFFF, etc), an 8-bit architecture (e.g. x86_64, PowerPC, etc), and
a 16-bit value.  This tuple system allows conversion to and from strings with 
no ambiguities.

llvm-svn: 197727
2013-12-19 21:58:00 +00:00
Rui Ueyama 014192dbda Fix include guards.
llvm-svn: 194776
2013-11-15 03:09:26 +00:00
Rui Ueyama c1800beb55 Remove unnecessary namespace qualifier.
llvm-svn: 194037
2013-11-05 01:37:40 +00:00
Nick Kledzik 369ffd1c55 fix typos
llvm-svn: 192154
2013-10-08 02:07:19 +00:00
Nick Kledzik 30332b19d3 Supoort mach-o encoded in yaml.
This is the first step in how I plan to get mach-o object files support into 
lld. We need to be able to test the mach-o Reader and Write on systems without 
a mach-o tools. Therefore, we want to support a textual way (YAML) to represent 
mach-o files.

MachONormalizedFile.h defines an in-memory abstraction of the content of mach-o  
files. The in-memory data structures are always native endianess and always
use 64-bit sizes. That internal data structure can then be converted to or
from three different formats: 1) yaml (text) encoded mach-o, 2) binary mach-o
files, 3) lld Atoms.

This patch defines the internal model and uses YAML I/O to implement the 
conversion to and from the model to yaml. The next patch will implement
the conversion from normalized to binary mach-o.

This patch includes unit tests to validate the yaml conversion APIs.

llvm-svn: 192147
2013-10-08 00:43:34 +00:00