Summary:
Previously we would try to load PDBs for every PE executable we tried to
symbolize. If that failed, we would fall back to DWARF. If there wasn't
any DWARF, we'd print mostly useless symbol information using the export
table.
With this change, we only try to load PDBs for executables that claim to
have them. If that fails, we can now print an error rather than falling
back silently. This should make it a lot easier to diagnose and fix
common symbolization issues, such as not having DIA or not having a PDB.
Reviewers: zturner, eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20982
llvm-svn: 271725
This opens the door to introducing a YAML outputter which can be
used for machine consumption. Currently the yaml output style
is unimplemented and returns an error if you try to use it.
Reviewed By: rnk, ruiu
Differential Revision: http://reviews.llvm.org/D20967
llvm-svn: 271712
This re-applies r271611, and hopefully the bots won't break this time.
Although ld64 always outputs linkedit data in the same order, it isn't actually required to. This change makes yaml2obj resilient if the offsets are in arbitrary order.
llvm-svn: 271687
When printing line information and file checksums, we were printing
the file offset field from the struct header. This teaches
llvm-pdbdump how to turn those numbers into the filename. In the
case of file checksums, this is done by looking in the global
string table. In the case of line contributions, this is done
by indexing into the file names buffer of the DBI stream. Why
they use a different technique I don't know.
llvm-svn: 271630
To facilitate this, a couple of changes had to be made:
1. `ModuleSubstream` got moved from `DebugInfo/PDB` to
`DebugInfo/CodeView`, and various codeview related types are defined
there. It turns out `DebugInfo/CodeView/Line.h` already defines many of
these structures, but this is really old code that is not endian aware,
doesn't interact well with `StreamInterface` and not very helpful for
getting stuff out of a PDB. Eventually we should migrate the old readobj
`COFFDumper` code to these new structures, or at least merge their
functionality somehow.
2. A `ModuleSubstream` visitor is introduced. Depending on where your
module substream array comes from, different subsets of record types can
be expected. We are already hand parsing these substream arrays in many
places especially in `COFFDumper.cpp`. In the future we can migrate these
paths to the visitor as well, which should reduce a lot of code in
`COFFDumper.cpp`.
Differential Revision: http://reviews.llvm.org/D20936
Reviewed By: ruiu, majnemer
llvm-svn: 271621
Although ld64 always outputs linkedit data in the same order, it isn't actually required to. This change makes yaml2obj resilient if the offsets are in arbitrary order.
llvm-svn: 271611
This commit adds round tripping for MachO symbol data. Symbols are entries in the name list, that contain offsets into the string table which is at the end of the __LINKEDIT segment.
llvm-svn: 271604
This first pass only splits apart the records and dumps the line
info kinds and binary data. Subsequent patches will parse out
the binary data into more useful information and dump it in
detail.
llvm-svn: 271576
Unlike other sections that can grow to any size, the COFF section header
stream has maximum length because each record is fixed size and the COFF
file format limits the maximum number of sections. So I decided to not
create a specific stream class for it. Instead, I added a member function
to DbiStream class which returns a vector of COFF headers.
Differential Revision: http://reviews.llvm.org/D20717
llvm-svn: 271557
This directory is used to find if there is a PDB associated with an
executable. I plan to use this functionality to teach llvm-symbolizer
whether it should use DIA or DWARF to symbolize a given DLL.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D20885
llvm-svn: 271539
The new option makes it possible to build external projects as part of
the llvm build without copying (or symlinking) then into llvm/tool with
specifying a few additional cmake variables.
Example usage (2 additional project called foo and bar):
-DLLVM_EXTERNAL_PROJECTS="Foo;Bar"
-DLLVM_EXTERNAL_FOO_SOURCE_DIR=/src/foo
-DLLVM_EXTERNAL_BAR_SOURCE_DIR=/src/bar
Note: This is the extension of the approach we already support for
clang/lldb/poly with adding an option to specify additional supported
projects.
Differential revision: http://reviews.llvm.org/D20838
llvm-svn: 271440
This tidies up some code that was manually constructing RuntimeDyld::SymbolInfo
instances from JITSymbols. It will save more mess in the future when
JITSymbol::getAddress is extended to return an Expected<TargetAddress> rather
than just a TargetAddress, since we'll be able to embed the error checking in
the conversion.
llvm-svn: 271350
when the object is from a slice of a Mach-O Universal Binary use something like
"foo.o (for architecture i386)" as part of the error message when expected.
Also fixed places in these tools that were ignoring object file errors from
MachOUniversalBinary::getAsObjectFile() when the code moved on to see if
the slice was an archive.
To do this MachOUniversalBinary::getAsObjectFile() and
MachOUniversalBinary::getObjectForArch() were changed from returning
ErrorOr<...> to Expected<...> then that was threaded up to its users.
Converting these interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. To contain the changes for now the use of
errorToErrorCode() is still used in two places yet to be fully converted.
llvm-svn: 271332
Adds the method MCStreamer::EmitBinaryData, which is usually an alias
for EmitBytes. In the MCAsmStreamer case, it is overridden to emit hex
dump output like this:
.byte 0x0e, 0x00, 0x08, 0x10
.byte 0x03, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x10, 0x00, 0x00
Also, when verbose asm comments are enabled, this patch prints the dump
output for each comment before its record, like this:
# ArgList (0x1000) {
# TypeLeafKind: LF_ARGLIST (0x1201)
# NumArgs: 0
# Arguments [
# ]
# }
.byte 0x06, 0x00, 0x01, 0x12
.byte 0x00, 0x00, 0x00, 0x00
This should make debugging easier and testing more convenient.
Reviewers: aaboud
Subscribers: majnemer, zturner, amccarth, aaboud, llvm-commits
Differential Revision: http://reviews.llvm.org/D20711
llvm-svn: 271313
The MachO export trie is a serially encoded trie keyed by symbol name. This code parses the trie and preserves the structure so that it can be dumped again.
llvm-svn: 271300
This converts remaining uses of ByteStream, which was still
left in the symbol stream and type stream, to using the new
StreamInterface zero-copy classes.
RecordIterator is finally deleted, so this is the only way left
now. Additionally, more error checking is added when iterating
the various streams.
With this, the transition to zero copy pdb access is complete.
llvm-svn: 271101
Fix: updated clang code which was not updated by mistake.
Original commit message:
[llvm-mc] - Teach llvm-mc to generate zlib styled compression sections.
This patch is strongly based on previously reverted D20331.
(because of gnuutils < 2.26 does not support compressed debug sections in non zlib-gnu style)
Difference that this patch supports both zlib and zlib-gnu styles.
-compress-debug-sections option now supports next values:
-compress-debug-sections=zlib-gnu
-compress-debug-sections=zlib
-compress-debug-sections=none
Previously specifying -compress-debug-sections enabled zlib-gnu compression,
so anyone can put "-compress-debug-sections=zlib-gnu" to restore the behavior
that was before this patch for case when compression was enabled.
Differential revision: http://reviews.llvm.org/D20676
llvm-svn: 270987
It broke buildbot:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/13585/steps/build/logs/stdio
Initial commit message:
[llvm-mc] - Teach llvm-mc to generate zlib styled compression sections.
This patch is strongly based on previously reverted D20331.
(because of gnuutils < 2.26 does not support compressed debug sections in non zlib-gnu style)
Difference that this patch supports both zlib and zlib-gnu styles.
-compress-debug-sections option now supports next values:
-compress-debug-sections=zlib-gnu
-compress-debug-sections=zlib
-compress-debug-sections=none
Previously specifying -compress-debug-sections enabled zlib-gnu compression,
so anyone can put "-compress-debug-sections=zlib-gnu" to restore the behavior
that was before this patch for case when compression was enabled.
Differential revision: http://reviews.llvm.org/D20676
llvm-svn: 270978
This patch is strongly based on previously reverted D20331.
(because of gnuutils < 2.26 does not support compressed debug sections in non zlib-gnu style)
Difference that this patch supports both zlib and zlib-gnu styles.
-compress-debug-sections option now supports next values:
-compress-debug-sections=zlib-gnu
-compress-debug-sections=zlib
-compress-debug-sections=none
Previously specifying -compress-debug-sections enabled zlib-gnu compression,
so anyone can put "-compress-debug-sections=zlib-gnu" to restore the behavior
that was before this patch for case when compression was enabled.
Differential revision: http://reviews.llvm.org/D20676
llvm-svn: 270977
This will be needed in order to consistently return an Error
to clients of the API being developed in D20268.
Differential Revision: http://reviews.llvm.org/D20550
llvm-svn: 270967
Since we want to move toward zero-copy access to stream data, we
want to remove all instances of copying operations. So get rid
of some of those here.
Differential Revision: http://reviews.llvm.org/D20720
Reviewed By: ruiu
llvm-svn: 270960
PDBs can be extremely large. We're already mapping the entire
PDB into the process's address space, but to make matters worse
the blocks of the PDB are not arranged contiguously. So, when
we have something like an array or a string embedded into the
stream, we have to make a copy. Since it's convenient to use
traditional data structures to iterate and manipulate these
records, we need the memory to be contiguous.
As a result of this, we were using roughly twice as much memory
as the file size of the PDB, because every stream was copied
out and re-stitched together contiguously.
This patch addresses this by improving the MappedBlockStream
to allocate from a BumpPtrAllocator only when a read requires
a discontiguous read. Furthermore, it introduces some data
structures backed by a stream which can iterate over both
fixed and variable length records of a PDB. Since everything
is backed by a stream and not a buffer, we can read almost
everything from the PDB with zero copies.
Differential Revision: http://reviews.llvm.org/D20654
Reviewed By: ruiu
llvm-svn: 270951
This adds support for YAML round tripping dyld info lazy bindings. The storage and format of these is the same as regular bind opcodes, they are just interpreted differently by dyld, and can have DONE opcodes in the middle of the opcode lists.
llvm-svn: 270920
This adds support for YAML round tripping dyld info weak bindings. The storage and format of these is the same as regular bind opcodes, they are just interpreted differently by dyld.
llvm-svn: 270911
This adds support for YAML round tripping dyld info bind opcodes. Bind opcodes can have signed or unsigned LEB128 data, and they can have symbols associated with them.
llvm-svn: 270901
At some point we're going to need libObject to have this dependency, but as it is now this is causing too many headaches. This commit will reduce the linkage to just llvm-objdump where it is strictly needed, and we'll cross the libObject bridge later when we need it.
llvm-svn: 270866
Summary:
Several changes were required for ThinLTO links involving bitcode
archive static libraries. With this patch clang/llvm bootstraps with
ThinLTO and gold.
The first is that the gold callbacks get_input_file and
release_input_file can normally be used to get file information for
each constituent bitcode file within an archive. However, these
interfaces lock the underlying file and can't be for each archive
constituent for ThinLTO backends where we get all the input files up
front and don't release any until after the backend threads complete.
However, it is sufficient to only get and release once per file, and
then each consituent bitcode file can be accessed via get_view. This
required saving some information to identify which file handle is the
"leader" for each claimed file sharing the same file descriptor, and
other information so that get_input_file isn't necessary later when
processing the backends.
Second, the module paths in the index need to distinguish between
different constituent bitcode files within the same archive file,
otherwise they will all end up with the same archive file path.
Do this by appending the offset within the archive for the start of the
bitcode file, returned by get_input_file when we claim each bitcode file,
and saving that along with the file handle.
Third, rather than have the function importer try to load a file based
on the module path identifier (which now contains a suffix to
distinguish different bitcode files within an archive), use a custom
module loader. This is the same approach taken in libLTO, and I am using
the support refactored into the new LTO.h header in r270509. The module
loader parses the bitcode files out of the memory buffers returned from
gold via the get_view callback and saved in a map. This also means that
we call the function importer directly, rather than add it to the pass
pipeline (which was in the plan to do already for other reasons).
Reviewers: pcc, joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D20559
llvm-svn: 270814
This is a support COFF feature. Ensure that we can display the weak externals
auxiliary symbol. It contains useful information (such as the default binding
and how to resolve the symbol).
This reapplies the previous patch with a modification which hopefully should fix
the endianness issues. The variadic call would promote the ulittle32_t to a
uint32_t which would lose the byte-swapping behaviour desired.
llvm-svn: 270813
Richard Smith identified this in post commit review of r270466. The
string sections in particular (in the future, possibly all sections - so
I'm not going to bother pulling out just the string sections for the
extra lifetime handling right now) need to remain valid during
processing of all inputs so that elements of the DWPStringPool can be
looked up repeatedly without having to make in-memory copies of string
contents in the noncompressed case (more common in dwp+dwp merge steps
where the memory is a bigger problem because the files are larger).
Using the SmallVector (or any vector) a reallocation on push_back could
cause any of the nested SmallStrings in small mode to move in memory and
invalid pointers to their contents. Using a deque the SmallStrings will
never move around since no elements are removed from the container.
llvm-svn: 270797
We have need to reuse this functionality, including making
additional generic stream types that are smarter about how and
when they copy memory versus referencing the original memory.
So all of these structures belong in the common library
rather than being pdb specific.
llvm-svn: 270751
This should actually address PR27855. This results in adding references to the system libs inside generated dylibs so that they get correctly pulled in when linking against the dylib.
llvm-svn: 270723
searching for external symbols, and fall back to the SymbolResolver::findSymbol
method if the former returns null.
This makes RuntimeDyld behave more like a static linker: Symbol definitions
from within the current module's "logical dylib" will be preferred to
external definitions. We can build on this behavior in the future to properly
support weak symbol handling.
Custom symbol resolvers that override the findSymbolInLogicalDylib method may
notice changes due to this patch. Clients who have not overridden this method
should generally be unaffected, however users of the OrcMCJITReplacement class
may notice changes.
llvm-svn: 270716
This is a support COFF feature. Ensure that we can display the weak externals
auxiliary symbol. It contains useful information (such as the default binding
and how to resolve the symbol).
llvm-svn: 270648
This adds support for parsing and dumping the following
symbol types:
S_LPROCREF
S_ENVBLOCK
S_COMPILE2
S_REGISTER
S_COFFGROUP
S_SECTION
S_THUNK32
S_TRAMPOLINE
As of this patch, the test PDB files no longer have any unknown
symbol types.
llvm-svn: 270628
When dumping huge PDB files, too many of the options were grouped
together so you would get neverending spew of output. This patch
introduces more granular display options so you can only dump the
fields you actually care about.
llvm-svn: 270607
This makes use of the newly introduced `CVSymbolVisitor` to dump details
of each type of symbol record in the symbol streams. Future patches will
bring this visitor based dumping to the publics stream, as well as
creating a `SymbolDumpDelegate` to print more information about
relocations etc.
Differential Revision: http://reviews.llvm.org/D20545
Reviewed By: ruiu
llvm-svn: 270585
to llvm-objdump. This section is created with -fembed-bitcode option.
This requires the use of libxar and the Cmake and lit support were crafted by
Chris Bieneman!
rdar://26202242
llvm-svn: 270491
This will pave the way to introduce a full fledged symbol visitor
similar to how we have a type visitor, thus allowing the same
dumping code to be used in llvm-readobj and llvm-pdbdump.
Differential Revision: http://reviews.llvm.org/D20384
Reviewed By: rnk
llvm-svn: 270475
Now that the string pool is referential rather than maintaining its own
copy of string data, compressed sections (well, technically only the
debug_str section*) need to be preserved for the lifetime of the pool to
match.
* I'm not currently optimizing for memory footprint with compressed
input - the major memory limit I'm hitting is on dwp+dwp merge steps
and we aren't currently compressing contents in dwp files, just in the
.dwo inputs.
llvm-svn: 270462
Main problem here was that SHF_COMPRESSED has the same value with
XCORE_SHF_CP_SECTION, which was included as standart (common) flag.
As far I understand xCore is a family of controllers and it that
means it's constant should be processed separately,
only if e_machine == EM_XCORE, otherwise llvm-readobj would output
different constants twice for compressed section:
Flags [
..
SHF_COMPRESSED (0x800)
..
XCORE_SHF_CP_SECTION (0x800)
..
]
what probably does not make sence if you're not working with xcore file.
Differential revision: http://reviews.llvm.org/D20273
llvm-svn: 270320
This fills section data with 0xDEADBEEF and segment data not inside a section with 0xBAADDA7A. This results in yaml2obj generating a matching size object file. Any additional bytes in the file are zero'd.
This is a starting point for populating the remaining segment data, and provides a hex viewable file that you can easily see the missing data in.
llvm-svn: 270286
In addition to clarifying the warning message this contains a minor functional
change in that it now warns if the *immediate* parent directory in which the
missing PCM is expected to be isn't found.
This patch also includes a more comprehensive testcase.
rdar://problem/25860711
llvm-svn: 270269
DBI stream contains a stream number of the symbol record stream.
Symbol record streams is an array of length-type-value members.
Each member represents one symbol.
Publics stream contains offsets to the symbol record stream.
This patch is to print out all symbols that are referenced by
the publics stream.
Note that even with this patch, llvm-pdbdump cannot dump all the
information in a publics stream since it contains more information
than symbol names. I'll improve it in followup patches.
Differential Revision: http://reviews.llvm.org/D20480
llvm-svn: 270262
Now that MachO load command fields are fully covered we can fill unaccounted for bytes with 0. That allows us to sparsely specify YAML to simplify tests.
Simplifying load_commands test accordingly.
llvm-svn: 270158
This removes the subclasses of ProfileSummary, moves the members of the derived classes to the base class.
Differential Revision: http://reviews.llvm.org/D20390
llvm-svn: 270143
This splits ProfileSummary into two classes: a ProfileSummary class that has methods to convert from/to metadata and a ProfileSummaryBuilder class that computes the profiles summary which is in ProfileData.
Differential Revision: http://reviews.llvm.org/D20314
llvm-svn: 270136
This re-applies r270115.
Many of the MachO load commands can have data appended after the command structure. This data is frequently strings, but can actually be anything. This patch adds support for three optional fields on load command yaml descriptions.
The new PayloadString YAML field is populated with the data after load commands known to have strings as extra data.
The new ZeroPadBytes YAML field is a count of zero'd bytes after the end of the load command structure before the next command. This can apply anywhere in the file. MachO2YAML verifies that bytes are zero before populating this field, and YAML2MachO will add zero'd bytes.
The new PayloadBytes YAML field stores all bytes after the end of the load command structure before the next command if they are non-zero. This is a catch all for all unhandled bytes. If MachO2Yaml populates PayloadBytes it will not populate ZeroPadBytes, instead zero'd bytes will be in the PayloadBytes structure.
llvm-svn: 270124