Symbolication
=============

.. contents::
   :local:

LLDB is separated into a shared library that contains the core of the debugger,
and a driver that implements debugging and a command interpreter. LLDB can be
used to symbolicate your crash logs and can often provide more information than
other symbolication programs:

- Inlined functions
- Variables that are in scope for an address, along with their locations

The simplest form of symbolication is to load an executable:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out

We use the ``--no-dependents`` flag with the ``target create`` command so that
we don't load all of the dependent shared libraries from the current system.
When we symbolicate, we are often symbolicating a binary that was running on
another system, and even though the main executable might reference shared
libraries in ``/usr/lib``, we often don't want to load the versions on the
current computer.

Using the ``image list`` command will show us a list of all shared libraries
associated with the current target. As expected, we currently only have a
single binary:

.. code-block:: text

   (lldb) image list
   [ 0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out

Now we can look up an address:

.. code-block:: text

   (lldb) image lookup --address 0x100000aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Since we haven't specified a slide or any load addresses for individual
sections in the binary, the address that we use here is a file address. A file
address refers to a virtual address as defined by each object file.

If we didn't use the ``--no-dependents`` option with ``target create``, we
would have loaded all dependent shared libraries:

.. code-block:: text

   (lldb) image list
   [ 0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out
   [ 1] 8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B 0x0000000000000000 /usr/lib/system/libsystem_c.dylib
   [ 2] 62AA0B84-188A-348B-8F9E-3E2DB08DB93C 0x0000000000000000 /usr/lib/system/libsystem_dnssd.dylib
   [ 3] C0535565-35D1-31A7-A744-63D9F10F12A4 0x0000000000000000 /usr/lib/system/libsystem_kernel.dylib
   ...

Now if we do a lookup using a file address, this can result in multiple matches
since most shared libraries have a virtual address space that starts at zero:

.. code-block:: text

   (lldb) image lookup -a 0x1000
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

         Address: libsystem_c.dylib[0x0000000000001000] (libsystem_c.dylib.__TEXT.__text + 928)
         Summary: libsystem_c.dylib`mcount + 9

         Address: libsystem_dnssd.dylib[0x0000000000001000] (libsystem_dnssd.dylib.__TEXT.__text + 456)
         Summary: libsystem_dnssd.dylib`ConvertHeaderBytes + 38

         Address: libsystem_kernel.dylib[0x0000000000001000] (libsystem_kernel.dylib.__TEXT.__text + 1116)
         Summary: libsystem_kernel.dylib`clock_get_time + 102
   ...

To avoid getting multiple file address matches, you can specify the name of the
shared library to limit the search:

.. code-block:: text

   (lldb) image lookup -a 0x1000 a.out
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

Defining Load Addresses for Sections
------------------------------------

When symbolicating your crash logs, it can be tedious if you always have to
convert your crash log addresses into file addresses. To avoid having to do any
conversion, you can set the load address for the sections of the modules in
your target. Once you set any section load address, lookups will switch to
using load addresses. You can slide all sections in the executable by the same
amount, or set the load address for individual sections. The ``target modules
load --slide`` command allows us to set the load address for all sections.

Below is an example of sliding all sections in a.out by adding 0x123000 to each
section's file address:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out --slide 0x123000

It is often much easier to specify the actual load location of each section by
name. Crash logs on macOS have a Binary Images section that specifies the
address of the __TEXT segment for each binary. Specifying a slide requires
that you first find the original (file) address of the __TEXT segment and
subtract it from the load address. If you specify the address of the __TEXT
segment with ``target modules load section address``, you don't need to do any
calculations. To specify the load addresses of sections we can specify one or
more section name + address pairs in the ``target modules load`` command:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out __TEXT 0x100123000

We specified that the __TEXT section is loaded at 0x100123000. Now that we have
defined where sections have been loaded in our target, any lookups we do will
use load addresses, so we don't have to do any math on the addresses in the
crash log backtraces. We can just use the raw addresses:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

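The two forms of ``target modules load`` are related by simple arithmetic. The
sketch below is not part of LLDB; the helper names ``slide_for`` and
``load_to_file_addr`` are made up for illustration, and the addresses are the
ones used in the examples in this section:

.. code-block:: python

   def slide_for(text_file_addr, text_load_addr):
       """Return the slide: how far __TEXT moved from its file address."""
       return text_load_addr - text_file_addr


   def load_to_file_addr(load_addr, slide):
       """Convert a crash log (load) address back into a file address."""
       return load_addr - slide


   # __TEXT has file address 0x100000000 in a.out and was loaded at 0x100123000.
   slide = slide_for(0x100000000, 0x100123000)
   print(hex(slide))                                  # 0x123000
   print(hex(load_to_file_addr(0x100123aa3, slide)))  # 0x100000aa3

Setting section load addresses makes LLDB do this subtraction for every lookup,
which is why the raw crash log address can be passed to ``image lookup``.
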
Loading Multiple Executables
----------------------------

You often have more than one executable involved when you need to symbolicate a
crash log. When this happens, you create a target for the main executable or
one of the shared libraries, then add more modules to the target using the
``target modules add`` command.

Let's say we have a Darwin crash log that contains the following images:

.. code-block:: text

   Binary Images:
       0x100000000 -    0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
    0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
    0x7fff883db000 - 0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
    0x7fff8c0dc000 - 0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib

First we create the target using the main executable and then add any extra
shared libraries we want:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib

If you have debug symbols in standalone files, such as dSYM files on macOS,
you can specify their paths using the ``--symfile`` option for the ``target
create`` (recent LLDB releases only) and ``target modules add`` commands:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out --symfile /tmp/a.out.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib --symfile /build/server/a/libsystem_c.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib --symfile /build/server/b/libsystem_dnssd.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib --symfile /build/server/c/libsystem_kernel.dylib.dSYM

Then we set the load address for each __TEXT section, using the first address
from the Binary Images section for each image:

.. code-block:: text

   (lldb) target modules load --file a.out __TEXT 0x100000000
   (lldb) target modules load --file libsystem_c.dylib __TEXT 0x7fff83f32000
   (lldb) target modules load --file libsystem_dnssd.dylib __TEXT 0x7fff883db000
   (lldb) target modules load --file libsystem_kernel.dylib __TEXT 0x7fff8c0dc000

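With many images, the load commands can be generated from the crash log rather
than typed by hand. The following is a minimal sketch that assumes the
simplified Binary Images line format shown above (real crash logs usually carry
extra fields such as the image name and version). The names ``IMAGE_RE``,
``parse_binary_images`` and ``load_commands`` are hypothetical helpers, not
part of LLDB:

.. code-block:: python

   import os
   import re

   # Matches lines such as:
   #   0x100000000 - 0x100000ff7 <A866975B-...> /tmp/a.out
   IMAGE_RE = re.compile(
       r"^\s*(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+<([0-9A-Fa-f-]+)>\s+(.+?)\s*$"
   )


   def parse_binary_images(lines):
       """Yield (start, end, uuid, path) for each line that looks like an image."""
       for line in lines:
           match = IMAGE_RE.match(line)
           if match:
               start, end, uuid, path = match.groups()
               yield int(start, 16), int(end, 16), uuid, path


   def load_commands(lines):
       """Build one 'target modules load' command per parsed image."""
       return [
           "target modules load --file %s __TEXT 0x%x" % (os.path.basename(path), start)
           for start, _end, _uuid, path in parse_binary_images(lines)
       ]


   crash_log_lines = [
       "Binary Images:",
       "0x100000000 - 0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out",
       "0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib",
   ]
   for command in load_commands(crash_log_lines):
       print(command)

The printed commands can be pasted at the ``(lldb)`` prompt after the modules
have been added to the target.
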
Now any stack backtraces that haven't been symbolicated can be symbolicated
using ``image lookup`` with the raw backtrace addresses.

Given the following raw backtrace:

.. code-block:: text

   Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
   0   libsystem_kernel.dylib    0x00007fff8a1e6d46 __kill + 10
   1   libsystem_c.dylib         0x00007fff84597df0 abort + 177
   2   libsystem_c.dylib         0x00007fff84598e2a __assert_rtn + 146
   3   a.out                     0x0000000100000f46 main + 70
   4   libdyld.dylib             0x00007fff8c4197e1 start + 1

We can now symbolicate the load addresses:

.. code-block:: text

   (lldb) image lookup -a 0x00007fff8a1e6d46
   (lldb) image lookup -a 0x00007fff84597df0
   (lldb) image lookup -a 0x00007fff84598e2a
   (lldb) image lookup -a 0x0000000100000f46

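The lookup commands can likewise be derived from the backtrace text. This is a
rough sketch that assumes frame lines shaped like the ones above (index, image
name without spaces, address, symbol). ``FRAME_RE`` and ``lookup_commands`` are
illustrative names, and frames from images that were not added to the target
are skipped:

.. code-block:: python

   import re

   # Matches frame lines such as:
   #   3 a.out 0x0000000100000f46 main + 70
   FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.*)$")


   def lookup_commands(lines, loaded_images):
       """Build 'image lookup' commands for frames whose image is loaded."""
       commands = []
       for line in lines:
           match = FRAME_RE.match(line)
           if match and match.group(2) in loaded_images:
               commands.append("image lookup -a 0x%016x" % int(match.group(3), 16))
       return commands


   backtrace = [
       "Thread 0 Crashed:: Dispatch queue: com.apple.main-thread",
       "0 libsystem_kernel.dylib 0x00007fff8a1e6d46 __kill + 10",
       "1 libsystem_c.dylib 0x00007fff84597df0 abort + 177",
       "2 libsystem_c.dylib 0x00007fff84598e2a __assert_rtn + 146",
       "3 a.out 0x0000000100000f46 main + 70",
       "4 libdyld.dylib 0x00007fff8c4197e1 start + 1",
   ]
   loaded = {"a.out", "libsystem_c.dylib", "libsystem_dnssd.dylib", "libsystem_kernel.dylib"}
   commands = lookup_commands(backtrace, loaded)
   for command in commands:
       print(command)

Frame 4 is dropped because ``libdyld.dylib`` was never added to the target, so
there is nothing to resolve its address against.
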
Getting Variable Information
----------------------------

If you add the ``--verbose`` flag to the ``image lookup --address`` command, you
can get verbose information which can often include the locations of some of
your local variables:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3 --verbose
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 110)
         Summary: a.out`main + 50 at main.c:13
          Module: file = "/tmp/a.out", arch = "x86_64"
     CompileUnit: id = {0x00000000}, file = "/tmp/main.c", language = "ISO C:1999"
        Function: id = {0x0000004f}, name = "main", range = [0x0000000100000bc0-0x0000000100000dc9)
        FuncType: id = {0x0000004f}, decl = main.c:9, compiler_type = "int (int, const char **, const char **, const char **)"
          Blocks: id = {0x0000004f}, range = [0x100000bc0-0x100000dc9)
                  id = {0x000000ae}, range = [0x100000bf2-0x100000dc4)
       LineEntry: [0x0000000100000bf2-0x0000000100000bfa): /tmp/main.c:13:23
          Symbol: id = {0x00000004}, range = [0x0000000100000bc0-0x0000000100000dc9), name="main"
        Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28
        Variable: id = {0x00000072}, name = "argc", type= "int", location = r13, decl = main.c:8
        Variable: id = {0x00000081}, name = "argv", type= "const char **", location = r12, decl = main.c:8
        Variable: id = {0x00000090}, name = "envp", type= "const char **", location = r15, decl = main.c:8
        Variable: id = {0x0000009f}, name = "aapl", type= "const char **", location = rbx, decl = main.c:8

The interesting part is the variables that are listed. The variables are the
parameters and local variables that are in scope for the address that was
specified. Each variable entry has a location, shown in the ``location`` field
of the ``Variable`` lines above. Crash logs often have register information for
the first frame in each stack, and being able to reconstruct one or more local
variables can often help you decipher more information from a crash log than
you normally would be able to. Note that this is really only useful for the
first frame, and only if your crash logs have register information for your
threads.

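As a rough illustration of how a location combines with crash log registers:
``DW_OP_fbreg(-1072)`` means "frame base plus a signed offset of -1072". The
sketch below assumes the frame base of ``main`` is ``rbp``, which is common on
x86_64 but is really defined by the function's ``DW_AT_frame_base`` attribute.
The register values are invented, and ``fbreg_address`` is a hypothetical
helper rather than an LLDB function:

.. code-block:: python

   def fbreg_address(frame_base, offset):
       """Address described by DW_OP_fbreg(offset): frame base plus a signed offset."""
       return (frame_base + offset) & 0xFFFFFFFFFFFFFFFF


   # Hypothetical register values copied from a crash log's thread state.
   registers = {"rbp": 0x00007FFF5FBFF8E0, "r13": 0x1}

   # Assumption: the frame base is rbp (check DW_AT_frame_base to be sure).
   path_addr = fbreg_address(registers["rbp"], -1072)
   print("path is at 0x%x" % path_addr)

   # argc has location r13, so its value is simply the register contents.
   print("argc = %d" % registers["r13"])

A crash log records registers but not stack memory, so a frame-base-relative
variable such as ``path`` yields only an address, while register-located
variables such as ``argc`` yield their values directly.
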
Using Python API to Symbolicate
-------------------------------

All of the commands above can be done through the Python script bridge. The
code below will recreate the target and add the three shared libraries that we
added in the Darwin crash log example above:

.. code-block:: python

   # Run this inside LLDB's embedded interpreter (for example with the
   # ``script`` command), where lldb.debugger refers to the current debugger.
   import lldb

   triple = "x86_64-apple-macosx"
   platform_name = None
   add_dependents = False
   target = lldb.debugger.CreateTarget("/tmp/a.out", triple, platform_name, add_dependents, lldb.SBError())
   if target:
       # Get the executable module
       module = target.GetModuleAtIndex(0)
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x100000000)
       module = target.AddModule("/usr/lib/system/libsystem_c.dylib", triple, None, "/build/server/a/libsystem_c.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff83f32000)
       module = target.AddModule("/usr/lib/system/libsystem_dnssd.dylib", triple, None, "/build/server/b/libsystem_dnssd.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff883db000)
       module = target.AddModule("/usr/lib/system/libsystem_kernel.dylib", triple, None, "/build/server/c/libsystem_kernel.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff8c0dc000)

       load_addr = 0x00007fff8a1e6d46
       # so_addr is a section offset address, or a lldb.SBAddress object
       so_addr = target.ResolveLoadAddress(load_addr)
       # Get a symbol context for the section offset address which includes
       # a module, compile unit, function, block, line entry, and symbol
       sym_ctx = so_addr.GetSymbolContext(lldb.eSymbolContextEverything)
       print(sym_ctx)

Use Builtin Python Module to Symbolicate
----------------------------------------

LLDB includes a module in the lldb package named lldb.utils.symbolication.
This module simplifies the symbolication process by providing classes that
represent the objects involved in symbolication:

- lldb.utils.symbolication.Address
- lldb.utils.symbolication.Section
- lldb.utils.symbolication.Image
- lldb.utils.symbolication.Symbolicator

**lldb.utils.symbolication.Address**

This class represents an address that will be symbolicated. It will cache any
information that has been looked up: module, compile unit, function, block,
line entry, symbol. It does this by having a lldb.SBSymbolContext as a member
variable.

**lldb.utils.symbolication.Section**

This class represents a section that might get loaded in a
lldb.utils.symbolication.Image. It has helper functions that allow you to set
it from text that might have been extracted from a crash log file.

**lldb.utils.symbolication.Image**

This class represents a module that might get loaded into the target we use for
symbolication. This class contains the executable path, optional symbol file
path, the triple, and the list of sections that will need to be loaded if we
choose to ask the target to load this image. Many of these objects will never
be loaded into the target unless they are needed by symbolication. You often
have a crash log that has 100 to 200 different shared libraries loaded, but
your crash log stack backtraces only use a few of these shared libraries. Only
the images that contain stack backtrace addresses need to be loaded in the
target in order to symbolicate.

Subclasses of this class will want to override the
locate_module_and_debug_symbols method:

.. code-block:: python

   import lldb.utils.symbolication

   class CustomImage(lldb.utils.symbolication.Image):
       def locate_module_and_debug_symbols(self):
           # Locate the module and symbol given the info found in the crash log
           pass

Overriding this function allows clients to find the correct executable module
and symbol files as they might reside on a build server.

**lldb.utils.symbolication.Symbolicator**

This class coordinates the symbolication process by loading only the
lldb.utils.symbolication.Image instances that need to be loaded in order to
symbolicate a supplied address.

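The load-on-demand idea can be sketched in plain Python. The classes below are
hypothetical stand-ins written for this illustration. They are not the real
``lldb.utils.symbolication`` classes and do not talk to LLDB. An image is
marked as loaded only the first time an address falls inside its range:

.. code-block:: python

   class SketchImage:
       """Hypothetical stand-in for an image described by a crash log."""

       def __init__(self, path, start, end):
           self.path = path
           self.start = start
           self.end = end  # last address covered, inclusive
           self.loaded = False

       def contains(self, addr):
           return self.start <= addr <= self.end


   class SketchSymbolicator:
       """Hypothetical coordinator that loads images lazily."""

       def __init__(self, images):
           self.images = images

       def image_for(self, addr):
           """Return the image covering addr, marking it loaded on first use."""
           for image in self.images:
               if image.contains(addr):
                   # A real tool would add the module to the target here.
                   image.loaded = True
                   return image
           return None


   images = [
       SketchImage("/tmp/a.out", 0x100000000, 0x100000ff7),
       SketchImage("/usr/lib/system/libsystem_c.dylib", 0x7fff83f32000, 0x7fff83ffefe7),
   ]
   symbolicator = SketchSymbolicator(images)
   hit = symbolicator.image_for(0x0000000100000f46)
   print(hit.path)                           # /tmp/a.out
   print([img.loaded for img in images])     # [True, False]

Only ``a.out`` is touched by this lookup, which mirrors why a crash log with
hundreds of images usually needs just a handful of them in the target.
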
**lldb.macosx.crashlog**

lldb.macosx.crashlog is a package that is distributed on macOS builds that
subclasses the above classes. This module parses the information in the Darwin
crash logs and creates symbolication objects that represent the images, the
sections and the thread frames for the backtraces. It then uses the functions
in the lldb.utils.symbolication to symbolicate the crash logs.

This module installs a new ``crashlog`` command into the lldb command
interpreter so that you can use it to parse and symbolicate macOS crash
logs:

.. code-block:: text

   (lldb) command script import lldb.macosx.crashlog
   "crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help
   (lldb) crashlog /tmp/crash.log
   ...

The command that is installed has built-in help that shows the options that can
be used when symbolicating:

.. code-block:: text

   (lldb) crashlog --help
   Usage: crashlog [options] [FILE ...]

Symbolicate one or more darwin crash log files to provide source file and line
information, inlined stack frames back to the concrete functions, and
disassemble the location of the crash for the first frame of the crashed
thread. If this script is imported into the LLDB command interpreter, a
``crashlog`` command will be added to the interpreter for use at the LLDB
command line. After a crash log has been parsed and symbolicated, a target will
have been created that has all of the shared libraries loaded at the load
addresses found in the crash log file. This allows you to explore the program
as if it were stopped at the locations described in the crash log and functions
can be disassembled and lookups can be performed using the addresses found in
the crash log.

.. code-block:: text

   Options:
     -h, --help            show this help message and exit
     -v, --verbose         display verbose debug info
     -g, --debug           display verbose debug logging
     -a, --load-all        load all executable images, not just the images found
                           in the crashed stack frames
     --images              show image list
     --debug-delay=NSEC    pause for NSEC seconds for debugger
     -c, --crashed-only    only symbolicate the crashed thread
     -d DISASSEMBLE_DEPTH, --disasm-depth=DISASSEMBLE_DEPTH
                           set the depth in stack frames that should be
                           disassembled (default is 1)
     -D, --disasm-all      enabled disassembly of frames on all threads (not just
                           the crashed thread)
     -B DISASSEMBLE_BEFORE, --disasm-before=DISASSEMBLE_BEFORE
                           the number of instructions to disassemble before the
                           frame PC
     -A DISASSEMBLE_AFTER, --disasm-after=DISASSEMBLE_AFTER
                           the number of instructions to disassemble after the
                           frame PC
     -C NLINES, --source-context=NLINES
                           show NLINES source lines of source context (default =
                           4)
     --source-frames=NFRAMES
                           show source for NFRAMES (default = 4)
     --source-all          show source for all threads, not just the crashed
                           thread
     -i, --interactive     parse all crash logs and enter interactive mode

The source for the "symbolication" and "crashlog" modules is available in git.