2023-03-31 10:20:00 +08:00
Originally published at https://rakyll.org/coredumps/.
---
Debugging is highly useful for examining the execution flow
and understanding the current state of a program.

A core file contains the memory dump of a running process along
with its process status. It is primarily used for post-mortem
debugging, but it can also be used to understand a program's state
while it is still running. These two uses make core dump debugging
a good diagnostic aid for post-mortem analysis and for analyzing
production services.

I will use a simple hello world web server in this article,
but real-world programs can easily become far more complicated.
The availability of core dump analysis gives you an
opportunity to resurrect a program from a specific snapshot
and look into cases that might only be reproducible under certain
conditions or environments.

__Note__: This flow only works end-to-end on Linux at this point.
I am not quite sure about the other Unixes, but it is not
yet supported on macOS. Windows is not supported at this point either.

Before we begin, you need to make sure that your ulimit
for core dumps is at a reasonable level. By default it is
0, which means the maximum core file size is zero and no core file will be written.
I usually set it to unlimited on my development machine by typing:
```
$ ulimit -c unlimited
```
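
To confirm the limit from inside a Go process, the following is a minimal sketch that reads `RLIMIT_CORE` through the standard `syscall` package. It assumes Linux, where "unlimited" is reported as the maximum `uint64` value; the helper name `describeLimit` is made up for this illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// describeLimit turns a raw rlimit value into a readable string.
// The name is hypothetical; it is not part of any library.
func describeLimit(v uint64) string {
	const infinity = ^uint64(0) // assumption: RLIM_INFINITY as seen on Linux
	switch v {
	case 0:
		return "0 (core dumps disabled)"
	case infinity:
		return "unlimited"
	default:
		return fmt.Sprintf("%d bytes", v)
	}
}

func main() {
	var lim syscall.Rlimit
	// RLIMIT_CORE is the resource that `ulimit -c` controls.
	if err := syscall.Getrlimit(syscall.RLIMIT_CORE, &lim); err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Println("soft core limit:", describeLimit(lim.Cur))
	fmt.Println("hard core limit:", describeLimit(lim.Max))
}
```

The soft limit is the one that applies to the process. If it prints as 0, run `ulimit -c unlimited` in the same shell before starting the program.
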
Then, make sure you have [delve](https://github.com/derekparker/delve)
installed on your machine.
Here is a `main.go` that registers a simple handler and starts an HTTP server.
```
$ cat main.go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello world\n")
	})
	log.Fatal(http.ListenAndServe("localhost:7777", nil))
}
```
Let's build this into a binary. The rest of this article assumes the resulting binary is named `hello`.
```
$ go build .
```
Let's assume that, in the future, something messy is going on with
this server but you are not sure what it might be.
You might have instrumented your program in various ways, but the
existing instrumentation data might not be enough to give you any clue.

In a situation like this, it would be nice to have a
snapshot of the current process, and then use that snapshot to dive
into the current state of your program with your existing debugging
tools.

There are several ways to obtain a core file. You might already
be familiar with crash dumps, which are core dumps
written to disk when a program crashes. Go doesn't enable crash dumps
by default, but it gives you this option on Ctrl+backslash when the
`GOTRACEBACK` environment variable is set to "crash".
```
$ GOTRACEBACK=crash ./hello
(Ctrl+\)
```
This crashes the program, prints the stack trace, and writes a core dump
file.
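
The same traceback level can also be requested from inside the program. The sketch below uses `debug.SetTraceback` from the standard `runtime/debug` package, which accepts the same values as `GOTRACEBACK`. The helper name `enableCrashDumps` is made up for this illustration, and a core file is still only written if the core ulimit allows it.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// enableCrashDumps asks the runtime to use the "crash" traceback level,
// the same level selected by GOTRACEBACK=crash.
// The function name is hypothetical.
func enableCrashDumps() {
	debug.SetTraceback("crash")
}

func main() {
	enableCrashDumps()
	fmt.Printf("pid %d: traceback level set to crash\n", os.Getpid())
	// Start the HTTP server from main.go here. With this level set,
	// Ctrl+\ (SIGQUIT) should print the stack traces and then abort
	// the process so that the operating system can write a core file.
}
```

Setting the level in code can be handy when you do not control how the binary is launched. The environment variable still works alongside it, because `SetTraceback` cannot lower the level below what `GOTRACEBACK` specifies.
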
Another option is to retrieve a core dump from a running process
without having to kill it.
With `gcore`, it is possible to get a core
file without crashing. Let's start the server again:
```
$ ./hello &
$ gcore 546 # 546 is the PID of hello.
```
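
If you want to trigger the dump from a Go tool instead of a shell, the invocation can be built with `os/exec`. This is only a sketch: the helper name `gcoreCommand` is made up, PID 546 is the example PID from above, and the command is printed instead of executed because attaching to a process may require extra privileges on some systems. It relies on gcore's `-o prefix` option, which writes the dump to `prefix.pid`.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// gcoreCommand builds, but does not run, `gcore -o <prefix> <pid>`.
// With prefix "core" and pid 546 the dump would be written to core.546.
// The function name is hypothetical.
func gcoreCommand(prefix string, pid int) *exec.Cmd {
	return exec.Command("gcore", "-o", prefix, strconv.Itoa(pid))
}

func main() {
	cmd := gcoreCommand("core", 546) // 546 is the example PID of hello
	fmt.Println("would run:", strings.Join(cmd.Args, " "))
	// To actually take the dump, call cmd.Run() and check the returned error.
}
```
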
We now have a dump without crashing the process. The next step
is to load the core file into delve and start analyzing.
```
$ dlv core ./hello core.546
```
Alright, this is it! This is no different from a typical interactive delve session.
You can backtrace, list, see variables, and more. Some features will be disabled
given that a core dump is a snapshot and not a currently running process, but
the execution flow and the program state will be entirely accessible.
```
(dlv) bt
 0  0x0000000000457774 in runtime.raise
    at /usr/lib/go/src/runtime/sys_linux_amd64.s:110
 1  0x000000000043f7fb in runtime.dieFromSignal
    at /usr/lib/go/src/runtime/signal_unix.go:323
 2  0x000000000043f9a1 in runtime.crash
    at /usr/lib/go/src/runtime/signal_unix.go:409
 3  0x000000000043e982 in runtime.sighandler
    at /usr/lib/go/src/runtime/signal_sighandler.go:129
 4  0x000000000043f2d1 in runtime.sigtrampgo
    at /usr/lib/go/src/runtime/signal_unix.go:257
 5  0x00000000004579d3 in runtime.sigtramp
    at /usr/lib/go/src/runtime/sys_linux_amd64.s:262
 6  0x00007ff68afec330 in (nil)
    at :0
 7  0x000000000040f2d6 in runtime.notetsleep
    at /usr/lib/go/src/runtime/lock_futex.go:209
 8  0x0000000000435be5 in runtime.sysmon
    at /usr/lib/go/src/runtime/proc.go:3866
 9  0x000000000042ee2e in runtime.mstart1
    at /usr/lib/go/src/runtime/proc.go:1182
10  0x000000000042ed04 in runtime.mstart
    at /usr/lib/go/src/runtime/proc.go:1152
(dlv) ls
> runtime.raise() /usr/lib/go/src/runtime/sys_linux_amd64.s:110 (PC: 0x457774)
   105:     SYSCALL
   106:     MOVL AX, DI // arg 1 tid
   107:     MOVL sig+0(FP), SI // arg 2
   108:     MOVL $200, AX // syscall - tkill
   109:     SYSCALL
=> 110:     RET
   111:
   112: TEXT runtime·raiseproc(SB),NOSPLIT,$0
   113:     MOVL $39, AX // syscall - getpid
   114:     SYSCALL
   115:     MOVL AX, DI // arg 1 pid
```