For an introduction to gdb, see RMS's gdb Tutorial (note that this is not "the" RMS). Pay particular attention to the example debugging sessions.
Compile with the flags -g3 -gdwarf-2 so that macro definitions are included in the debugging information. Remove any optimization flags such as -O2, since optimized code is harder to step through and variables may be optimized out.
An overview of commands:
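n | Next source line, stepping over function calls (use ni for the next machine instruction) |
s | Step to the next source line, entering function calls (step into) |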
fin | Finish current function (step out) |
p var | Print variable |
p macro | Print macro if compiled with -g3 -gdwarf-2 |
p *var | Print the value that var points to; when printing a string (defined as char*), don't precede it with a * |
p/x var | Print variable in hexadecimal |
p/x *var@length | Print length elements starting at *var in hexadecimal, e.g. a char array reached through pointer var; for an array variable, use var[0]@length |
set var name=value | Modify variable name |
l | List source code around current step |
l start,end | List source code, parameters optional |
c | Continue running |
f | Show the currently selected stack frame and its source line; f n selects frame n of the backtrace |
bt | Show backtrace |
disp var | Like print, but repeat after every step |
i macro macroname | Print information about macro |
i args | Print arguments of current function |
i locals | Print all local variables |
x/3tb ptr | Examine address ptr, print 3 bytes in binary |
x/3xb ptr | The same, except print 3 bytes in hex |
x/60cb ptr | The same, except print 60 bytes as characters including ASCII value |
j line | Jump to line and resume execution from there, skipping the code in between |
Breakpoint commands:
b main | Set breakpoint at function main |
b blah.c:88 | Set breakpoint at file blah.c, line 88 |
b 7 if var==0 | Break at line 7 if variable var equals zero |
i b | List all breakpoints |
d n | Delete breakpoint n |
Some useful options:
listsize count | Number of lines shown when giving the l (list) command |
history save on | Saves the command history when exiting |
follow-fork-mode ask | Which process to debug after a fork: parent, child or ask (recent gdb versions accept only parent and child) |
Put options in $HOME/.gdbinit in the form of lines like:
set option param1
When a segfault occurs, your program quits immediately. To find the cause of the segfault, it's useful to let your program "dump core". This core file can then be loaded into the debugger to see the "backtrace", i.e. the stack of function calls that led to dumping the core. To enable core dumping, compile your program with debugging options (-g) and enter the following command either on the command line or in your shell startup file:
ulimit -c unlimited
This removes the limit on the size of the core file. But be careful: core dumps can easily add up in size. An example of a few core dumps:
-rw------- 1 test users 56M 2007-05-24 22:40 /tmp/core_srv_tscu_29328
-rw------- 1 test users 57M 2007-06-04 18:07 /tmp/core_srv_tscu_4102
-rw------- 1 test users 57M 2007-06-05 20:48 /tmp/core_srv_tscu_5266
Pay particular attention to the fifth column, the file size.
When the program runs and the segfault occurs, a file named core or core.pid (depending on the system's settings) is created in the working directory of the process. To change this behaviour, edit /etc/sysctl.conf and add a line like:
kernel.core_pattern=/tmp/core_%e_%p
Check man proc (or man core on newer systems) for more options. After running sysctl -p as root, all core files will be written to /tmp with a name like core_executablename_pid. To see the current setting:
$ sysctl kernel.core_pattern
You can test this as follows:
$ cat
Now press CTRL-\ (Ctrl and backslash).
Quit (core dumped)
If that doesn't work:
$ cat &
$ kill -3 $!
Start the program in the debugger and load the core file. Then give the backtrace command to show what it was doing at the time of the crash:
$ gdb ./srv_dlr
Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) core core.3902
(gdb) bt
#0 memcpy () from /lib/libc.so.6
#1 dlrdb_query (p=, seq_flag=1, max_id=694) at dlrdb.c:227
#2 dlr_packet_submit (p=, seq_flag=1, answer=1, nhk_avail=) at dlr.c:175
#3 dlr_packet (opts=, nhk_avail=, p=) at dlr.c:236
#4 process_packet (opts=, nhk_avail=, p=) at process.c:93
#5 run_server (opts=) at server.c:428
#6 tuce_srv (opts=) at server.c:580
#7 main (argc=2, argv=) at main.c:180
You can now select the frame of the backtrace that you suspect is the culprit. Since we can be confident that libc's memcpy isn't the problem itself, we check frame 1:
(gdb) f 1
#1 dlrdb_query (p=, seq_flag=1, max_id=694) at dlrdb.c:227
227 memcpy((void *)p_ptr, (void *)bufptr, PRIM_HDRLEN + length + 1);
(gdb)
We can now see what the contents of the variables were before we experienced the segmentation violation, but first you'll want to give the 'l' list command to see the context of the current source code line.
If you type bt and just get a listing as follows:
(gdb) bt
#0 ?? ()
#1 ?? ()
#2 ?? ()
... more lines ...
Then either you didn't compile with debugging flags, or you forgot to pass the original executable when you started gdb.
If you have a core file lying around and you don't know where it came from, use the file utility:
$ file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, SVR4-style, from 'telisky'
If you can reproduce the issue, you can also run the program through gdb. Any signals will be caught and can then be examined. For instance, the program 'srv_dlr' generated a segmentation fault upon startup:
$ gdb ./srv_dlr
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb)
Run the program. In this case, the segmentation fault immediately happens:
(gdb) r
Starting program: src/telis/tuce/server/srv_dlr
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 11573)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 11573)]
0x401afbce in ____strtol_l_internal () from /lib/libc.so.6
(gdb)
The segmentation fault just happened. Let's see where it occurred with the backtrace command:
(gdb) bt
#0 0x401afbce in ____strtol_l_internal () from /lib/libc.so.6
#1 0x401af90a in __strtol_internal () from /lib/libc.so.6
#2 0x401ad226 in atoi () from /lib/libc.so.6
#3 0x08051c4e in dlrdb_init () at dlrdb.c:45
#4 0x0805238f in dlr_initialize () at dlr.c:38
#5 0x0804b727 in process_init (opts=0xbffff470) at process.c:43
#6 0x0804b58b in tuce_srv (opts=0xbffff470) at server.c:563
#7 0x0804a3e8 in main (argc=1, argv=0xbffff514) at main.c:180
We can probably trust GNU's libc, so let's check out frame number three:
(gdb) f 3
#3 0x08051c4e in dlrdb_init () at dlrdb.c:45
45 actual_id = atoi(row[0]);
(gdb)
We can examine the variables here:
(gdb) p row[0]
$1 = 0x0
(gdb)
That wasn't the expected content of variable row[0]. Now we can examine why this variable wasn't properly filled.
You're taking over a codebase. You compile, run the binary and nothing happens. It just sits there, waiting for something... If you want to know what it's doing, start gdb program_name. Type 'r' and, once the program has started, press CTRL-C and type 'bt'.
This will show the current call stack. You can now see what function the program is stuck in. When you're done examining, type 'c' plus Enter. The program will continue running.
To debug a multithreaded program, the following commands are useful:
i th | List all threads |
thr number | Switch to thread number (now you can do a backtrace, set breakpoints, step, etc.) |