Tag - python

Entries feed

Thursday 11 August 2011

Low-level Python debugging with GDB

Introduction

Your Python program is crashing or unresponsive and you can't figure it out with traditional tools (printing, PDB) ? Then this tutorial might help you !

GDB is a very powerful debugging tool, but it's hardly intuitive to use and moreover it doesn't understand Python data structures. With no hacking it will only be able to print the system stack and addresses of some variables whereas what you need is the Python stack and Python local variables in a human readable form.

To start, install GDB and a debug build of your Python interpreter (python2.x-dbg on Debian-like systems).

This last file contains a few macros for GDB that will enable it to print Python locals and stack.

Test program

Let's pretend we don't know why this simple program doesn't stop and why it's unresponsive :

1
2
3
4
5
6
7
8
9
import time
 
def next(i):
    time.sleep(10)
    i = 1 - i
 
i = 1
while True:
    next(i)

Easy case

If your problem is easily reproducible, then you're in luck. Restart your script with the debug build of the Python interpreter and attach to it with GDB :

1
2
3
python2.7-dbg test.py &
[1] 7877
gdb -p 7877

At this point the Python interpreter has been interrupted and the script is paused so we can inspect it. First, we can see the Python stack of our script :

1
2
3
4
5
(gdb) py-bt
#5 Frame 0x242ae50, for file test.py, line 4, in next (i=1)
    time.sleep(10)
#8 Frame 0x2427e00, for file test.py, line 9, in <module> ()
    next(i)

To avoid all confusion : the most recent call comes first in this trace unlike when the backtrace is printed from Python. In GDB, the most recent call is called active or selected. We can print Python code and local variables in the selected frame :

1
2
3
4
5
6
7
8
9
10
11
12
(gdb) py-list 
   1    import time
   2    
   3    def next(i):
  >4        time.sleep(10)
   5        i = 1 - i
   6    
   7    i = 1
   8    while True:
   9        next(i)
(gdb) py-locals 
i = 1

With py-up and py-down macros we can change the selected Python frame. You must be aware that the process is still frozen and those commands don't actually do anything to the process.

1
2
3
4
5
6
7
8
9
10
(gdb) py-up
#8 Frame 0x2427e00, for file test.py, line 9, in <module> ()
    next(i)
(gdb) py-list 
   4        time.sleep(10)
   5        i = 1 - i
   6    
   7    i = 1
   8    while True:
  >9        next(i)

At this point GDB has the same behavior as PDB, it is good but not so helpful. If you were unable to figure it out with PDB, then you're probably dealing with some low-level problem in some Python internal, external lib or system call. To understand what happens you will need to explore the system stack :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
(gdb) backtrace
#0  0x00007f48671cddf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000058b22c in floatsleep (secs=10) at ../Modules/timemodule.c:943
#2  0x000000000058a05f in time_sleep (self=0x0, args=(10,)) at ../Modules/timemodule.c:206
#3  0x000000000048867e in PyCFunction_Call (func=<built-in function sleep>, arg=(10,), kw=0x0)
    at ../Objects/methodobject.c:81
#4  0x0000000000525efe in call_function (pp_stack=0x7fff3ac2ca40, oparg=1) at ../Python/ceval.c:4013
#5  0x0000000000520f59 in PyEval_EvalFrameEx (f=Frame 0xe59e50, for file test.py, line 4, in next (i=1), throwflag=0)
    at ../Python/ceval.c:2666
#6  0x00000000005263c5 in fast_function (func=<function at remote 0xe074f8>, pp_stack=0x7fff3ac2cdc0, n=1, na=1, nk=0)
    at ../Python/ceval.c:4099
#7  0x00000000005260d5 in call_function (pp_stack=0x7fff3ac2cdc0, oparg=1) at ../Python/ceval.c:4034
#8  0x0000000000520f59 in PyEval_EvalFrameEx (f=Frame 0xe56e00, for file test.py, line 9, in <module> (), throwflag=0)
    at ../Python/ceval.c:2666
#9  0x0000000000523744 in PyEval_EvalCodeEx (co=0xdb90f0, globals=
    {'__builtins__': <module at remote 0x7f4868754470>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0xe0e678>, '__name__': '__main__', 'next': <function at remote 0xe074f8>, '__doc__': None}, locals=
    {'__builtins__': <module at remote 0x7f4868754470>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0xe0e678>, '__name__': '__main__', 'next': <function at remote 0xe074f8>, '__doc__': None}, args=0x0, 
    argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:3253
#10 0x0000000000519b86 in PyEval_EvalCode (co=0xdb90f0, globals=
    {'__builtins__': <module at remote 0x7f4868754470>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0xe0e678>, '__name__': '__main__', 'next': <function at remote 0xe074f8>, '__doc__': None}, locals=
[...]

We can now see that our program is stuck in a call to select(), in the libc (you might not actually see exactly where the last call was made unless you have a debug version of that external library). Now you should probably use GDB commands finish and return to see if the execution thread comes back into the Python interpreter. If not, it's probably a bug with an external library which should be reproducible outside of Python.

Hard case

You might not be able to trigger systematically the bug which may be happening like once a day on one of your production servers. In this case we absolutely need to perform the analysis right on the production server where you found the unresponsive process. As this process is running on an optimized and stripped version of the Python interpreter, the stack trace will give you very few info :

1
2
3
4
5
6
7
8
9
10
11
12
13
(gdb) bt
#0  0x00007f1b7f02cdf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000005048c2 in ?? ()
#2  0x00000000004b90a9 in PyEval_EvalFrameEx ()
#3  0x00000000004b9673 in PyEval_EvalFrameEx ()
#4  0x00000000004bf600 in PyEval_EvalCodeEx ()
#5  0x00000000004c0082 in PyEval_EvalCode ()
#6  0x00000000004df2d2 in ?? ()
#7  0x00000000004dfe64 in PyRun_FileExFlags ()
#8  0x00000000004e096e in PyRun_SimpleFileExFlags ()
#9  0x00000000004f09dd in Py_Main ()
#10 0x00007f1b7ef7cead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x000000000041f0a1 in _start ()

Only public symbols of libpython are visible, we absolutely don't know where we are in the Python script and we have no idea of the Python stack. Let's install the debug version of Python, it will at least install GDB symbols for the Python interpreter :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
(gdb) bt
#0  0x00007f0b1773bdf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000005048c2 in floatsleep (self=<value optimized out>, args=<value optimized out>)
    at ../Modules/timemodule.c:943
#2  time_sleep (self=<value optimized out>, args=<value optimized out>) at ../Modules/timemodule.c:206
#3  0x00000000004b90a9 in call_function (f=<value optimized out>, throwflag=<value optimized out>)
    at ../Python/ceval.c:4013
#4  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:2666
#5  0x00000000004b9673 in fast_function (f=<value optimized out>, throwflag=<value optimized out>)
    at ../Python/ceval.c:4099
#6  call_function (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:4034
#7  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:2666
#8  0x00000000004bf600 in PyEval_EvalCodeEx (co=0x7f0b18c0f8b0, globals=<value optimized out>, 
    locals=<value optimized out>, args=<value optimized out>, argcount=<value optimized out>, 
    kws=<value optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:3253
#9  0x00000000004c0082 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, 
    locals=<value optimized out>) at ../Python/ceval.c:667
#10 0x00000000004df2d2 in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, locals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, flags=
<value optimized out>, arena=<value optimized out>) at ../Python/pythonrun.c:1346
#11 0x00000000004dfe64 in PyRun_FileExFlags (fp=0x2693690, filename=0x7ffffc54e67b "test.py", 
    start=<value optimized out>, globals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, locals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, closeit=1,
flags=0x7ffffc54ce30) at ../Python/pythonrun.c:1332
#12 0x00000000004e096e in PyRun_SimpleFileExFlags (fp=0x2693690, filename=<value optimized out>, closeit=1, flags=
    0x7ffffc54ce30) at ../Python/pythonrun.c:936
#13 0x00000000004f09dd in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at ../Modules/main.c:599
#14 0x00007f0b1768bead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x000000000041f0a1 in _start ()

It is better, we now know the module and the file, but nothing about local variables or Python stack. Do not try to use py-* macros, they will not work as almost all Python internals are "optimized out", they will probably trigger a segmentation fault by trying to print Python objects with _PyObject_Dump.

The only chance you have to find exactly where the code is failing is by carefully inspecting all the internal Python variables, some of them are still usable and can be used to find out what's going on. For example :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(gdb) select-frame 2
(gdb) info locals
secs = 10
(gdb) select-frame 3
(gdb) info locals
callargs = (10,)
flags = <value optimized out>
tstate = <value optimized out>
func = <built-in function sleep>
w = <value optimized out>
na = <value optimized out>
nk = <value optimized out>
n = <value optimized out>
pfunc = 0x26aace0
x = <value optimized out>

Frame 2 was a call in timemodule.c and showed us that the argument of the function call was 10 secs.

Frame 3 is in PyEval_EvalFrameEx() (main Python bytecode interpretation routine) it brings us back into the interpreter. Almost all local variables were optimized, func tells us that the function call was for the function sleep. Finally :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
(gdb) select-frame 4
(gdb) info locals 
sp = 0x26aace8
stack_pointer = <value optimized out>
next_instr = 0x7f0b18bc0760 "\001d\002"
opcode = <value optimized out>
oparg = <value optimized out>
why = WHY_NOT
err = 0
x = <value optimized out>
v = <value optimized out>
w = <value optimized out>
u = <value optimized out>
t = <value optimized out>
stream = 0x0
fastlocals = <value optimized out>
freevars = 0x26aace0
retval = <value optimized out>
tstate = 0x25d20a0
co = 0x7f0b18c0fa30
instr_ub = -1
instr_lb = 0
instr_prev = -1
first_instr = 0x7f0b18bc0754 "t"
names = ('time', 'sleep')
consts = (None, 10, 1)
(gdb) p tstate->frame->f_globals 
$5 = 
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': 
<function at remote 0x7f0b18bc2578>, '__doc__': None}
(gdb) p tstate->frame->f_lineno 
$6 = 3

Here we go test.py line 3, i = 1, function call to time.sleep(10) !

Insanely hard case

If this those steps are still insufficient, you might try to set breakpoints on call_function() and let the script run a little bit with continue or step.

The final and ultimate solution is to run PyEval_EvalFrameEx() step by step. Grab the source of CPython and go to the Python directory before launching GDB (it must be the source of that exact same version of the Python interpreter that runs your script) :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
(gdb) finish
Run till exit from #0  0x00007fc13a60adf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
0x00000000004dc17b in floatsleep (self=<value optimized out>, args=<value optimized out>)
    at ../Modules/timemodule.c:914
914     ../Modules/timemodule.c: No such file or directory.
        in ../Modules/timemodule.c
(gdb) next
time_sleep (self=<value optimized out>, args=<value optimized out>) at ../Modules/timemodule.c:206
206     in ../Modules/timemodule.c
(gdb) cd python2.7-2.7.2/Python/
Working directory /home/grapsus/ludia/python2.7-2.7.2/Python.
(gdb) next
208         Py_INCREF(Py_None);
(gdb) next
209         return Py_None;
(gdb) next
210     }

Notice that it doesn't work until you cd to the Python directory of CPython source tree. Same thing if you want to debug step by step some Python module, like gevent, you will need the source code of the very same version that's running the script.

It is very time-consuming and you'll probably need a Python bytecode reference to follow what's going on but you'll eventually find the issue.

Conclusion

Even with a strongly optimized and stripped Python interpreter it is possible to debug or at least analyze a buggy Python script.

References

Monday 25 July 2011

Sleep sort performance test

Introduction

Sleep sort is a very curious and original algorithm. It brings several interesting questions. Such as, how to measure its complexity or in what cases does it perform the best.

Principle

In a nutshell: for each element to sort, start a thread or a process which sleeps a time proportional to the element and then pushes it back to the main thread.

Trivial Bash implementation:

1
2
3
4
5
6
7
8
9
10
11
12
#!/bin/bash
 
function f {
        sleep $1
        echo $1
}
 
while [ -n "$1" ] ; do 
        f $1 &
        shift
done
wait

Test suite

In order to study it and measure its performance we (me and Hartok) decided to implement the Sleep Sort in Python and gevent.

Our program generates a set of random numbers and sorts it with Sleep sort then with the Python built-in Quick sort and records timings. It works with three parameters:

  • set size
  • maximum value in the set
  • the ratio between sleep time and values in the set (for example with 10 the thread for 42 sleeps for 42/10 seconds)

This last parameter is pretty tricky: the higher it is, the faster the Sleep sort works. But if it is too high, the time to start a new thread (or greenlet) will be more important than the sleep time for low values and the result will be incorrect. It seems to be strongly hardware dependent and a little tied to the other parameters.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
#!/usr/bin/python
 
import sys
import random
import time
import copy
import gevent
 
divisor = 900000
 
def sort(numbers):
        """ Sorts a list with Sleep Sort algorithm and measures time
        """
        begin = time.time()
 
        def f(n, result):
                gevent.sleep(float(n)/divisor)
                result.append(n)
        
        result = []
        greenlets = []
        # spawn a sleep greenlet for each value
        for n in numbers:
                greenlets.append(gevent.Greenlet.spawn(f, n, result))
 
        # wait for all greenlets to end
        gevent.joinall(greenlets)
        return (time.time()-begin, result)
 
def systemsort(numbers):
        """ Sorts a list with system sort function and measures time
        """
        begin = time.time()
        result = copy.copy(numbers)
        result.sort()
        return (time.time()-begin, result)
 
def generate(n, m): 
        """ Generates a list of n random numbers, each being less than m
        """
        numbers = []
        for i in range(n):
                numbers.append(random.randint(1, m))
        return numbers
 
if __name__ == '__main__':
        # read params
        try:
                n = int(sys.argv[1])
                m = int(sys.argv[2])
        except BaseException:
                print "usage: %s n m [d]" % sys.argv[0]
                print " - n number of random numbers"
                print " - m maximum random value"
                print " - d time divisor (optionnal), default is %d" % divisor
                sys.exit(1)
 
        try:
                divisor = int(sys.argv[3])
        except:
                pass
 
        # generate random values
        numbers = generate(n, m)
        # sleep sort them
        (sleep_time, sleep_result) = sort(numbers)
        # system sort them
        (system_time, system_result) = systemsort(numbers)
 
        if system_result == sleep_result:
                print "sleep sort succeed"
                print "sleep: %f" % sleep_time
                print "system: %f" % system_time
        else:
                print "sleep sort failed (time divisor too ambitious ?)"
                print "sleep: (%d elements)" % len(sleep_result)
                #print sleep_result.__str__()
                print "system: (%d elements)" % len(system_result)
                #print system_result.__str__()
                errors = 0
                for n in range(n):
                        if system_result[n] != sleep_result[n]:
                                errors = errors + 1
                print "%d errors" % errors

Test results

quicksort.png

sleepsort.png

ratio.png

The overall performance of the Sleep sort is awful, but it has interesting performance when it comes to sets with few different values.