Tag - hack

Thursday 11 August 2011

Low-level Python debugging with GDB

Introduction

Is your Python program crashing or unresponsive, and you can't figure out why with traditional tools (printing, PDB)? Then this tutorial might help you!

GDB is a very powerful debugging tool, but it is hardly intuitive to use and, moreover, it doesn't understand Python data structures. Without some extra help it can only print the system stack and the addresses of some variables, whereas what you need is the Python stack and the Python local variables in human-readable form.

To start, install GDB and a debug build of your Python interpreter (python2.x-dbg on Debian-like systems).

You will also need a file of GDB macros for Python (it provides the py-* commands used below), which enables GDB to print Python locals and the Python stack.

Test program

Let's pretend we don't know why this simple program doesn't stop and why it's unresponsive:

import time
 
def next(i):
    time.sleep(10)
    i = 1 - i
 
i = 1
while True:
    next(i)

Easy case

If your problem is easily reproducible, then you're in luck. Restart your script with the debug build of the Python interpreter and attach to it with GDB:

python2.7-dbg test.py &
[1] 7877
gdb -p 7877

At this point the Python interpreter has been interrupted and the script is paused, so we can inspect it. First, we can look at the Python stack of our script:

(gdb) py-bt
#5 Frame 0x242ae50, for file test.py, line 4, in next (i=1)
    time.sleep(10)
#8 Frame 0x2427e00, for file test.py, line 9, in <module> ()
    next(i)

To avoid any confusion: the most recent call comes first in this trace, unlike a traceback printed by Python. GDB calls the frame currently being examined the selected frame; right after attaching, it is the most recent call. We can print the Python code and the local variables of the selected frame:

(gdb) py-list 
   1    import time
   2    
   3    def next(i):
  >4        time.sleep(10)
   5        i = 1 - i
   6    
   7    i = 1
   8    while True:
   9        next(i)
(gdb) py-locals 
i = 1
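
What py-bt and py-locals print lives in the interpreter's frame objects. As a rough sketch (not how the GDB macros are implemented), the same information can be read from inside a running Python 3 process with sys._getframe; the helper names python_backtrace and next_value are made up for this illustration:

```python
import sys


def python_backtrace():
    """Return (file, line, function, locals) tuples, most recent call first."""
    frames = []
    frame = sys._getframe(1)  # start at the caller of this helper
    while frame is not None:
        code = frame.f_code
        frames.append((code.co_filename, frame.f_lineno, code.co_name, dict(frame.f_locals)))
        frame = frame.f_back  # each frame object points to the frame of its caller
    return frames


def next_value(i):  # hypothetical stand-in for next() in test.py
    return python_backtrace()


trace = next_value(1)
for filename, lineno, name, local_vars in trace[:2]:
    print("%s, line %d, in %s" % (filename, lineno, name))
print("locals of the most recent frame:", trace[0][3])
```

The list is ordered like the GDB output above: the most recent call comes first.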

With the py-up and py-down macros we can change the selected Python frame. Be aware that the process is still frozen: these commands only change what GDB looks at and don't do anything to the process.

(gdb) py-up
#8 Frame 0x2427e00, for file test.py, line 9, in <module> ()
    next(i)
(gdb) py-list 
   4        time.sleep(10)
   5        i = 1 - i
   6    
   7    i = 1
   8    while True:
  >9        next(i)

At this point GDB behaves much like PDB, which is good but not that helpful. If you were unable to figure the problem out with PDB, then you're probably dealing with a low-level problem in a Python internal, an external library or a system call. To understand what happens you will need to explore the system stack:

(gdb) backtrace
#0  0x00007f48671cddf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000058b22c in floatsleep (secs=10) at ../Modules/timemodule.c:943
#2  0x000000000058a05f in time_sleep (self=0x0, args=(10,)) at ../Modules/timemodule.c:206
#3  0x000000000048867e in PyCFunction_Call (func=<built-in function sleep>, arg=(10,), kw=0x0)
    at ../Objects/methodobject.c:81
#4  0x0000000000525efe in call_function (pp_stack=0x7fff3ac2ca40, oparg=1) at ../Python/ceval.c:4013
#5  0x0000000000520f59 in PyEval_EvalFrameEx (f=Frame 0xe59e50, for file test.py, line 4, in next (i=1), throwflag=0)
    at ../Python/ceval.c:2666
#6  0x00000000005263c5 in fast_function (func=<function at remote 0xe074f8>, pp_stack=0x7fff3ac2cdc0, n=1, na=1, nk=0)
    at ../Python/ceval.c:4099
#7  0x00000000005260d5 in call_function (pp_stack=0x7fff3ac2cdc0, oparg=1) at ../Python/ceval.c:4034
#8  0x0000000000520f59 in PyEval_EvalFrameEx (f=Frame 0xe56e00, for file test.py, line 9, in <module> (), throwflag=0)
    at ../Python/ceval.c:2666
#9  0x0000000000523744 in PyEval_EvalCodeEx (co=0xdb90f0, globals=
    {'__builtins__': <module at remote 0x7f4868754470>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0xe0e678>, '__name__': '__main__', 'next': <function at remote 0xe074f8>, '__doc__': None}, locals=
    {'__builtins__': <module at remote 0x7f4868754470>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0xe0e678>, '__name__': '__main__', 'next': <function at remote 0xe074f8>, '__doc__': None}, args=0x0, 
    argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:3253
#10 0x0000000000519b86 in PyEval_EvalCode (co=0xdb90f0, globals=
    {'__builtins__': <module at remote 0x7f4868754470>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0xe0e678>, '__name__': '__main__', 'next': <function at remote 0xe074f8>, '__doc__': None}, locals=
[...]

We can now see that our program is stuck in a call to select() in the libc (you might not see exactly where the last call was made unless you have a debug version of that external library). Now you should probably use the GDB commands finish and return to see whether the execution thread comes back into the Python interpreter. If it doesn't, it's probably a bug in an external library, which should be reproducible outside of Python.
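
Seeing select() under time.sleep() may look surprising. The backtrace shows floatsleep() calling it, and a select() with no file descriptors and only a timeout is a classic way to sleep on Unix. A minimal sketch of the same idea in Python (Unix only; sleep_with_select is a name invented for this example):

```python
import select
import time


def sleep_with_select(seconds):
    """Block for `seconds` by waiting on no file descriptors at all (Unix only)."""
    select.select([], [], [], seconds)


start = time.monotonic()
sleep_with_select(0.2)
elapsed = time.monotonic() - start
print("slept for about %.2f s" % elapsed)
```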

Hard case

You might not be able to trigger the bug systematically; it may happen something like once a day on one of your production servers. In this case the analysis has to be performed right on the production server where you found the unresponsive process. As this process is running on an optimized and stripped build of the Python interpreter, the stack trace will give you very little information:

(gdb) bt
#0  0x00007f1b7f02cdf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000005048c2 in ?? ()
#2  0x00000000004b90a9 in PyEval_EvalFrameEx ()
#3  0x00000000004b9673 in PyEval_EvalFrameEx ()
#4  0x00000000004bf600 in PyEval_EvalCodeEx ()
#5  0x00000000004c0082 in PyEval_EvalCode ()
#6  0x00000000004df2d2 in ?? ()
#7  0x00000000004dfe64 in PyRun_FileExFlags ()
#8  0x00000000004e096e in PyRun_SimpleFileExFlags ()
#9  0x00000000004f09dd in Py_Main ()
#10 0x00007f1b7ef7cead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x000000000041f0a1 in _start ()

Only the public symbols of libpython are visible: we have no idea where we are in the Python script and we know nothing about the Python stack. Let's install the debug version of Python; it will at least install the GDB symbols for the Python interpreter:

(gdb) bt
#0  0x00007f0b1773bdf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000005048c2 in floatsleep (self=<value optimized out>, args=<value optimized out>)
    at ../Modules/timemodule.c:943
#2  time_sleep (self=<value optimized out>, args=<value optimized out>) at ../Modules/timemodule.c:206
#3  0x00000000004b90a9 in call_function (f=<value optimized out>, throwflag=<value optimized out>)
    at ../Python/ceval.c:4013
#4  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:2666
#5  0x00000000004b9673 in fast_function (f=<value optimized out>, throwflag=<value optimized out>)
    at ../Python/ceval.c:4099
#6  call_function (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:4034
#7  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:2666
#8  0x00000000004bf600 in PyEval_EvalCodeEx (co=0x7f0b18c0f8b0, globals=<value optimized out>, 
    locals=<value optimized out>, args=<value optimized out>, argcount=<value optimized out>, 
    kws=<value optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:3253
#9  0x00000000004c0082 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, 
    locals=<value optimized out>) at ../Python/ceval.c:667
#10 0x00000000004df2d2 in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, locals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, flags=
<value optimized out>, arena=<value optimized out>) at ../Python/pythonrun.c:1346
#11 0x00000000004dfe64 in PyRun_FileExFlags (fp=0x2693690, filename=0x7ffffc54e67b "test.py", 
    start=<value optimized out>, globals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, locals=
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': <function at remote 0x7f0b18bc2578>, '__doc__': None}, closeit=1,
flags=0x7ffffc54ce30) at ../Python/pythonrun.c:1332
#12 0x00000000004e096e in PyRun_SimpleFileExFlags (fp=0x2693690, filename=<value optimized out>, closeit=1, flags=
    0x7ffffc54ce30) at ../Python/pythonrun.c:936
#13 0x00000000004f09dd in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at ../Modules/main.c:599
#14 0x00007f0b1768bead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x000000000041f0a1 in _start ()

That's better: we now know the module and the file, but still nothing about the local variables or the Python stack. Do not try to use the py-* macros: they will not work, as almost all Python internals are "optimized out", and they will probably trigger a segmentation fault when trying to print Python objects with _PyObject_Dump.

The only chance you have of finding exactly where the code is failing is to carefully inspect the internal Python variables; some of them are still usable and can help you find out what's going on. For example:

(gdb) select-frame 2
(gdb) info locals
secs = 10
(gdb) select-frame 3
(gdb) info locals
callargs = (10,)
flags = <value optimized out>
tstate = <value optimized out>
func = <built-in function sleep>
w = <value optimized out>
na = <value optimized out>
nk = <value optimized out>
n = <value optimized out>
pfunc = 0x26aace0
x = <value optimized out>

Frame 2 is a call in timemodule.c and shows us that the argument of the function call was 10 seconds.

Frame 3 is call_function(), inlined into PyEval_EvalFrameEx() (the main Python bytecode interpretation routine); it brings us back into the interpreter. Almost all local variables were optimized out, but func tells us that the call was to the function sleep. Finally:

(gdb) select-frame 4
(gdb) info locals 
sp = 0x26aace8
stack_pointer = <value optimized out>
next_instr = 0x7f0b18bc0760 "\001d\002"
opcode = <value optimized out>
oparg = <value optimized out>
why = WHY_NOT
err = 0
x = <value optimized out>
v = <value optimized out>
w = <value optimized out>
u = <value optimized out>
t = <value optimized out>
stream = 0x0
fastlocals = <value optimized out>
freevars = 0x26aace0
retval = <value optimized out>
tstate = 0x25d20a0
co = 0x7f0b18c0fa30
instr_ub = -1
instr_lb = 0
instr_prev = -1
first_instr = 0x7f0b18bc0754 "t"
names = ('time', 'sleep')
consts = (None, 10, 1)
(gdb) p tstate->frame->f_globals 
$5 = 
    {'__builtins__': <module at remote 0x7f0b18c8cad0>, '__file__': 'test.py', '__package__': None, 'i': 1, 'time': <module at remote 0x7f0b18bc7210>, '__name__': '__main__', 'next': 
<function at remote 0x7f0b18bc2578>, '__doc__': None}
(gdb) p tstate->frame->f_lineno 
$6 = 3

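Here we go: test.py, i = 1, and a call to time.sleep(10) made from the function whose code starts on line 3! Note that the raw f_lineno field is only kept up to date while tracing is active; otherwise it holds the first line of the code being executed (here the def of next), not the line of the call itself.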
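
The names tuple and the line number 3 can both be related to the code object of next(). The following sketch, assuming Python 3 and compiling a copy of the function's source under the file name test.py, shows where such values come from; as far as I can tell, a frame that is not being traced starts out with the co_firstlineno of its code object:

```python
SOURCE = """import time

def next(i):
    time.sleep(10)
    i = 1 - i
"""

namespace = {}
# Compile and run the module body only; next() itself is never called here.
exec(compile(SOURCE, "test.py", "exec"), namespace)
code = namespace["next"].__code__

print("first line of next():", code.co_firstlineno)
print("names used by next():", code.co_names)
```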

Insanely hard case

If these steps are still insufficient, you might try setting breakpoints on call_function() and letting the script run a little with continue or step.

The final and ultimate solution is to run PyEval_EvalFrameEx() step by step. Grab the source of CPython and go to its Python directory before launching GDB (it must be the source of the exact same version of the Python interpreter that runs your script):

(gdb) finish
Run till exit from #0  0x00007fc13a60adf3 in select () from /lib/x86_64-linux-gnu/libc.so.6
0x00000000004dc17b in floatsleep (self=<value optimized out>, args=<value optimized out>)
    at ../Modules/timemodule.c:914
914     ../Modules/timemodule.c: No such file or directory.
        in ../Modules/timemodule.c
(gdb) next
time_sleep (self=<value optimized out>, args=<value optimized out>) at ../Modules/timemodule.c:206
206     in ../Modules/timemodule.c
(gdb) cd python2.7-2.7.2/Python/
Working directory /home/grapsus/ludia/python2.7-2.7.2/Python.
(gdb) next
208         Py_INCREF(Py_None);
(gdb) next
209         return Py_None;
(gdb) next
210     }

Notice that source lines are not displayed until you cd to the Python directory of the CPython source tree. The same goes if you want to step through some Python module such as gevent: you will need the source code of the very same version that's running the script.

It is very time-consuming and you'll probably need a Python bytecode reference to follow what's going on, but you'll eventually find the issue.
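
To get a feel for the bytecode that PyEval_EvalFrameEx() walks through, the dis module can list it for any function. This is only a sketch on Python 3 (opcode names differ between versions, and on 2.7 dis.dis(next_step) prints a similar listing); next_step is a hypothetical copy of next() from test.py and is never called:

```python
import dis
import time


def next_step(i):  # hypothetical copy of next() from test.py
    time.sleep(10)
    i = 1 - i


# One line per bytecode instruction: offset, opcode name, readable argument.
for instruction in dis.get_instructions(next_step):
    print(instruction.offset, instruction.opname, instruction.argrepr)
```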

Conclusion

Even with a strongly optimized and stripped Python interpreter, it is possible to debug, or at least analyze, a buggy Python script.

Wednesday 15 September 2010

Lightweight HTTP server in BASH with PHP support

No kidding, I wrote this HTTP server in BASH. It supports most HTTP 1.0 headers, keep-alive requests, directory listing and PHP scripts. By its nature, this piece of software is not secure (it is fun though) and isn't intended for production purposes: <insert the usual NO WARRANTY boilerplate bullshit here>.

I tested it with PHPMyAdmin, which I consider to be heavy PHP software, and it works pretty well. It is not well commented; really, I just wrote it for fun and to learn BASH and the HTTP protocol.

#!/bin/bash
 
# Written by Alexis Bezverkhyy <alexis@grapsus.net> in 2008
# This is free and unencumbered software released into the public domain.
# For more information, please refer to <http://unlicense.org/>
 
# This script should be run via inetd, first parameter is WWW root path
 
# Uncomment for debugging
#exec 2>/tmp/log ; set -x
 
NUM="$RANDOM"
DOCUMENT_ROOT="$1"
KEEP_ALIVE="keep-alive"
 
while [ "$KEEP_ALIVE" == "keep-alive" ] ; do
KEEP_ALIVE="close"
 
for i in $(seq 1 5); do
  read -t 5 line
  if [ -n "$line" ] ; then break; fi
done
 
if grep -sqv 'HTTP' <<< "$line" ; then exit ; fi
#echo `date`" BEGIN $line" >> /tmp/"$NUM"-log
 
REQUEST_METHOD=`cut -d ' ' -f 1 <<< "$line"`
REQUEST_URI=`cut -d ' ' -f 2 <<< "$line" | sed 's/%20/ /g'`
SCRIPT_NAME=`cut -d '?' -f 1 <<< "$REQUEST_URI"`
SCRIPT_FILENAME=`sed -e 's#//#/#' -e 's#/$##' <<< "$DOCUMENT_ROOT$SCRIPT_NAME"`
QUERY_STRING=''
if grep -sq '?' <<< "$REQUEST_URI" ; then
  QUERY_STRING=`cut -d '?' -f 2 <<< "$REQUEST_URI"`
fi
 
while read -t 1 line ; do
  line=`strings <<< "$line"`
  if grep -sqi '^Content-length' <<< "$line" ; then
    CONTENT_LENGTH=`cut -d ' ' -f 2 <<< "$line"`
  elif grep -sqi '^Content-type' <<< "$line" ; then
    CONTENT_TYPE=`cut -d ' ' -f 2 <<< "$line"`
  elif grep -sqi '^Connection' <<< "$line" ; then
    KEEP_ALIVE=`cut -d ' ' -f 2 <<< "$line"`
  elif grep -sqi '^Cookie' <<< "$line" ; then
    HTTP_COOKIE=`sed 's/Cookie:[ ]*//i' <<< "$line"`
  fi
  if [ -z "$line" -a "$REQUEST_METHOD" == "POST" -a -n "$CONTENT_LENGTH" ] ; then
    read -n "$CONTENT_LENGTH" line
    echo "$line" > /tmp/"$NUM"-post
    break
  elif [ -z "$line" ] ; then
    break
  fi
done
 
# some security
if grep -sq '\.\.' <<< "$SCRIPT_FILENAME" || ( namei "$SCRIPT_FILENAME" | grep -sq '\->') ; 
then
  SCRIPT_FILENAME='./'  
fi
 
if [ -d "$SCRIPT_FILENAME" ] ; then
  echo -en 'HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n'
  dir=`sed 's#'"$DOCUMENT_ROOT"'##' <<< "$SCRIPT_FILENAME"`
  if [ -z "$dir" ] ; then
    dir='/' ; parent='/'
  else
    parent=`sed 's#/[^/]\+$##' <<< "$dir"`
    if [ -z "$parent" ] ; then parent='/' ; fi
  fi
  echo "<html><head><title>Index of $dir</title></head>
  <body><h3>Index of $dir</h3>
  <table>
    <tr>
      <td><b>Name</b></td>
      <td><b>Last modified</b></td>
      <td><b>Size</b></td>
    </tr>
    <tr><td colspan=\"3\">[D] <a href=\"$parent\">..</a></td></tr>"
  for item in "$SCRIPT_FILENAME"/* ; do
    if [ "$item" == "$SCRIPT_FILENAME"'/*' ] ; then break ; fi
    name=`basename "$item"`
    link=`sed 's#'"$DOCUMENT_ROOT"'##' <<< "$item"`
    stat=`ls -lhd --time-style='+%d-%m-%y#%H:%M' "$item"`
    mtime=`cut -d ' ' -f 6 <<< "$stat" | sed 's/#/ /'`
    size=`cut -d ' ' -f 5 <<< "$stat"`
    echo "<tr><td>"
    if [ -L "$item" ] ; then
      echo "[S] $name<br/>"
    elif [ -d "$item" ] ; then
      echo '[D] <a href="'"$link"'">'"$name"'</a><br/>'
    else
      echo '[F] <a href="'"$link"'">'"$name"'</a><br/>'
    fi
    echo "</td><td>$mtime</td><td>$size</td></tr>"
  done
  echo "</table></body></html>"
elif [ -f "$SCRIPT_FILENAME" ] ; then
  mime='text/html'
  if grep -Esqv '\.(php|htm|html)$' <<< "$SCRIPT_FILENAME" ; then
    mime=`file -b --mime-type "$SCRIPT_FILENAME"`
  fi
  if grep -sq '\.php$' <<< "$SCRIPT_FILENAME" ; then
    for var in `env | cut -d '=' -f 1` ; do
      if [ "$var" != "PATH" -a "$var" != "PWD" -a "$var" != "LANG" -a "$var" != "SHLVL" ] ; then
        export -n "$var"
      fi
    done
    export REQUEST_URI REQUEST_METHOD QUERY_STRING DOCUMENT_ROOT SCRIPT_FILENAME \
    SCRIPT_NAME CONTENT_LENGTH CONTENT_TYPE GATEWAY_INTERFACE='CGI/1.1' \
    HTTP_HOST=`hostname -i` HTTP_COOKIE REDIRECT_STATUS=1
    if [ "$REQUEST_METHOD" == "GET" ] ; then
      php-cgi "$SCRIPT_FILENAME" \
      `tr '&' ' ' <<< "$QUERY_STRING"` > /tmp/"$NUM"-php
    else
      php-cgi "$SCRIPT_FILENAME" \
      `tr '&' ' ' <<< "$QUERY_STRING"` > /tmp/"$NUM"-php < /tmp/"$NUM"-post
    fi
    HTTP_STATUS=`grep -i '^Status: .*$' /tmp/"$NUM"-php | cut -d ' ' -f 2`
    if [ -z "$HTTP_STATUS" ] ; then
      HTTP_STATUS='200'
    fi
    OUT="head"
    cat /tmp/"$NUM"-php | while read ; do
      if [ "$OUT" = 'head' ] ; then
        REPLY=$(strings <<< "$REPLY")
        if [ -z "$REPLY" ] ; then
          OUT='body'
          continue
        fi
      fi
      echo "$REPLY" >> /tmp/"$NUM"-php-"$OUT"
    done
    echo -en "HTTP/1.0 $HTTP_STATUS OK\r\nContent-type: $mime\r\nContent-length:"\
    `ls -l /tmp/"$NUM"-php-body | cut -d ' ' -f 5`"\r\nConnection: $KEEP_ALIVE\r\n"
    cat /tmp/"$NUM"-php-head
    echo -en "\r\n"
    cat /tmp/"$NUM"-php-body
  else
    echo -en "HTTP/1.0 200 OK\r\nContent-type: $mime\r\nContent-length: "\
    `ls -l "$SCRIPT_FILENAME" | cut -d ' ' -f 5`"\r\nConnection: $KEEP_ALIVE\r\n\r\n"
    cat "$SCRIPT_FILENAME"
  fi
  rm -f /tmp/"$NUM"-php /tmp/"$NUM"-php-body /tmp/"$NUM"-php-head /tmp/"$NUM"-post 
  # /tmp/"$NUM"-log
else
  echo -en 'HTTP/1.0 404 NOT FOUND\r\nContent-type: text/plain\r\n\r\n404 File not found'
fi
#echo `date`" END" >> /tmp/"$NUM"-log
done
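
For readers who find the cut/sed pipeline at the top of the script hard to follow, here is a rough Python equivalent of how the request line is turned into the CGI-style variables. It is a sketch, not a port: the function name parse_request_line and the sample paths are made up, and the symlink check done with namei is left out:

```python
def parse_request_line(line, document_root):
    """Split 'GET /a%20b.php?x=1 HTTP/1.0' roughly the way the shell script does."""
    parts = line.split(" ")
    if "HTTP" not in line or len(parts) < 2:
        return None  # the script just exits when the first line is not HTTP
    uri = parts[1].replace("%20", " ")
    script_name, _, query_string = uri.partition("?")
    # Like the two sed expressions: collapse the first '//' and drop one trailing '/'.
    path = (document_root + script_name).replace("//", "/", 1)
    if path.endswith("/"):
        path = path[:-1]
    if ".." in path:
        path = "./"  # same fallback as the "some security" check
    return {
        "REQUEST_METHOD": parts[0],
        "REQUEST_URI": uri,
        "SCRIPT_NAME": script_name,
        "SCRIPT_FILENAME": path,
        "QUERY_STRING": query_string,
    }


request = parse_request_line("GET /docs/a%20b.php?x=1&y=2 HTTP/1.0", "/home/user/www/")
print(request)
```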

Here's the inetd configuration I use to run it:

8080	stream	tcp	nowait	grapsus	/usr/sbin/tcpd /home/grapsus/bin/http.sh /home/grapsus/www

I know BASH supports sockets, but this support is disabled in most Unix distributions (especially on Debian).

Let me know what you think about it or the improvements you made.

Sunday 12 September 2010

PHP tricks : generate an MS Excel file

Here's the shortest and fastest way I have found to convert a CSV file to MS Excel format. It supports string and numeric fields. I hope this function spares you from using huge Excel PHP classes, which are too complicated and slow (and require reading a lot of documentation) for such a basic task.

/* Written by Alexis Bezverkhyy <alexis@grapsus.net> in September 2010
 * This is free and unencumbered software released into the public domain.
 * For more information, please refer to <http://unlicense.org/> */
 
/** Convert a CSV file to MS Excel format
 * 
 * @param string $in input file
 * @param string $out output
 * @param string $glue CSV glue
 * @param string $enclosure CSV enclosure character
 */
function csv2xls($in, $out, $glue=";", $enclosure='"')
{
        $fp_in = fopen($in, "r");
        $fp_out = fopen($out, "w");
        
        /* write Excel BOF */
        fputs($fp_out, pack("ssssss", 0x809, 0x8, 0x0, 0x10, 0x0, 0x0));
        
        /* Read CSV fields */
        for($row = 0; $fields = fgetcsv($fp_in, 0, $glue, $enclosure); $row++)
        {
                foreach($fields as $col=>$value)
                {
                        $value = trim($value);
                        $value = utf8_decode($value);
                        
                        /* string cell */
                        if(!is_numeric($value))
                        {
                                $l = strlen($value);
                                fputs($fp_out,
                                        pack("ssssss", 0x204, 8 + $l, $row, $col, 0x0, $l).$value);
                        }
                        /* numeric cell */
                        else 
                        {
                                fputs($fp_out,
                                        pack("sssss", 0x203, 14, $row, $col, 0x0).pack("d", $value));
                        }
                }
        }
        
        /* write Excel EOF */
        fputs($fp_out, pack("ss", 0x0A, 0x00));
        
        fclose($fp_out);
        fclose($fp_in);
}
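
The record layout written by csv2xls() may be easier to see with explicit field sizes. The sketch below mirrors the same four record types in Python with struct; it is not a replacement for the PHP function. Assumptions on my side: little-endian byte order is forced with "<" (the PHP code uses the machine's order), float() stands in for is_numeric() and does not accept exactly the same strings, and csv_to_xls_bytes is an invented name:

```python
import csv
import io
import struct


def csv_to_xls_bytes(csv_text, delimiter=";", quotechar='"'):
    """Return the same kind of record stream csv2xls() writes, as bytes."""
    out = io.BytesIO()
    # BOF record: id 0x809, length 8, then four 16-bit fields.
    out.write(struct.pack("<6H", 0x809, 8, 0, 0x10, 0, 0))
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter, quotechar=quotechar)
    for row, fields in enumerate(reader):
        for col, value in enumerate(fields):
            value = value.strip()
            try:
                number = float(value)
            except ValueError:
                # String cell: id 0x204, payload is 8 header bytes plus the text.
                text = value.encode("latin-1", "replace")
                out.write(struct.pack("<6H", 0x204, 8 + len(text), row, col, 0, len(text)))
                out.write(text)
            else:
                # Numeric cell: id 0x203, 14-byte payload ending in a double.
                out.write(struct.pack("<5H", 0x203, 14, row, col, 0))
                out.write(struct.pack("<d", number))
    # EOF record: id 0x0A, length 0.
    out.write(struct.pack("<2H", 0x0A, 0))
    return out.getvalue()


data = csv_to_xls_bytes("a;1.5\n")
print(len(data), "bytes written")
```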

Wednesday 25 August 2010

The perfect Eclipse, PDT (PHP Development Tools) and Xdebug setup!

Eclipse is a great IDE and Xdebug adds a lot of useful features for PHP debugging. With the PDT plugin for Eclipse, you can use Eclipse as a debug client for Xdebug and debug live PHP code!

I won't describe here how to set up each of these tools. You can find many better-written articles about that (with lots of screenshots and all the other fancy stuff). I will simply show how to work around a few very annoying bugs in this configuration which can drive you crazy.

No output when debugging in browser

When you choose to debug a PHP script in an external browser, the output of your script isn't sent to the browser until the debug session is terminated! I verified it with Wireshark: the browser gets absolutely nothing (not even the HTTP headers) and keeps waiting. None of the ob_* or flush functions seem to help. I kept trying different PHP options and even editing the Xdebug source code until I found the implicit_flush setting, which makes it work! Just add

implicit_flush = On

to your php.ini and you'll see your output live!

I really think it's a bug because when you normally run a PHP script without output buffering, the headers and the content are sent to the client before the end of execution. Maybe Xdebug messes up some internal PHP configuration when it sets up a debugging session.

Very annoying and useless DEBUG SESSION ENDED new page

When you terminate a browser debug session, a new browser window pops up saying DEBUG SESSION ENDED. WTF?! Why do we need to make an HTTP query to stop the debug session?! The DBGP protocol used by PDT has a stop command. Once again with Wireshark, I saw that this command is issued by PDT and that Xdebug answers ok to it!

The only solution I found was to make a wrapper for Firefox that ignores these queries. It works very well. Even better, I found a way to close the Firefox window showing your application when you terminate the debug session!

Eclipse opens web browsers with additional parameters you cannot disable!

Here comes another Eclipse bug: when you set up an external browser, some implicit parameters are passed to it and there is no way to disable them. For example, it doesn't execute

firefox %URL%

but

firefox -remote openURL(%URL%)

WTF?! I don't want it to grab my existing Firefox instance and mess around with it! So here's the wrapper that solves the two problems above. Create a new Firefox profile named eclipse by running

firefox -profile-manager -no-remote

Now set up Eclipse to use the following script as its web browser. Debug sessions will be opened in a new window, and when you terminate a session this window will close automatically!

#!/bin/bash
 
URL=$(grep -o 'http://[^)]*' <<< "$*")
echo "$URL" >> /tmp/url
if ! grep -q 'XDEBUG_SESSION_STOP_NO_EXEC' <<< "$URL" ; then
        firefox -no-remote -P eclipse "$URL" &
        echo "$!" > /tmp/eclipse-firefox.pid
else
        kill $(cat /tmp/eclipse-firefox.pid)
        rm -f /tmp/eclipse-firefox.pid
fi
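
The decision the wrapper makes can be summed up in a few lines. Here is a sketch of that logic in Python, only to show what happens to the arguments Eclipse passes; plan_browser_action and the sample URL are made up for the illustration, and the actual starting and killing of Firefox is left out:

```python
import re

STOP_MARKER = "XDEBUG_SESSION_STOP_NO_EXEC"


def plan_browser_action(argv):
    """Return ('open', url) or ('close', url) for the arguments Eclipse passes."""
    # Same pattern as the grep call: everything from http:// up to the closing ')'.
    match = re.search(r"http://[^)]*", " ".join(argv))
    url = match.group(0) if match else ""
    return ("close" if STOP_MARKER in url else "open", url)


action, url = plan_browser_action(["-remote", "openURL(http://localhost/index.php?start_debug=1)"])
print(action, url)
```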

Notes

I ran my tests with PHP 5.3, Xdebug 2.1 and 2.2-dev, Eclipse Ganymede, Galileo and Helios, and PDT 2.1 and 2.2 on Debian SID (amd64).

Let me know if these workarounds did it for you or if you found more elegant solutions to those problems.

Edit: here's a port of this wrapper to Windows.

Tuesday 29 December 2009

Remote call forwarding with Free

At Free, for security reasons, call forwarding cannot be configured from an IP address other than your own. Here is a shell script to run on a machine at home to work around this restriction and configure call forwarding from outside:

transfert.sh:

#!/bin/bash
 
LOGIN=""
MDP=""
 
TMP="/tmp/transtmp"
if [ ! -d "$TMP" ] ; then
         mkdir -p "$TMP" || { echo "Temporary directory unavailable" ; exit 1 ; }
fi
 
NUMERO="$1"
if [ -n "$NUMERO" ] ; then
        TRANSINC='&transinc=transinc'
fi
 
USERAGENT='Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'
 
wget --no-check-certificate --quiet --keep-session-cookies --save-cookies="$TMP/free-cookie" \
 --user-agent="$USERAGENT" --post-data='login='"$LOGIN"'&ok=Connexion&pass='"$MDP" \
 -O "$TMP/login.html" https://subscribes.free.fr/login/login.pl
 
ID=`grep -o 'id=[[:alnum:]]*&idt=[[:alnum:]]*' "$TMP/login.html"  | tail -n 1` 
if [ -z "$ID" ] ; then
        echo "Login: error."
        exit 1
fi
echo "Login: OK"
 
wget --no-check-certificate --quiet --keep-session-cookies --load-cookies="$TMP/free-cookie" \
 --user-agent="$USERAGENT" --post-data='appsort=0000&delai=5&incon='\
 "$NUMERO"'&mevo_delai=35&mevo_mode=5&nonrep=&occup='"$TRANSINC" \
 -O "$TMP/transfert.html" 'https://adsls.free.fr/admin/tel/adminservice_valid.pl?'"$ID"
 
export LC_ALL=C
if grep -sq 'Donn.*valid.' "$TMP/transfert.html" ; then
        echo "Forwarding: OK"
else
        echo "Forwarding: error"
        exit 2
fi
 
rm -f "$TMP"/*

If no number is passed as a parameter, the script disables call forwarding; otherwise it enables it for that number. Don't forget to fill in your Free login and password at the top of the code.
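
The only fragile step in the script is pulling the session identifiers out of the login page. As an illustration of what the grep and tail pair does, here is the same extraction sketched in Python; the sample HTML is invented, and the ASCII character class is my approximation of [[:alnum:]]:

```python
import re

ID_PATTERN = re.compile(r"id=[A-Za-z0-9]*&idt=[A-Za-z0-9]*")


def extract_session_id(login_html):
    """Return the last 'id=...&idt=...' pair found in the page, or None."""
    matches = ID_PATTERN.findall(login_html)
    return matches[-1] if matches else None


# Invented sample page, only to exercise the function.
sample = '<a href="home.pl?id=123&idt=abc">x</a> <a href="tel.pl?id=456&idt=def9">y</a>'
print(extract_session_id(sample))
```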

Here is a CGI version (to put in an Apache or Lighttpd cgi-bin directory) that can be run, for example, from a smartphone (transfert-cgi.sh?numero=...):

transfert-cgi.sh:

#!/bin/bash
 
# A CGI script must send a header block before the body
echo "Content-type: text/html"
echo
echo "<html><body>"
 
NUMERO=`echo "$QUERY_STRING" | cut -f 2 -d '='`
./transfert.sh "$NUMERO"
 
echo "</body></html>"
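
The cut call simply takes whatever follows the first = sign, which is fine as long as numero is the only parameter. If more parameters are ever added, a real query-string parser is safer. A small sketch in Python 3, where extract_numero is a hypothetical helper name:

```python
from urllib.parse import parse_qs


def extract_numero(query_string):
    """Return the value of the 'numero' parameter, or '' when it is absent."""
    return parse_qs(query_string).get("numero", [""])[0]


print(extract_numero("numero=0123456789"))
```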

These scripts can also serve as a basis for other automations with the same ISP: just capture the request to replay with Firebug or Wireshark and run it in place of the script's second request.

Edit of 16 July 2010

Free's interface has moved to HTTPS; I have adapted the script presented in this article.
