lundi 25 juillet 2011

Sleep sort performance test

Introduction

Sleep sort is a very curious and original algorithm. It brings several interesting questions. Such as, how to measure its complexity or in what cases does it perform the best.

Principle

In a nutshell: for each element to sort, start a thread or a process which sleeps a time proportional to the element and then pushes it back to the main thread.

Trivial Bash implementation:

1
2
3
4
5
6
7
8
9
10
11
12
#!/bin/bash
 
function f {
        sleep $1
        echo $1
}
 
while [ -n "$1" ] ; do 
        f $1 &
        shift
done
wait

Test suite

In order to study it and measure its performance we (me and Hartok) decided to implement the Sleep Sort in Python and gevent.

Our program generates a set of random numbers and sorts it with Sleep sort then with the Python built-in Quick sort and records timings. It works with three parameters:

  • set size
  • maximum value in the set
  • the ratio between sleep time and values in the set (for example with 10 the thread for 42 sleeps for 42/10 seconds)

This last parameter is pretty tricky: the higher it is, the faster the Sleep sort works. But if it is too high, the time to start a new thread (or greenlet) will be more important than the sleep time for low values and the result will be incorrect. It seems to be strongly hardware dependent and a little tied to the other parameters.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
#!/usr/bin/python
 
import sys
import random
import time
import copy
import gevent
 
divisor = 900000
 
def sort(numbers):
        """ Sorts a list with Sleep Sort algorithm and measures time
        """
        begin = time.time()
 
        def f(n, result):
                gevent.sleep(float(n)/divisor)
                result.append(n)
        
        result = []
        greenlets = []
        # spawn a sleep greenlet for each value
        for n in numbers:
                greenlets.append(gevent.Greenlet.spawn(f, n, result))
 
        # wait for all greenlets to end
        gevent.joinall(greenlets)
        return (time.time()-begin, result)
 
def systemsort(numbers):
        """ Sorts a list with system sort function and measures time
        """
        begin = time.time()
        result = copy.copy(numbers)
        result.sort()
        return (time.time()-begin, result)
 
def generate(n, m): 
        """ Generates a list of n random numbers, each being less than m
        """
        numbers = []
        for i in range(n):
                numbers.append(random.randint(1, m))
        return numbers
 
if __name__ == '__main__':
        # read params
        try:
                n = int(sys.argv[1])
                m = int(sys.argv[2])
        except BaseException:
                print "usage: %s n m [d]" % sys.argv[0]
                print " - n number of random numbers"
                print " - m maximum random value"
                print " - d time divisor (optionnal), default is %d" % divisor
                sys.exit(1)
 
        try:
                divisor = int(sys.argv[3])
        except:
                pass
 
        # generate random values
        numbers = generate(n, m)
        # sleep sort them
        (sleep_time, sleep_result) = sort(numbers)
        # system sort them
        (system_time, system_result) = systemsort(numbers)
 
        if system_result == sleep_result:
                print "sleep sort succeed"
                print "sleep: %f" % sleep_time
                print "system: %f" % system_time
        else:
                print "sleep sort failed (time divisor too ambitious ?)"
                print "sleep: (%d elements)" % len(sleep_result)
                #print sleep_result.__str__()
                print "system: (%d elements)" % len(system_result)
                #print system_result.__str__()
                errors = 0
                for n in range(n):
                        if system_result[n] != sleep_result[n]:
                                errors = errors + 1
                print "%d errors" % errors

Test results

quicksort.png

sleepsort.png

ratio.png

The overall performance of the Sleep sort is awful, but it has interesting performance when it comes to sets with few different values.

jeudi 13 janvier 2011

Hexadecimal dump in C

It can be very handy, in your C programs, to be able to print raw data in a UNIX xxd fashion. Here is a short function to do that :

hexdump.c :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <stdio.h>
#include <ctype.h>
 
#ifndef HEXDUMP_COLS
#define HEXDUMP_COLS 8
#endif
 
void hexdump(void *mem, unsigned int len)
{
        unsigned int i, j;
        
        for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
        {
                /* print offset */
                if(i % HEXDUMP_COLS == 0)
                {
                        printf("0x%06x: ", i);
                }
 
                /* print hex data */
                if(i < len)
                {
                        printf("%02x ", 0xFF & ((char*)mem)[i]);
                }
                else /* end of block, just aligning for ASCII dump */
                {
                        printf("   ");
                }
                
                /* print ASCII dump */
                if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                {
                        for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                        {
                                if(j >= len) /* end of block, not really printing */
                                {
                                        putchar(' ');
                                }
                                else if(isprint(((char*)mem)[j])) /* printable char */
                                {
                                        putchar(0xFF & ((char*)mem)[j]);        
                                }
                                else /* other char */
                                {
                                        putchar('.');
                                }
                        }
                        putchar('\n');
                }
        }
}
 
#ifdef HEXDUMP_TEST
int main(int argc, char *argv[])
{
        hexdump(argv[0], 20);
 
        return 0;
}
#endif

hexdump.h

1
2
3
4
5
6
#ifndef _HEXDUMP_H
#define _HEXDUMP_H
 
void hexdump(void *mem, unsigned int len);
 
#endif

It can be tested as a standalone program (which dumps 20 bytes of memory strating at *argv) :

cc --ansi -DHEXDUMP_TEST -o hexdump hexdump.c
./hexdump
0x000000: 2e 2f 68 65 78 64 75 6d ./hexdum
0x000008: 70 00 53 53 48 5f 41 47 p.SSH_AG
0x000010: 45 4e 54 5f             ENT_

samedi 1 janvier 2011

A script for splitting videos using ffmpeg

Here is a small bash script for automatically cutting a video file into smaller chunks of fixed length. ffmpeg cannot output multiple files, but it has start offset and duration parameters which my script uses to actually split the file.

For example, if we have a video called 'video.mpeg' which is 3000 seconds long and we run

ffsplit.sh video.mpeg 1200

we will obtain three files : video-001.mpeg (1200 seconds), video-002.mpeg (1200 seconds) and video-003.mpeg (600 seconds).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#!/bin/bash
 
# Written by Alexis Bezverkhyy <alexis@grapsus.net> in 2011
# This is free and unencumbered software released into the public domain.
# For more information, please refer to <http://unlicense.org/>
 
function usage {
        echo "Usage : ffsplit.sh input.file chunk-duration [output-filename-format]"
        echo -e "\t - input file may be any kind of file reconginzed by ffmpeg"
        echo -e "\t - chunk duration must be in seconds"
        echo -e "\t - output filename format must be printf-like, for example myvideo-part-%04d.avi"
        echo -e "\t - if no output filename format is given, it will be computed\
 automatically from input filename"
}
 
IN_FILE="$1"
OUT_FILE_FORMAT="$3"
typeset -i CHUNK_LEN
CHUNK_LEN="$2"
 
DURATION_HMS=$(ffmpeg -i "$IN_FILE" 2>&1 | grep Duration | cut -f 4 -d ' ')
DURATION_H=$(echo "$DURATION_HMS" | cut -d ':' -f 1)
DURATION_M=$(echo "$DURATION_HMS" | cut -d ':' -f 2)
DURATION_S=$(echo "$DURATION_HMS" | cut -d ':' -f 3 | cut -d '.' -f 1)
let "DURATION = ( DURATION_H * 60 + DURATION_M ) * 60 + DURATION_S"
 
if [ "$DURATION" = '0' ] ; then
        echo "Invalid input video"
        usage
        exit 1
fi
 
if [ "$CHUNK_LEN" = "0" ] ; then
        echo "Invalid chunk size"
        usage
        exit 2
fi
 
if [ -z "$OUT_FILE_FORMAT" ] ; then
        FILE_EXT=$(echo "$IN_FILE" | sed 's/^.*\.\([a-zA-Z0-9]\+\)$/\1/')
        FILE_NAME=$(echo "$IN_FILE" | sed 's/^\(.*\)\.[a-zA-Z0-9]\+$/\1/')
        OUT_FILE_FORMAT="${FILE_NAME}-%03d.${FILE_EXT}"
        echo "Using default output file format : $OUT_FILE_FORMAT"
fi
 
N='1'
OFFSET='0'
let 'N_FILES = DURATION / CHUNK_LEN + 1'
 
while [ "$OFFSET" -lt "$DURATION" ] ; do
        OUT_FILE=$(printf "$OUT_FILE_FORMAT" "$N")
        echo "writing $OUT_FILE ($N/$N_FILES)..."
        ffmpeg -i "$IN_FILE" -vcodec copy -acodec copy -ss "$OFFSET" -t "$CHUNK_LEN" "$OUT_FILE"
        let "N = N + 1"
        let "OFFSET = OFFSET + CHUNK_LEN"
done

mercredi 15 septembre 2010

Lightweight HTTP server in BASH with PHP support

No kidding, I wrote this HTTP server in Bourne Shell. It supports most of HTTP 1.0 headers, Keep-alive requests, directory listing and PHP scripts. By its nature, this piece of software is not secure (it is fun though) and isn't intended for production purposes : <insert the usual NO WARRANTY boilerplate bullshit here>.

I tested it with PHPMyAdmin which I consider to be heavy PHP software and it works pretty well. It is not well commented, really I just wrote it for fun, learning BASH and HTTP protocol.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
#!/bin/bash
 
# Written by Alexis Bezverkhyy <alexis@grapsus.net> in 2008
# This is free and unencumbered software released into the public domain.
# For more information, please refer to <http://unlicense.org/>
 
# This script should be run via inetd, first parameter is WWW root path
 
# Uncomment for debugging
#exec 2>/tmp/log ; set -x
 
NUM="$RANDOM"
DOCUMENT_ROOT="$1"
KEEP_ALIVE="keep-alive"
 
while [ "$KEEP_ALIVE" == "keep-alive" ] ; do
KEEP_ALIVE="close"
 
for i in seq 1 5; do
  read -t 5 line
  if [ -n "$line" ] ; then break; fi
done
 
if grep -sqv 'HTTP' <<< "$line" ; then exit ; fi
#echo `date`" BEGIN $line" >> /tmp/"$NUM"-log
 
REQUEST_METHOD=`cut -d ' ' -f 1 <<< "$line"`
REQUEST_URI=`cut -d ' ' -f 2 <<< "$line" | sed 's/%20/ /'`
SCRIPT_NAME=`cut -d '?' -f 1 <<< "$REQUEST_URI"`
SCRIPT_FILENAME=`sed -e 's#//#/#' -e 's#/$##' <<< "$DOCUMENT_ROOT$SCRIPT_NAME"`
QUERY_STRING=''
if grep -sq '?' <<< "$REQUEST_URI" ; then
  QUERY_STRING=`cut -d '?' -f 2 <<< "$REQUEST_URI"`
fi
 
while read -t 1 line ; do
  line=`strings <<< "$line"`
  if grep -sqi '^Content-length' <<< "$line" ; then
    CONTENT_LENGTH=`cut -d ' ' -f 2 <<< "$line"`
  elif grep -sqi '^Content-type' <<< "$line" ; then
    CONTENT_TYPE=`cut -d ' ' -f 2 <<< "$line"`
  elif grep -sqi '^Connection' <<< "$line" ; then
    KEEP_ALIVE=`cut -d ' ' -f 2 <<< "$line"`
  elif grep -sqi '^Cookie' <<< "$line" ; then
    HTTP_COOKIE=`sed 's/Cookie:[ ]*//i' <<< "$line"`
  fi
  if [ -z "$line" -a "$REQUEST_METHOD" == "POST" -a -n "$CONTENT_LENGTH" ] ; then
    read -n "$CONTENT_LENGTH" line
    echo "$line" > /tmp/"$NUM"-post
    break
  elif [ -z "$line" ] ; then
    break
  fi
done
 
# some security
if grep -sq '\.\.' <<< "$SCRIPT_FILENAME" || ( namei "$SCRIPT_FILENAME" | grep -sq '\->') ; 
then
  SCRIPT_FILENAME='./'  
fi
 
if [ -d "$SCRIPT_FILENAME" ] ; then
  echo -en 'HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n'
  dir=`sed 's#'"$DOCUMENT_ROOT"'##' <<< "$SCRIPT_FILENAME"`
  if [ -z "$dir" ] ; then
    dir='/' ; parent='/'
  else
    parent=`sed 's#/[^/]\+$##' <<< "$dir"`
    if [ -z "$parent" ] ; then parent='/' ; fi
  fi
  echo "<html><head><title>Index of $dir</title></head>
  <body><h3>Index of $dir</h3>
  <table>
    <tr>
      <td><b>Name</b></td>
      <td><b>Last modified</b></td>
      <td><b>Size</b></td>
    </tr>
    <tr><td colspan=\"3\">[D] <a href=\"$parent\">..</a></td></tr>"
  for item in "$SCRIPT_FILENAME"/* ; do
    if [ "$item" == "$SCRIPT_FILENAME"'/*' ] ; then break ; fi
    name=`basename "$item"`
    link=`sed 's#'"$DOCUMENT_ROOT"'##' <<< "$item"`
    stat=`ls -lhd --time-style='+%d-%m-%y#%H:%m' "$item"`
    mtime=`cut -d ' ' -f 6 <<< "$stat" | sed 's/#/ /'`
    size=`cut -d ' ' -f 5 <<< "$stat"`
    echo "<tr><td>"
    if [ -L "$item" ] ; then
      echo "[S] $name<br/>"
    elif [ -d "$item" ] ; then
      echo '[D] <a href="'"$link"'">'"$name"'</a><br/>'
    else
      echo '[F] <a href="'"$link"'">'"$name"'</a><br/>'
    fi
    echo "</td><td>$mtime</td><td>$size</td></tr>"
  done
  echo "</table></body></html>"
elif [ -f "$SCRIPT_FILENAME" ] ; then
  mime='text/html'
  if grep -Esqv '\.(php|htm|html)$' <<< "$SCRIPT_FILENAME" ; then
    mime=`file -b --mime-type $SCRIPT_FILENAME`
  fi
  if grep -sq '\.php$' <<< "$SCRIPT_FILENAME" ; then
    for var in `env | cut -d '=' -f 1` ; do
      if [ "$var" != "PATH" -a "$var" != "PWD" -a "$var" != "LANG" -a "$var" != "SHLVL" ] ; then
        export -n "$var"
      fi
    done
    export REQUEST_URI REQUEST_METHOD QUERY_STRING DOCUMENT_ROOT SCRIPT_FILENAME \
    SCRIPT_NAME CONTENT_LENGTH CONTENT_TYPE GATEWAY_INTERFACE='CGI/1.1' \
    HTTP_HOST=`hostname -i` HTTP_COOKIE REDIRECT_STATUS=1
    if [ "$REQUEST_METHOD" == "GET" ] ; then
      php-cgi $SCRIPT_FILENAME \
      `tr '&' ' ' <<< "$QUERY_STRING"` > /tmp/"$NUM"-php
    else
      php-cgi $SCRIPT_FILENAME \
      `tr '&' ' ' <<< "$QUERY_STRING"` > /tmp/"$NUM"-php < /tmp/"$NUM"-post
    fi
    HTTP_STATUS=`grep -i '^Status: .*$' /tmp/"$NUM"-php | cut -d ' ' -f 2`
    if [ -z "$HTTP_STATUS" ] ; then
      HTTP_STATUS='200'
    fi
    OUT="head"
    cat /tmp/"$NUM"-php | while read ; do
      if [ "$OUT" = 'head' ] ; then
        REPLY=$(strings <<< "$REPLY")
        if [ -z "$REPLY" ] ; then
          OUT='body'
          continue
        fi
      fi
      echo "$REPLY" >> /tmp/"$NUM"-php-"$OUT"
    done
    echo -en "HTTP/1.0 $HTTP_STATUS OK\r\nContent-type: $mime\r\nContent-length:"\
    `ls -l /tmp/"$NUM"-php-body | cut -d ' ' -f 5`"\r\nConnection: $KEEP_ALIVE\r\n"
    cat /tmp/"$NUM"-php-head
    echo -en "\r\n"
    cat /tmp/"$NUM"-php-body
  else
    echo -en "HTTP/1.0 200 OK\r\nContent-type: $mime\r\nContent-length: "\
    `ls -l "$SCRIPT_FILENAME" | cut -d ' ' -f 5`"\r\nConnection: $KEEP_ALIVE\r\n\r\n"
    cat "$SCRIPT_FILENAME"
  fi
  rm -f /tmp/"$NUM"-php /tmp/"$NUM"-php-body /tmp/"$NUM"-php-head /tmp/"$NUM"-post 
  # /tmp/"$NUM"-log
else
  echo -en 'HTTP/1.0 404 NOT FOUND\n\rContent-type: text/plain\r\n\r\n404 File not found'
fi
#echo `date`" END" >> /tmp/"$NUM"-log
done

Here's the inetd configuration I use to run it :

8080	stream	tcp	nowait	grapsus	/usr/sbin/tcpd /home/grapsus/bin/http.sh /home/grapsus/www

I know BASH supports sockets, but this support is disabled in most Unix distributions (especially on Debian).

Let me know what you think about it or the improvements you made.

dimanche 12 septembre 2010

PHP tricks : generate an MS Excel file

Here's the shortest and the fastest way I found to convert a CSV file to MS Excel format. It supports string and numeric fields. I hope this function can avoid you using huge Excel PHP classes which are too complicated and slow (and require reading a lot of documentation) for such a basic task.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
/* Written by Alexis Bezverkhyy <alexis@grapsus.net> in September 2010
 * This is free and unencumbered software released into the public domain.
 * For more information, please refer to <http://unlicense.org/> */
 
/** Convert a CSV file to MS Excel format
 * 
 * @param string $in input file
 * @param string $out output
 * @param string $glue CSV glue
 * @param string $enclosure CSV enclosure character
 */
function csv2xls($in, $out, $glue=";", $enclosure='"')
{
        $fp_in = fopen($in, "r");
        $fp_out = fopen($out, "w");
        
        /* write Excel BOF */
        fputs($fp_out, pack("ssssss", 0x809, 0x8, 0x0, 0x10, 0x0, 0x0));
        
        /* Read CSV fields */
        for($row = 0; $fields = fgetcsv($fp_in, 0, $glue, $enclosure); $row++)
        {
                foreach($fields as $col=>$value)
                {
                        $value = trim($value);
                        $value = utf8_decode($value);
                        
                        /* string cell */
                        if(!is_numeric($value))
                        {
                                $l = strlen($value);
                                fputs($fp_out,
                                        pack("ssssss", 0x204, 8 + $l, $row, $col, 0x0, $l).$value);
                        }
                        /* numeric cell */
                        else 
                        {
                                fputs($fp_out,
                                        pack("sssss", 0x203, 14, $row, $col, 0x0).pack("d", $value));
                        }
                }
        }
        
        /* write Excel EOF */
        fputs($fp_out, pack("ss", 0x0A, 0x00));
        
        fclose($fp_out);
        fclose($fp_in);
}

- page 1 de 2