2017-08-13

Quick Python performance tuning cheat-sheet

Just a few commands without any context:

Profiling with cProfile

This helped me find the slowest functions, because when optimizing I need to focus on those (best ratio of work needed vs. benefit). It also helped me find a function which did some unnecessary calculations over and over again:

$ python -m cProfile -o cProfile-first_try.out ./layout-generate.py ...
$ python -m pstats cProfile-first_try.out 
Welcome to the profile statistics browser.
cProfile-first_try.out% sort
Valid sort keys (unique prefixes are accepted):
cumulative -- cumulative time
module -- file name
ncalls -- call count
pcalls -- primitive call count
file -- file name
line -- line number
name -- function name
calls -- call count
stdname -- standard name
nfl -- name/file/line
filename -- file name
cumtime -- cumulative time
time -- internal time
tottime -- internal time
cProfile-first_try.out% sort tottime
cProfile-first_try.out% stats 10
Sat Aug 12 23:19:40 2017    cProfile-first_try.out

         18508294 function calls (18501563 primitive calls) in 8.369 seconds

   Ordered by: internal time
   List reduced from 2447 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    27837    4.230    0.000    5.015    0.000 ./utils_matrix2layout.py:14(get_distance_matrix_2d)
    10002    1.356    0.000    1.513    0.000 ./utils_matrix2layout.py:244(get_measured_error_2d)
  5674796    0.572    0.000    0.572    0.000 /usr/lib64/python2.7/collections.py:90(__iter__)
  5340664    0.219    0.000    0.219    0.000 {math.sqrt}
  5432768    0.189    0.000    0.189    0.000 {abs}
   230401    0.183    0.000    0.183    0.000 /usr/lib64/python2.7/collections.py:71(__setitem__)
        1    0.178    0.178    0.282    0.282 ./utils_matrix2layout.py:543(count_angles_layout)
    10018    0.119    0.000    0.345    0.000 /usr/lib64/python2.7/_abcoll.py:548(update)
        1    0.102    0.102    6.749    6.749 ./utils_matrix2layout.py:393(iterate_evolution)
     1142    0.092    0.000    0.111    0.000 /usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py:1299(svd)

To explain the columns, the "Instant User’s Manual" section of the Python profilers documentation says:

tottime
for the total time spent in the given function (and excluding time made in calls to sub-functions)
cumtime
is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
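
The same kind of report can also be produced from inside a script, without the interactive pstats browser. This is only a minimal sketch, assuming Python 3; slow_distance_sum and profile_top are hypothetical names made up for illustration and are not part of layout-generate.py:

```python
import cProfile
import io
import math
import pstats


def slow_distance_sum(n):
    """Hypothetical stand-in workload: sums |i - j| over all pairs."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += math.sqrt((i - j) ** 2)
    return total


def profile_top(func, *args, limit=10):
    """Run func under cProfile and return (result, report text).

    The report is sorted by tottime, i.e. time spent inside each
    function itself, excluding calls to sub-functions.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return result, out.getvalue()


result, report = profile_top(slow_distance_sum, 200)
print(report)
```

Sorting by "tottime" here corresponds to "sort tottime" followed by "stats 10" in the browser session above.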

Let's compile to C with Cython

Simply compiling the module which does most of the work gave me about a 20% speedup:

# dnf install python2-Cython
$ cython utils_matrix2layout.py
$ gcc `python2-config --cflags --ldflags` -shared utils_matrix2layout.c -o utils_matrix2layout.so
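
To check a speedup figure like that, one option is to time the same work before and after the change with timeit. This is only a sketch under stated assumptions: variant_before and variant_after are hypothetical stand-ins (not the real pure-Python and compiled module), and speedup_percent and best_time are helper names made up here:

```python
import timeit


def speedup_percent(before_seconds, after_seconds):
    """How much less time 'after' takes, as a percentage of 'before'."""
    return (before_seconds - after_seconds) / before_seconds * 100.0


def best_time(func, repeat=5, number=10):
    """Best of `repeat` runs, each calling func `number` times.

    Taking the minimum reduces noise from other processes.
    """
    return min(timeit.repeat(func, repeat=repeat, number=number))


# Hypothetical stand-ins for the code before and after the change.
def variant_before():
    return sum(i * i for i in range(20000))


def variant_after():
    return sum(i * i for i in range(10000))


before = best_time(variant_before)
after = best_time(variant_after)
print("speedup: %.1f%%" % speedup_percent(before, after))
```

With this definition, going from 10 seconds to 8 seconds counts as a 20% speedup.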

There is much more to do to optimize it, but that would need additional work, so not now :-) Some helpful links: