Profiling with gperftools

gperftools is a set of tools for performance profiling and memory checking. One of the main advantages of the CPU profiler is a very nice graphical output, low overhead and very simple use (the profiled application doesn't require any recompilation, the profiling is enabled by simply preloading the profiler's library).

Installation

On debian-based systems the tools are packaged under the google-perftools package. For graphical output you also need graphviz installed:

sudo apt-get install google-perftools graphviz

Note that all the tools have the “google-” prefix under debian - the prefix may be missing on other systems (and is also missing in the official documentation).

Profiling the whole process runtime

This section describes profiling a process from its start to its termination. This can be useful to profile the start and termination times of Geany.

  1. To start profiling, run
    LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=/tmp/geany.prof geany

    LD_PRELOAD is used to preload the profiler's library, CPUPROFILE determines the location of the profiler's output, and in the above example, geany is the name of the binary to be executed. The path to the libprofiler library may differ distribution by distribution - I have also seen it under /usr/lib/libprofiler.so.0.

  2. Do whatever you want to profile in Geany. When finished, close Geany.
  3. At this point, /tmp/geany.prof contains the CPU profile information. To view it graphically, run
    google-pprof --web /usr/local/bin/geany /tmp/geany.prof

    (Note the absolute path to Geany has to be used here.) This opens your default web browser with the graphical profile in the svg format. Below is a part of the profile when finding a performance problem in Python ctags parser in Geany:

Every node represents a function called. Gpreftools uses sample-based profiling so at regular intervals it checks which function is being called and records this number for each function. These numbers are then shown in the graph - the first number in every function corresponds to the time spent in the function itself, the second number is the total time spent in the function plus all the functions the function calls. More useful than the sample numbers is the percentual information telling us how many percents of the total time are spent in the given function.

In the above graph we can see the skipEverything() function takes 55.2% of total time so it looks like a good candidate for optimization.

Note: I keep getting the __nss_hosts_lookup in the plot even though no host lookup is actually being performed. This one can be ignored and is probably caused by the fact that some system function is being called and its symbol name isn't known. It's always sufficient to look at the function that calls this __nss_hosts_lookup which might be the one which needs optimization.

Note: I wasn't able to get profiles with reasonable level of information from the GTK libraries provided by the system so if there is some GTK-related problem, it's best to compile GTK by yourself.

Profiling only part of the process runtime

Most often in Geany it's much more useful to profile just some particular feature such as what CPU amount is used when just typing text (which involves all the ctags parsing, tag management, symbol tree updates and scintilla lexing and displaying). In these cases we don't want the profile to contain irrelevant information e.g. from what Geany does when it's started or terminated.

Fortunately, gperftools makes it very simple thanks to the possibility to install a signal handler for which the profiler listens. When the signal is received for the first time, profiling starts and when it's received for the second time, profiling ends.

  1. Run
    LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=/tmp/geany.prof CPUPROFILESIGNAL=12 geany

    where CPUPROFILESIGNAL tells the number of the signal the profiler listens for.

  2. Perform any preparation steps in Geany that shouldn't be visible in the profile (e.g. open a big file).
  3. Tell the profiler to start profiling:
    killall -12 geany

    (12 is the signal number we used in the previous step)

  4. Now perform the action you want to profile in Geany, e.g. start typing.
  5. Tell the profiler to stop profiling by sending it the signal again:
    killall -12 geany
  6. Display the profile in the browser:
    google-pprof --web /usr/local/bin/geany /tmp/geany.prof.0

    Notice the extension 0 in the profiler file - you can record multiple profiles during a single run by sending the signal several times and the individual profiles are numbered by the extension starting from 0.

OS X

There's a bug in older versions of OS X which prevents getting good results from the profiler. There's a (scary) way to fix this by binary patching the OS X kernel (I've used it, it worked, but no guarantees at all).

Other options

There are many additional options in gperftools and while I found the above sufficient for profiling Geany, there may be cases where some extra parameter may be necessary (such as filtering the graph to some subset, more frequent sampling, etc.). See the CPU profiler documentation for more info.

Print/export