fncdump

This manual describes the profiler called fncdump, and how you can use it to determine which parts of a program are taking most of the execution time. We assume that you know how to write, compile, and execute programs. fncdump was written by Yannick Perret.

Copyright (C) 2001 Yannick Perret.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the same conditions as for modified versions.

Introduction to Profiling

Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing. This information can show you which pieces of your program are slower than you expected, and might be candidates for rewriting to make your program execute faster. It can also tell you which functions are being called more or less often than you expected. This may help you spot bugs that had otherwise been unnoticed.

Since the profiler uses information collected during the actual execution of your program, it can be used on programs that are too large or too complex to analyze by reading the source. However, how your program is run will affect the information that shows up in the profile data. If you don't use some feature of your program while it is being profiled, no profile information will be generated for that feature.

fncdump profiler is divided into two part. The first one is the fnccheck library, which needs to be linked with the program to be profiled. The second part is the fncdump program itself which extract and display profile information for your program.

Profiling has several steps:

The next three chapters explain these steps in greater detail.

Several forms of output are available from the analysis.

The flat profile shows how much time your program spent in each function, and how many times that function was called. If you simply want to know which functions burn most of the cycles, it is stated concisely here. See section The Flat Profile.

The call graph shows, for each function, which functions called it or which other functions it called. See section The Call Graph.

To better understand how profiling works, you may wish to read a description of its implementation. See section Implementation of Profiling.

Compiling a Program for Profiling

The first step in generating profile information for your program is to compile and link it with profiling enabled. Remember that fncdump can only be used with gcc V2.95.2 or higher.

To compile a source file for profiling, specify the `-finstrument-functions -g' options when you run the compiler (This is in addition to the options you normally use).

To link the program for profiling, add the `fnccheck' library to your link command (`-lfnccheck'). Here are examples:

gcc -finstrument-functions -g -c myprog.c utils.c
gcc -o myprog myprog.o utils.o -lfnccheck

The `-finstrument-function -g' options also works with a command that both compiles and links:

gcc -o myprog myprog.c utils.c -finstrument-functions -g -lfnccheck

If you compile only some of the modules of the program with `-finstrument-function -g', you can still profile the program, but you won't get complete information about the modules that were compiled without `-finstrument-function -g'. The only information you get is the time spend in them, seen as local time for the calling functions.

Executing the Program

Once the program is compiled for profiling, you must run it in order to generate the information that fncdump needs. Simply run the program as usual, using the normal arguments, file names, etc. The program should run normally, producing the same output as usual. It will, however, run somewhat slower than normal because of the time spent collecting and the writing the profile data.

The way you run the program--the arguments and input that you give it--may have a dramatic effect on what the profile information shows. The profile data will describe the parts of the program that were activated for the particular input you use. For example, if the first command you give to your program is to quit, the profile data will show the time used in initialization and in cleanup, but not much else.

Your program will write the profile data into a file called `fnccheck.out' just before exiting. If there is already a file called `fnccheck.out', its contents are overwritten. You can specify an other file name by setting the shell variable `FNCCHK_OUTFILE' with the needed file name.

If your program uses fork() function, two processes will have to dump their data. In this case, the main process will dump into `fnccheck.out' or whatever name you set, and the child(s) process(es) will dump into `fnccheck.out_PID_OF_CHILD'. You can prevent childs to dump by using the shell variable `FNCCHK_NOCHILD'.

The `fnccheck.out' is written when returning from main or by calling exit. Some signals are catched to perform file writing on kills, at the condition that the program don't redirect them.

The `fnccheck.out' file is written in the program's current working directory at the time it exits. This means that if your program calls chdir, the `fnccheck.out' file will be left in the last directory your program chdir'd to. If you don't have permission to write in this directory, the file is not written, and you will get an error message.

Several shell variables can be used to change profile behavior:

fncdump Command Summary

After you have a profile data file `fnccheck.out', you can run fncdump to interpret the information in it. The fncdump program prints a flat profile and a call graph on standard output. Typically you would redirect the output of fncdump into a file with `>'.

You run fncdump like this:

fncdump [options] executable-file [> outfile]

Here square-brackets indicate optional arguments.

Output Options

These options specify which of several output formats fncdump should produce.

-avg
-average
The `-avg' option causes fncdump to display an estimated over-cost time of fnccheck library, expressed in time per function.
-sfile <file>
If the `-sfile' option is given, fncdump uses `<file>' as profile data file instead of default `./fnccheck.out'.
-sort <n> / +sort <n>
The `-sort' specify how `fncdump' sorts the flat profile. Available sorts are: 1: sorted by local time. 2: sorted by total time (default). 3: sorted by number of calls. 4: sorted by functions name. 5: not sorted (execution order). The `+sort' option reverse the sort order.
-calls / +calls
This option causes fncdump to display calls list of each function, instead of displaying list of callers of each functions. The `+calls' option display both the two lists.
-no-minmax
This option causes fncdump not to display MIN and MAX time list. This list shows for each function the MIN and MAX total time spend in it.
-func-details
The `-func-details' option causes fncdump to print a list of each function, with its corresponding file and line number. This option can't be used with `-nm' option.
-call-details / -calls-details
The `-call-details' option causes fncdump to display file and line number of callers site in calls list. This option can't be used with `-nm' option.
-fullname
Used with `-func-details' or `-call-details', causes fncdump to print full pathnames for files instead of basenames.
-call-graph
This option causes fncdump to output the call-graph of the profiled program, using VCG format (a GPL call-graph displayer). No other information is then displayed.
-cycles
This option causes fncdump to add the detected cycles during execution. Two kind of cycles can be displayed. Simple recursions (a function calls itself), and cycles, in which case the begin and the end of the cycle are displayed.
-sum-profile 'f1.out' 'f2.out' 'fr.out'
This option causes fncdump to add the profile data files `f1.out' and `f2.out' contents, and to generates a new profile data file called `fr.out'. If `fr.out' is `-', `stdout' is used. If you sum files from different programs or different compilations, the new profile data file will have very few sence. Note: you can use the same name for `fr.out' as for `f1.out' or `f2.out' (overwrite is possible).

Misc Options

These options control various aspects of the `fncdump' behavior.

-nm
This option indicates to `fncdump' to use nm program to extract function names from symbol addresses, instead of using library bfd from binutils. In this case, file and line number of functions are not available.
-addr2line
This option causes `fncdump' to use addr2line program to extract function names from symbol addresses, instead of using library bfd from binutils. Note: if library bfd is not available on your system, `fncdump' can be compiled without using it. In this case, default method is addr2line.
--help / -help / --h / -h
Causes `fncdump' to display a help message, with options list .
--version /-version / --v / -v
Causes `fncdump' to display its version number.
--misc
Causes `fncdump' to display author, contact and bug report information.
--details
Cause `fncdump' to display explanations on available options, output contents and data profile file.

Symspecs

These options specify how to set or remove functions from fncdump display.

-only <flist>
The `-only' option set which functions are to be displayed in the fncdump functions list. `<flist>' is a comma separated list of function names. Only these functions will be used to generate profile, call-graph...
-not <flist>
The `-not' option set which functions are to be removed from the fncdump function list. `<flist>' is a comma separated list of function names. All these functions will be removed from functions list to generate profile, call-graph...
-propagate
This option indicates that functions set or removed (using `-only' or `-not' options) are propagated to childs. It means that if a function f() calls a function g(), using options `-only f -propagate' will make both `f()' and `g()' to be used in functions list. In the same way, using options `-not f -propagate' will make both `f()' and `g()' to be removed from functions list.
-rpropagate
This option indicates that functions set or removed (using `-only' or `-not' options) are propagated to callers. It means that if a function g() is called by a function f(), using options `-only g -rpropagate' will make both `f()' and `g()' to be used in functions list. In the same way, using options `-not g -rpropagate' will make both `f()' and `g()' to be removed from functions list.
-no-spontaneous
The `-no-spontaneous' option causes fncdump to not display functions that seem to be called spontaneously. This can happen if you profile functions that are called by non-profiled functions. Note: this option not apply to main() function.
-no-unresolved
The `-no-unresolved' option causes fncdump to not display functions for which names are not find in the executable. This can happen if you give an invalid executable name to `fncdump', or if you profile functions which are loaded at execution time (using dlopen(...) from dlfcn.h. In this case the functions addresses are not available in executable. The default behavior of `fncdump' is to replace unresolved function names by their symbol address (i.e. `<0x8049178>').
-real-maxtime
The `-real-time' option causes fncdump to compute total execution time using final functions list (after `-not' or `-only' options applied). The total execution time is used to compute time percentages in flat profile display. WARNING: this option is desactivated in version 1.2, due to changes in memory representation of call-graph. It will be activated again in future release.

Interpreting fncdump's Output

fncdump first displays the cycles detected during execution, if the option `-cycles' is activated.

fncdump then displays various informations about the profiled program.

the executable name
This is the name of the profiled program.
Total execution time
Most of the time, this time is the total execution time of main() function. It can be different if main() is not in a profiled object, or if main() has been removed from functions list using `-not' or `-only' options. In this case the execution time is followed by the message `(not main() time)'.
The clock type used
Indicates which kind of clock was used to compute time. Possible values are `real clock time', which indicates that the 'user' clock is used; the second possible value is `system time', which indicates that CPU process time is used.
number of realloc
Indicates how many reallocations where performed during profile. This number is always zero if `FNCCHK_DYNAMIC' is not set. If this number is not zero, bias can be present in times (but it is not possible to know where the bias is).

Then the flat profile is displayed (see below).

After the flat profile, some other informations are displayed:

number of unresolved functions not shown
It is the number of hidden functions due to option `-no-unresolved'.
number of `-not/-only' functions not shown
It is the number of hidden functions due to options `-not' and `-only'.
final stack size
It is the effective stack size used during execution.
number of functions
It is the total number of functions.

Then the MIN/MAX time table is displayed. For each function, its max and min total execution time is displayed. This table is not displayed if option `-no-minmax' is used.

The final data displayed is the call-graph (see below).

The Flat Profile

The flat profile gives for each function various informations, such as local time spend in the function, total execution time, number of calls...

Here is a example of flat profile on a small program:

|      local      |      total      |          |           |
|      sec. |  %  |      sec. |  %  | calls    |t. sec/call| name
|-----------|-----|-----------|-----|----------|-----------|--------
|   0.068976|  0.3|  23.331874|100.0|         1|  23.331874| main
|   5.056025| 21.7|  10.078856| 43.2|         4|   2.519714| recurs_a
|   5.022831| 21.5|   8.822895| 37.8|         4|   2.205724| recurs_b
|   4.037846| 17.3|   8.077933| 34.6|         4|   2.019483| recurs_1s
|   5.105893| 21.9|   5.105893| 21.9|         1|   5.105893| test
|   4.040068| 17.3|   4.040068| 17.3|         4|   1.010017| s1
|   0.000097|  0.0|   0.000180|  0.0|        32|   0.000006| recurs
|   0.000020|  0.0|   0.000030|  0.0|         1|   0.000030| f3
|   0.000010|  0.0|   0.000010|  0.0|         3|   0.000003| f1
|   0.000006|  0.0|   0.000006|  0.0|         1|   0.000006| small

The first two columns display the local time spend in each function. Local time is the total execution time of the function minus the time spend in (profiled) child functions. The second column is the percentage of this time regards to the total execution time.

The next two columns display the total time spend in each function. The second column is the percentage of this time regards to the total execution time.

The fifth column shows the number of calls during execution for each function.

The next column indicate the average time per call, computed as the total execution time divided by the number of calls.

The last column displays the corresponding function name.

The Call Graph

Call-graph shows for each function the list of its callers, or its childs if using option `-calls'.

Here is a example of call-graph for a small program:

'main' [0] spontaneously called.

'recurs_a' [1] called by:
   [0],
   [2],

'recurs_b' [2] called by:
   [1],

'recurs_1s' [3] called by:
   [0],
   [3],

'test' [4] called by:
   [0],

's1' [5] called by:
   [3],

'recurs' [6] called by:
   [0],
   [6],

'f3' [7] called by:
   [0],

'f1' [8] called by:
   [7],

'small' [9] called by:
   [0],

If option `-calls' is used, callers are replaced by list of child functions. For the same example it becomes:

'main' [0] calls:
   [1],
   [3],
   [4],
   [6],
   [7],
   [9],

'recurs_a' [1] calls:
   [2],

'recurs_b' [2] calls:
   [1],

'recurs_1s' [3] calls:
   [3],
   [5],

'test' [4] calls: nobody.

's1' [5] calls: nobody.

'recurs' [6] calls:
   [6],

'f3' [7] calls:
   [8],

'f1' [8] calls: nobody.

'small' [9] calls: nobody.

If option `-call-details' is used, file and line number of each function and each caller is added to output. Here is an example for the same program:

'main' [0] (essai.c:128) spontaneously called.

'recurs_a' [1] (essai.c:68) called by:
   [0](essai.c:132),
   [2](essai.c:94),

'recurs_b' [2] (essai.c:82) called by:
   [1](essai.c:78),

'recurs_1s' [3] (essai.c:56) called by:
   [0](essai.c:134),
   [3](essai.c:61),

'test' [4] (essai.c:121) called by:
   [0](essai.c:139),

's1' [5] (essai.c:16) called by:
   [3](essai.c:58),
'recurs' [6] (essai.c:48) called by:
   [0](essai.c:130),
   [6](essai.c:51),

'f3' [7] (essai.c:40) called by:
   [0](essai.c:142),

'f1' [8] (essai.c:22) called by:
   [7](essai.c:41),

'small' [9] (essai.c:113) called by:
   [0](essai.c:145),

Details of Profiling

More details about how fnccheck library perform its job.

Implementation of Profiling

The fnccheck library uses a new feature of gcc V2.95.2 (or higher) called `-finstrument-functions'. This option makes `gcc' to add to every functions a call at enter and at exit. The fnccheck library gives these two functions called void __cyg_profile_func_enter(void *this_fn, void *call_site) and void __cyg_profile_func_exit(void *this_fn, void *call_site).

The fnccheck library manages a stack of function calls and a list of functions. For each of them, the number of calls, the time spend in them (local and total) is computed. Moreover a list of callers is associated to functions in order to manage call-graph.

At exit time, all these data are dumped into a file `./fnccheck.out' in order to be used with fncdump program.

Note: the fnccheck library only works with symbol addresses of functions, and not with names. Names are extracted from executable by fncdump.

Profiling Data File Format

The profile data file `./fnccheck.out' format is:

 HEADER
 # of elements
 f1_ptr  #calls sec usec locsec locusec minsec minusec maxsec maxusec
 f2_ptr  #calls sec usec locsec locusec minsec minusec maxsec maxusec
 ...
 # of realloc
 stack_size func_table_size
 nb_of_callers_f1 caller1_ptr rc1_ptr caller2_ptr rc2_ptr ...
 nb_of_callers_f2 caller1_ptr rc1_ptr caller2_ptr rc2_ptr ...
 ...
 time_mode
 exec_name

HEADER : identifier (including file-format version) of stat-files # el : number of functions fi_ptr : symbol of the function nb_calls : number of calls for this function nb_sec : total time in the function

nb_usec : corresponding usec

_locsec : local time in the function

_locusec : corresponding usec

min_sec : minimum time spend in the function

_subusec : corresponding usec

min_sec : maximum time spend in the function

_subusec : corresponding usec

#realloc : number of 'realloc' performed during execution

stack_size: final stack size used during execution

func_tbl_s: number of function stored during execution

nbofcaller: number of callers for the corresp. function

calleri_pt: symbol of ith caller for the function

rci_ptr : symbol of caller of the function

time_mode : the FNCCHK_TIME type

exec_name : corresp. exec name (not implemented)

fnccheck's Internal Operation

Here are the treatments performed by fnccheck library for each enter/exit of functions:

At the first call:

Initializations of hash-tables, options, stack. Trapping exits with atexit() and signal() functions.

When entering a function:

*
check if the function is still registered, and register it if it is not the case.
*
add the call-site to callers list for this function
*
store the function and its entering time in the stack
*
compute the additional local time for the calling function. It is the time between the last active time of this function minus the current time.

When exiting a function:

*
compute the total execution time (difference between entering time and current time)
*
increment number of calls
*
compute the additional local time for the calling function. It is the time between the last active time of this function minus the current time.
*
check if we are exiting main() function, and is yes call the dumping function.

When leaving the program:

Creates the profile data file and dumps data into it. Freed allocated tables and remove treatments (because in some cases, in particular for C++ programs, calls to exit treatments can occur AFTER exiting from main().

Debugging during execution

In order to allow debug during execution, you have to compile the library fnccheck using target `dfnccheck'. This allows you to use the shell variable `FNCCHK_DEBUG'. Valid values are:

2
display collision in hash table;
4
details on `fnccheck' library functions;
8
details on `fnccheck' library treatments.