Back to basics – gprof PT1 (generating a profile)

A while ago at work we were discussing the pros and cons of the classic Linux profiler gprof

What are profiles useful for?

  • Profiles allow you to observe what a program is doing at runtime without runtime access to the source code (although source is needed to build a profile-enabled binary).
  • A performance regression or improvement in a given program will often show up in a profile as a change in either duration or number of calls to some set of functions that changed between software builds.

The output of a profile run against a very simple program looks like the example below.

                     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) for 0.17% of 5.82 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    2.38    3.44                 main [1]
                0.88    0.55 666666667/666666667     calcme2 [2]
                0.73    0.28 333333333/333333333     calcme1 [3]
                0.99    0.00 1000000000/1000000000     increment [4]
-----------------------------------------------
                0.88    0.55 666666667/666666667     main [1]
[2]     24.6    0.88    0.55 666666667         calcme2 [2]
                0.55    0.00 666666667/1000000000     return3 [5]
-----------------------------------------------
                0.73    0.28 333333333/333333333     main [1]
[3]     17.3    0.73    0.28 333333333         calcme1 [3]
                0.28    0.00 333333333/1000000000     return3 [5]
-----------------------------------------------
                0.99    0.00 1000000000/1000000000     main [1]
[4]     17.1    0.99    0.00 1000000000         increment [4]
-----------------------------------------------
                0.28    0.00 333333333/1000000000     calcme1 [3]
                0.55    0.00 666666667/1000000000     calcme2 [2]
[5]     14.3    0.83    0.00 1000000000         return3 [5]
-----------------------------------------------

It tells us that of the 5.82 seconds of profiled runtime

  • 2.38 seconds were spent in main()
  • 3.44 seconds in functions called by main (either directly or indirectly).

It also tells us that the function return3() was called 1,000,000,000 times

  • 1/3 by calcme1() and
  • 2/3 by calcme2()

Profiles can give a good overview of where time is spent – albeit with some overhead.

In the profile outputs, the number of calls is exact but the time spent is the result of sampling. So the relative time spent in the functions should be correct, but the absolute time will normally much lower than the runtime/execution time reported by e.g running $ time <some-binary>

generating a profile for gprof to use

There are three pieces to using a profile

  • ask gcc to insert additional code (a function call to __mcount) into the compiled binary
  • ask gcc to tell the kernel to keep track of where the program counter (PC) is running whenever it is sampled (__monstartup)
  • use the gprof tool to process the output.

There is no need for any special hardware support – profiles run fine in a VM/container etc whereas some tools need to access the CPU directly via pass-through, or you need to be root etc.

Continue reading