A while ago at work we were discussing the pros and cons of the classic Linux profiler gprof…
What are profiles useful for?
- Profiles allow you to observe what a program is doing at runtime without runtime access to the source code (although source is needed to build a profile-enabled binary).
- A performance regression or improvement in a given program will often show up in a profile as a change in either duration or number of calls to some set of functions that changed between software builds.
The output of a profile run against a very simple program looks like the example below.
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 0.17% of 5.82 seconds
index % time self children called name
<spontaneous>
[1] 100.0 2.38 3.44 main [1]
0.88 0.55 666666667/666666667 calcme2 [2]
0.73 0.28 333333333/333333333 calcme1 [3]
0.99 0.00 1000000000/1000000000 increment [4]
-----------------------------------------------
0.88 0.55 666666667/666666667 main [1]
[2] 24.6 0.88 0.55 666666667 calcme2 [2]
0.55 0.00 666666667/1000000000 return3 [5]
-----------------------------------------------
0.73 0.28 333333333/333333333 main [1]
[3] 17.3 0.73 0.28 333333333 calcme1 [3]
0.28 0.00 333333333/1000000000 return3 [5]
-----------------------------------------------
0.99 0.00 1000000000/1000000000 main [1]
[4] 17.1 0.99 0.00 1000000000 increment [4]
-----------------------------------------------
0.28 0.00 333333333/1000000000 calcme1 [3]
0.55 0.00 666666667/1000000000 calcme2 [2]
[5] 14.3 0.83 0.00 1000000000 return3 [5]
-----------------------------------------------
It tells us that of the 5.82 seconds of profiled runtime
2.38seconds were spent inmain()3.44seconds in functions called by main (either directly or indirectly).
It also tells us that the function return3() was called 1,000,000,000 times
- 1/3 by
calcme1()and - 2/3 by
calcme2()
Profiles can give a good overview of where time is spent – albeit with some overhead.
In the profile outputs, the number of calls is exact but the time spent is the result of sampling. So the relative time spent in the functions should be correct, but the absolute time will normally much lower than the runtime/execution time reported by e.g running $ time <some-binary>
generating a profile for gprof to use
There are three pieces to using a profile
- ask
gccto insert additional code (a function call to__mcount) into the compiled binary - ask gcc to tell the kernel to keep track of where the program counter (PC) is running whenever it is sampled (
__monstartup) - use the
gproftool to process the output.
There is no need for any special hardware support – profiles run fine in a VM/container etc whereas some tools need to access the CPU directly via pass-through, or you need to be root etc.
Continue reading