Understanding Concurrency Parameters in pgbench

Published in Workloads & Benchmarks.

How to use the “jobs” and “clients” parameters in pgbench without going crazy.

pgbench parameters for concurrency control

pgbench offers two parameters for controlling the concurrency in the benchmark. Namely:

-j, --jobs=NUM number of threads (default: 1)
-c, --client=NUM number of concurrent database clients (default: 1)
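As a rough sketch of how the two flags sit together on a command line, the snippet below builds a pgbench invocation for a given thread count and client count. The database name `bench` and the 60 second duration (`-T`) are assumptions for illustration, not values from my runs.

```shell
#!/bin/sh
# Assumed values for illustration only: database name and run length.
DBNAME="${DBNAME:-bench}"
DURATION="${DURATION:-60}"

# Print a pgbench command line for a given jobs/clients pair.
pgbench_cmd() {
    jobs="$1"     # -j: worker threads inside the pgbench process itself
    clients="$2"  # -c: concurrent database sessions opened by pgbench
    echo "pgbench -j $jobs -c $clients -T $DURATION $DBNAME"
}

# One pgbench thread driving ten database clients.
pgbench_cmd 1 10
```

The function only prints the command; piping its output to `sh` would run it against a real database.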

Here is the TPS delivered for the four combinations of -j (1, 10) and -c (1, 10) using a very small (cached) database.

The machine is a GCP instance (e2-standard-8: 8 vCPUs, 32 GB memory). The database size is tiny (Scale Factor 100).
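For reference, a database at this scale factor can be created with pgbench's initialization mode (`-i`) and the scale option (`-s`). The sketch below prints the command and the resulting size of the `pgbench_accounts` table, which holds 100,000 rows per unit of scale. The database name `bench` is an assumption.

```shell
#!/bin/sh
# Scale factor from the text; the database name is an assumed placeholder.
SCALE=100
DBNAME="${DBNAME:-bench}"

# Print the initialization command: -i creates and loads the pgbench tables.
init_cmd() {
    echo "pgbench -i -s $SCALE $DBNAME"
}

# pgbench_accounts holds 100,000 rows per unit of scale factor.
expected_account_rows() {
    echo $((SCALE * 100000))
}

init_cmd
echo "pgbench_accounts rows: $(expected_account_rows)"
```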

pgbench read-only test.

First, I ran pgbench with the -S flag (“Select Only”) so that the disk would not be a bottleneck. In this experiment we are mainly interested in the concurrency options.
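A sweep over the four -j/-c combinations in select-only mode could look something like the sketch below. It is not the exact script I used: the database name, the 60 second duration and the `DRY_RUN` switch are all assumptions added for illustration. With `DRY_RUN=1` (the default here) it only prints the commands; with `DRY_RUN=0` it runs them and keeps the `tps` lines from pgbench's summary.

```shell
#!/bin/sh
# Assumed values for illustration: database name, run length, dry-run switch.
DBNAME="${DBNAME:-bench}"
DURATION="${DURATION:-60}"
DRY_RUN="${DRY_RUN:-1}"

# Iterate over the 2x2 grid of thread and client counts in select-only mode.
sweep() {
    for j in 1 10; do
        for c in 1 10; do
            cmd="pgbench -S -j $j -c $c -T $DURATION $DBNAME"
            if [ "$DRY_RUN" = "1" ]; then
                echo "$cmd"
            else
                echo "== j=$j c=$c =="
                # pgbench reports throughput on lines beginning with "tps".
                $cmd | grep '^tps'
            fi
        done
    done
}

sweep
```

Dropping `-S` from the command string gives the default read/write (TPC-B-like) workload instead.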

Transactions Per Second (Read-Only)

The result shows that the number of “clients” (postgres client processes) is the clear dominant factor. With the tiny DB and 8 cores a single pgbench thread (-j=1) is almost able to saturate the 8 cores. With j=1 and c=10 there was about 20% idle across all the cores.

“mpstat -P ALL 1” with j=1 and c=10 (read only transactions)

With 10 pgbench threads and 10 postgres client processes (-j=10 -c=10) all 8 cores were 100% saturated.

“mpstat -P ALL 1” with j=10 and c=10 (read only transactions)

pgbench read/write test.

For completeness I re-ran the experiment without the “-S” option. The GCP instance had a single disk and was easily overwhelmed by the amount of IO generated by 8 cores at full blast. At any rate the number of postgres client processes (-c=10) is the clear dominant factor – albeit at a much lower TPS rate (due to the fact that so much time is spent waiting on disk).

Transactions Per Second (Read & Update)
“mpstat -P ALL 1” with j=10 and c=10 (read/write transactions)

What’s really interesting here is that most of the cores are showing “idle” rather than IO wait. I believe that the postgres backend processes must be waiting on a single writer to finish disk IO before they can continue (via a lock or cv_wait). So in reality all the CPUs are blocked on IO, but only indirectly, so the kernel does not know to show that the CPUs could be doing more work if the IO were faster.
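One way to test this guess, which I did not do for this post, would be to sample the `pg_stat_activity` view while the read/write benchmark is running and count what the client backends are waiting on. The sketch below prints such a query, or runs it through `psql` when `DRY_RUN=0`. The database name and the `DRY_RUN` switch are assumptions. If the guess is right, many backends should report a wait type such as `LWLock`, `Lock` or `IO` rather than no wait event at all.

```shell
#!/bin/sh
# Assumed placeholder database name and dry-run switch.
DBNAME="${DBNAME:-bench}"

# Count client backends by what they are currently waiting on.
WAIT_SQL="SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid()
GROUP BY 1, 2
ORDER BY 3 DESC;"

sample_waits() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: just show the query that would be sent.
        echo "$WAIT_SQL"
    else
        # -X skips any local psqlrc so the output format stays predictable.
        psql -X -d "$DBNAME" -c "$WAIT_SQL"
    fi
}

sample_waits
```

A single sample is only a snapshot, so running it several times during the benchmark gives a better picture.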

