Benchmarking with Postgres PT2

In this example we run pgbench with a scale factor of 1000 which equates to a database size of around 15GB. The linux VM has 32G RAM, so we don’t expect to see many reads.

Using prometheus with the Linux node exporter we can see the disk IO pattern from pgbench. As expected the write pattern to the log disk (sda) is quite constant, while the write pattern to the database files (sdb) is bursty.

pgbench with DB size 50% of Linux buffer cache.

I had to tune the parameter checkpoint_completion_target from 0.5 to 0.9 otherwise the SCSI stack became overwhelmed during checkpoints, and caused log-writes to stall.

default pgbench – notice the sharp drop in log-writes before tuning.

Benchmarking with Postgres PT1

Image By Daniel Lundin

In this example, we use Postgres and the pgbench workload generator to drive some load in a virtual machine.  Assume a Linux virtual machine that has Postgres installed. Specifically using a Bitnami virtual appliance.

  • Once the VM has been started, connect to the console
  • Allow access to postgres port 5432 – which is the postgres DB port or allow ssh
  • Note the postgres user password (cat ./bitnami_credentials)
  •  Login to psql from the console or ssh
  • Optionally change password (the password prompted is the one from bitnami_credentials for the postgres database user).
  • Create a DB to run the pgbench workload.  In this case I name the db pgbench-sf10 for “Scale Factor 10”.  Scale Factors are how the size of the database is determined.
  • Initialise the DB with data ready to run the benchmark.  The “createdb” step just creates an empty schema.
    • -i means “initialize”
    • -s means “scale factor” e.g. 10
    • pgbench-sf10 is the database schema to use.  We use the one just created pgbench-sf10
  • Noe run a workload against the DB schema called pgbench-sf10

The workload pattern, and load on the system will vary greatly depending on the scale factor.  

Scale-Factor        Working Set Size

1                                   23M
10                                157M
100                             1.7GB
1000                          15GB
2500                          37GB
5000                         74GB
10000                       147GB



HCI Performance testing made easy (Part 3)

Creating a HCI benchmark to simulate multi-tennent workloads



HCI deployments are typically multi-tennant and often different nodes will support different types of workloads. It is very common to have large resource-hungry databases separated across nodes using anti-affinity rules.  As with traditional storage, applications are writing to a shared storage environment which is necessary to support VM movement.  It is the shared storage that often causes performance issues for data bases which are otherwise separated across nodes.  We call this the noisy neighbor problem.  A particular problem occurs when a reporting / analytical workload shares storage with a transactional workload.

In such a case we have a Bandwidth heavy workload profile (reporting) sharing with a Latency Sensitive workload (transactional)

In the past it has been difficult to measure the noisy neighbor impact without going to the trouble of configuring the entire DB stack, and finding some way to drive it.  However in X-Ray we can do exactly this sort of workload.  We supply a pre-configured scenario which we call the  DB Colocation test.

The DB Colocation test utilizes two properties of X-Ray not found in other benchmarking tools

  • Time based benchmark actions
  • Distinct per-VM workload patterns
  • Ability to provision particular workloads, to particular hosts

In our example scenario X-Ray begins by starting a workload modeled after a transactional DB (we call this the OLTP workload) on one of the nodes.  This workload runs for 60 minutes.  Then after 30 minutes X-Ray starts workloads modeled after reporting/analytical workloads on two other nodes (we call this the DSS workload).

After 30 minutes we have three independent workloads running on three independent nodes, but sharing the same storage.  The key thing to observe is the impact on the latency sensitive (OLTP) workload.  In this experiment it is the DSS workloads which are the noisy neighbor, since they will tend to utilize a lot of the storage bandwidth.  An ideal result is one where there is very little interference with the running OLTP workload, even though we expect latency to increase.  We can compare the impact on the OLTP workload by comparing the IOPS/response time during the first 30 minutes (no interference) with the remaining 60 minutes (after the DSS workloads are started).  We should expect to see some increase in response time from the OLTP application because the other nodes in the cluster have gone from idle to under-load.  The key thing to observe is whether the OLTP IOP target rate (4,000 IOPS) is achieved when the reporting workload is applied.


X-Ray Scenario configuration

We specify the timing rules and workloads in the test.yml file.  You can modify this to contain whichever values suit your model.  I covered editing an existing workload in Part 1.

The overall scenario begins with the OLTP workload, which will run for 3600 seconds (1 hour).  The stagger_secs value is used if there are multiple OLTP sub-workloads.  In the simple case we do use a single OLTP workload.

The scenario pauses for 1800 seconds using the test.wait specification then immediately starts the DSS workload

Finally the scenario uses the workload.Wait specification to wait for the OLTP workload to finish (approx 1 hour) before the test is deemed completed.

X-Ray Workload specification

The DB Co-Location test uses two workload profiles that aim to simulate transactional (OLTP) and reporting/analytical (DSS) workloads.  The specifications for those workloads are contained in the two .fio files (oltp.fio and dss.fio)


The OLTP workload (oltp.fio) that we ship as  has the following characteristcs based on typical configurations that we see in the field (of course you can change these to whatever you like).

  • Target IOP rate of 4,000 IOPS
  • 4 “Data” Disks
    • 50/50 read/write ratio.
    • 90% 8KB, 10% 32KB bloc-ksize
    • 8 outstanding IO per disk
  • 2 “Log” Disks
    • 100% write
    • 90% sequential
    • 32k block-size
    • 1 outstanding IO per disk

The idea here is to simulate the two main storage workloads of a DB.  The “data” portion and the “log” portion.  Log writes are just used to commit transactions and so are 100% write.  The only time the logs are read is during DB recovery, which is not part of this scenario.  The “Data” disks are doing both reads (from DB cache misses) and writes committed transactions.  A 50/50 read/write mix might be considered too write intensive – but we wanted to stress the storage in this scenario.


The DSS workload is configured to have the following characteristics

  • Target IOP rate of 1400 IOPS
  • 4 “Data” Disks
    • 100% Read workload with 1MB blocksize
    • 10 Outstanding IOs
  • 2 “Log” Disk
    • 100% Write workload
    • 90% sequential
    •  32K block-size
    • 1 outstanding IO per disk

The idea here is to simulate a large database doing a lot of reads across a large workingset size.  The IO to the data disks is entirely read, and uses large blocks to simulate a database scanning a lot of records.  The “Log” disks have a very light workload, purely to simulate an active database which is probably updating a few tables used for housekeeping.