In this example, we use Postgres and the pgbench workload generator to drive some load in a virtual machine. Assume a Linux virtual machine that has Postgres installed. Specifically using a Bitnami virtual appliance.

Once the VM has been started, connect to the console
Allow access to postgres port 5432 – which is the postgres DB port or allow ssh

$ sudo ufw allow 5432

Note the postgres user password (cat ./bitnami_credentials)
Login to psql from the console or ssh

psql -U postgres

Optionally change password (the password prompted is the one from bitnami_credentials for the postgres database user).

psql -U postgres
postgres=# alter user postgres with password 'NEW_PASSWORD';
postgresl=# \q

Create a DB to run the pgbench workload. In this case I name the db pgbench-sf10 for “Scale Factor 10”. Scale Factors are how the size of the database is determined.

$ sudo -u postgres createdb pgbench-sf10

Initialise the DB with data ready to run the benchmark. The “createdb” step just creates an empty schema.
- -i means “initialize”
- -s means “scale factor” e.g. 10
- pgbench-sf10 is the database schema to use. We use the one just created pgbench-sf10

$ sudo -u postgres pgbench -i -s 10 pgbench-sf10

Noe run a workload against the DB schema called pgbench-sf10

$ sudo -u postgres pgbench pgbench-sf10

The workload pattern, and load on the system will vary greatly depending on the scale factor.

Scale-Factor Working Set Size

1 23M
10 157M
100 1.7GB
1000 15GB
2500 37GB
5000 74GB
10000 147GB

Creating a HCI benchmark to simulate multi-tennent workloads

HCI deployments are typically multi-tennant and often different nodes will support different types of workloads. It is very common to have large resource-hungry databases separated across nodes using anti-affinity rules. As with traditional storage, applications are writing to a shared storage environment which is necessary to support VM movement. It is the shared storage that often causes performance issues for data bases which are otherwise separated across nodes. We call this the noisy neighbor problem. A particular problem occurs when a reporting / analytical workload shares storage with a transactional workload.

In such a case we have a Bandwidth heavy workload profile (reporting) sharing with a Latency Sensitive workload (transactional)

In the past it has been difficult to measure the noisy neighbor impact without going to the trouble of configuring the entire DB stack, and finding some way to drive it. However in X-Ray we can do exactly this sort of workload. We supply a pre-configured scenario which we call the DB Colocation test.

The DB Colocation test utilizes two properties of X-Ray not found in other benchmarking tools

Time based benchmark actions
Distinct per-VM workload patterns
Ability to provision particular workloads, to particular hosts

In our example scenario X-Ray begins by starting a workload modeled after a transactional DB (we call this the OLTP workload) on one of the nodes. This workload runs for 60 minutes. Then after 30 minutes X-Ray starts workloads modeled after reporting/analytical workloads on two other nodes (we call this the DSS workload).

After 30 minutes we have three independent workloads running on three independent nodes, but sharing the same storage. The key thing to observe is the impact on the latency sensitive (OLTP) workload. In this experiment it is the DSS workloads which are the noisy neighbor, since they will tend to utilize a lot of the storage bandwidth. An ideal result is one where there is very little interference with the running OLTP workload, even though we expect latency to increase. We can compare the impact on the OLTP workload by comparing the IOPS/response time during the first 30 minutes (no interference) with the remaining 60 minutes (after the DSS workloads are started). We should expect to see some increase in response time from the OLTP application because the other nodes in the cluster have gone from idle to under-load. The key thing to observe is whether the OLTP IOP target rate (4,000 IOPS) is achieved when the reporting workload is applied.

X-Ray Scenario configuration

We specify the timing rules and workloads in the test.yml file. You can modify this to contain whichever values suit your model. I covered editing an existing workload in Part 1.

The overall scenario begins with the OLTP workload, which will run for 3600 seconds (1 hour). The stagger_secs value is used if there are multiple OLTP sub-workloads. In the simple case we do use a single OLTP workload.

The scenario pauses for 1800 seconds using the test.wait specification then immediately starts the DSS workload

Finally the scenario uses the workload.Wait specification to wait for the OLTP workload to finish (approx 1 hour) before the test is deemed completed.

X-Ray Workload specification

The DB Co-Location test uses two workload profiles that aim to simulate transactional (OLTP) and reporting/analytical (DSS) workloads. The specifications for those workloads are contained in the two .fio files (oltp.fio and dss.fio)

OLTP

The OLTP workload (oltp.fio) that we ship as has the following characteristcs based on typical configurations that we see in the field (of course you can change these to whatever you like).

Target IOP rate of 4,000 IOPS
4 “Data” Disks
- 50/50 read/write ratio.
- 90% 8KB, 10% 32KB bloc-ksize
- 8 outstanding IO per disk
2 “Log” Disks
- 100% write
- 90% sequential
- 32k block-size
- 1 outstanding IO per disk

The idea here is to simulate the two main storage workloads of a DB. The “data” portion and the “log” portion. Log writes are just used to commit transactions and so are 100% write. The only time the logs are read is during DB recovery, which is not part of this scenario. The “Data” disks are doing both reads (from DB cache misses) and writes committed transactions. A 50/50 read/write mix might be considered too write intensive – but we wanted to stress the storage in this scenario.

DSS

The DSS workload is configured to have the following characteristics

Target IOP rate of 1400 IOPS
4 “Data” Disks
- 100% Read workload with 1MB blocksize
- 10 Outstanding IOs
2 “Log” Disk
- 100% Write workload
- 90% sequential
- 32K block-size
- 1 outstanding IO per disk

The idea here is to simulate a large database doing a lot of reads across a large workingset size. The IO to the data disks is entirely read, and uses large blocks to simulate a database scanning a lot of records. The “Log” disks have a very light workload, purely to simulate an active database which is probably updating a few tables used for housekeeping.

n0derunner

benchmarks

Benchmarking with Postgres PT2

Benchmarking with Postgres PT1