How to ensure performance testing is stressing the underlying storage devices, not the OS filesystem.
There are two ways to ensure that IO goes directly to the back-end storage (direct-attached disk, SAN, or HCI datastore):
Use a “raw” or “physical” device (use the pattern #<diskID> to specify a disk device).
Use files on the filesystem with specific flags to bypass the filesystem cache (-Su or -Sh).
Be very careful about issuing WRITE workloads to a raw disk (using #<diskID>). If there is a filesystem mounted on the disk, you will corrupt that filesystem. My advice is to only use raw disks that have no formatted filesystem.
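As a safe starting point, here is a sketch of a read-only test against a raw device. The disk number and workload parameters are illustrative; check the disk number first (e.g. with Get-PhysicalDisk) and note that -w0 guarantees no writes are issued.

```shell
# Random 8KiB reads for 60 seconds, 4 threads, 8 outstanding IOs per thread,
# directly against physical disk 1. -w0 = 0% writes, so a mounted
# filesystem is not at risk; -Sh disables software and hardware caching.
diskspd -b8K -d60 -o8 -t4 -r -w0 -Sh #1
```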
What’s the difference between “-Su” and “-Sh”?
For enterprise storage (SAN or HCI), -Su and -Sh should give the same behavior. -Sh additionally sends a hint to disable any caching on the hardware device itself. Enterprise storage usually does not use on-disk-device caching due to the possibility of data loss in the event of power failure. When in doubt, use the -Sh switch.
Below we will see just how different the results can be depending whether caching is allowed and how reading/writing directly to a device can look quite different to using a filesystem.
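To see the difference for yourself, run the same workload twice against an existing test file, once with the default (cached) behavior and once with caching disabled. The file path and parameters here are illustrative.

```shell
# Cached run: reads may be served from the OS filesystem cache.
diskspd -b8K -d60 -o8 -t4 -r -w0 D:\testfile.dat

# Uncached run: -Sh bypasses the filesystem cache and hints the
# device to disable its own cache, so IO hits the storage back end.
diskspd -b8K -d60 -o8 -t4 -r -w0 -Sh D:\testfile.dat
```

On a system with plenty of free RAM, expect the cached run to report far higher IOPS and lower latency than the storage device can actually sustain.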
diskspd operates on Windows filesystems and will read/write to one or more files concurrently.
The NULL byte problem
By default, when diskspd creates a file it is a file full of NULL bytes. Many storage systems (at least NetApp and Nutanix that I know of) will optimize the layout of NULL-byte files. This means that test results from NULL-byte files will not reflect the performance of real applications that write actual data.
To avoid overly optimistic results, first create the file, then write a randomized data pattern to the file before doing any testing.
Create a file using diskspd -c, e.g. a 32GB file on drive D:, then overwrite it with random data.
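The two steps above can be sketched as below. The file name and the write parameters are illustrative; the key flag is -Z, which fills the write source buffer with random content so the storage layer cannot optimize the blocks away as NULLs.

```shell
# Step 1: create a 32GiB test file (initially all NULL bytes).
diskspd -c32G D:\testfile.dat

# Step 2: overwrite it with randomized data before testing.
# -w100 = 100% writes, -b1M sequential 1MiB blocks, -Z1M = 1MiB write
# source buffer initialized with random data, -Sh = bypass caches.
# Choose -d long enough for at least one full pass over the file.
diskspd -w100 -b1M -o4 -t1 -d120 -Sh -Z1M D:\testfile.dat
```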
Questions:
How does database performance behave as more DBs are consolidated?
What impact does running the CVM have on available host resources?
Results:
The cluster was able to achieve ~90% of the theoretical maximum.
CVM overhead was 5% for this workload.
The goal was to establish how database performance is affected as additional database workloads are added to the cluster. As a secondary metric, we measure the overhead of running the virtual storage controller (CVM) on the same hosts as the database servers. We use the Postgres database with the pgbench workload and measure total transactions per second.
A 4-node Nutanix cluster, with 2x Xeon CPUs per host and 20 cores per socket.
Each database is identically configured with:
8GB of memory
the pgbench benchmark, running the “simple” query set.
The database is sized so that it fits entirely in memory; this is a test of CPU/memory, not IO.
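A per-database run along these lines can be sketched as follows. The database name and scale factor are assumptions, and I am assuming the “simple” query set refers to pgbench's built-in simple-update script (-N); adjust to match the actual test.

```shell
# Initialize pgbench tables; scale factor chosen so the dataset
# (roughly 16MB per unit of scale) fits within the 8GB of guest memory.
pgbench -i -s 400 pgbench_db

# Run the simple-update built-in script (-N) for 5 minutes with
# 8 client connections and 4 worker threads; pgbench reports TPS.
pgbench -N -c 8 -j 4 -T 300 pgbench_db
```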
The experiment starts with a single database on a single host. We add more databases into the cluster until we reach 40 databases in total. At 40 databases with 4 vCPUs each and a CPU-bound workload, we use all 160 CPU cores in the cluster.
The database is configured to fit into the host DRAM, and the benchmark runs as fast as it can: the benchmark is CPU bound.
Below are the measured results from running 1-40 databases on the 4 node cluster.
Performance scales almost linearly from 4 to 160 CPUs, with no obvious bottlenecks before all of the CPU cores in the cluster are saturated at 40 databases.
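The “~90% of the theoretical maximum” figure is simply measured throughput divided by ideal linear scaling. A minimal sketch, using hypothetical TPS numbers for illustration (not the measured results):

```python
def scaling_efficiency(single_db_tps: float, total_tps: float, n_dbs: int) -> float:
    """Fraction of ideal linear scaling achieved.

    The theoretical maximum assumes each of the n_dbs databases
    runs at the same rate as a single database running alone.
    """
    return total_tps / (single_db_tps * n_dbs)

# Hypothetical: one database alone delivers 1,000 TPS, and 40 concurrent
# databases deliver 36,000 TPS in aggregate.
print(scaling_efficiency(1000, 36000, 40))  # 0.9, i.e. 90% of linear
```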