Microsoft diskspd. Part 1: Preparing to test.

Installing Disk-Speed (diskspd).

Overview

diskspd operates on Windows filesystems, and will read from or write to one or more files concurrently.

The NULL byte problem

By default, when diskspd creates a file it is a file full of NULL bytes. Many storage systems (at least NetApp and Nutanix that I know of) will optimize the layout of NULL byte files. This means that test results from NULL byte files will not reflect the performance of real applications that write actual data.

Suggested work-around

To avoid overly optimistic results, first create the file, then write a randomized data pattern to the file before doing any testing.

Create the file using diskspd -c, e.g. a 32G file on drive D:, then overwrite it with random data.

diskspd.exe -c32G D:\testfile1.dat

This will create a 32G file full of NULL bytes

Default write command sends NULL bytes to disk

Then overwrite with a random pattern

diskspd.exe -w100 -Zr D:\testfile1.dat
same file after being overwritten with diskspd -w100 -Zr
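
To confirm that the overwrite really replaced the NULL bytes, the file can be inspected before and after. The following is a minimal Python sketch, not part of diskspd. The helper names is_all_null and overwrite_with_random are hypothetical, and the demo uses a small temporary file as a stand-in for D:\testfile1.dat so it can run anywhere.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # read/write 1 MiB at a time


def is_all_null(path, chunk_size=CHUNK):
    """Return True if every byte in the file is 0x00 (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True  # reached end of file without seeing a non-NULL byte
            # The NULL count equals the chunk length only if nothing else is present.
            if chunk.count(0) != len(chunk):
                return False


def overwrite_with_random(path, chunk_size=CHUNK):
    """Overwrite the file in place with random bytes, keeping its size."""
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n


if __name__ == "__main__":
    # Small stand-in for the real test file (assumption: 4 MiB is enough to illustrate).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * (4 * CHUNK))
        demo_path = tmp.name
    print("all NULL before overwrite:", is_all_null(demo_path))
    overwrite_with_random(demo_path)
    print("all NULL after overwrite:", is_all_null(demo_path))
    os.remove(demo_path)
```

On a real 32G test file, scanning every byte is slow, so checking only the first few chunks is usually enough to tell the two cases apart.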

Why does my SSD not issue 1MB IO’s?

First things First

https://commons.wikimedia.org/wiki/File:CDC9762-smd-drive.jpg
CDC 9762 SMD disk drive from 1974

Why do we tend to use 1MB IO sizes for throughput benchmarking?

To achieve the maximum throughput on a storage device, we will usually use a large IO size to maximize the amount of data that is transferred per IO request. The idea is to make the ratio of data transferred to IO requests as large as possible, which reduces the CPU overhead of issuing the IO requests so we can get as close to the device bandwidth as possible. To take advantage of pre-fetching, and to reduce the need for head movement in rotational devices, a sequential pattern is used.
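
The effect of IO size on request count is simple arithmetic. Here is a rough Python sketch of how many IO requests per second are needed to sustain a given bandwidth at different IO sizes. The 1 GiB/s bandwidth is an assumed, illustrative figure, not a measurement of any device, and requests_per_second is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumed, illustrative device bandwidth of 1 GiB/s (not a measured value).
ASSUMED_BANDWIDTH = 1024 * MIB


def requests_per_second(bandwidth_bytes_per_s, io_size_bytes):
    """IO requests per second needed to sustain a bandwidth at a given IO size."""
    return bandwidth_bytes_per_s / io_size_bytes


if __name__ == "__main__":
    for io_size in (4 * KIB, 64 * KIB, 1 * MIB):
        rps = requests_per_second(ASSUMED_BANDWIDTH, io_size)
        print(f"{io_size // KIB:>5} KiB IOs -> {rps:>9,.0f} requests/s")
```

With these assumed numbers, moving from 4 KiB to 1 MiB IOs cuts the number of requests by a factor of 256 for the same amount of data moved, which is where the saving in per-request CPU overhead comes from.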

For historical reasons, many storage testers will use a 1MB IO size for sequential testing. A typical fio command line might look something like this.

fio --name=read --bs=1m --direct=1 --filename=/dev/sda

HammerDB: Avoiding bottlenecks in client.

HammerDB is a great tool for running database benchmarks. However, it is very easy to create an artificial bottleneck in the client, which will give a very poor benchmark result.

When setting up HammerDB to run against even a moderately powerful modern server, it is important to avoid displaying the client transaction outputs in the HammerDB UI.

In my case, just making this simple change increased my HammerDB results by over 6X. The reason is that HammerDB spends more time updating its own UI than it does sending transactions to the DB. When I run HammerDB, I select “Log Output to Temp” and “Use Unique Log Name”.

  • Either:
    • Un-check the “Show results” mark.
    • Or ensure that the results are logged to file, not displayed on screen.
  • Otherwise the workload generator will become the bottleneck.
Checked -> 124,000 tpm
Un-Checked -> 800,000 tpm
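
As a quick sanity check of the "over 6X" figure, here is the arithmetic on the two results above as a small Python sketch. The constant and function names are just illustrative.

```python
# Results quoted above, in transactions per minute (tpm).
TPM_SHOW_OUTPUT_CHECKED = 124_000
TPM_SHOW_OUTPUT_UNCHECKED = 800_000


def speedup(baseline_tpm, improved_tpm):
    """Ratio of the improved result to the baseline result."""
    return improved_tpm / baseline_tpm


if __name__ == "__main__":
    ratio = speedup(TPM_SHOW_OUTPUT_CHECKED, TPM_SHOW_OUTPUT_UNCHECKED)
    print(f"speedup: {ratio:.2f}x")
```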



Show Output "Checked"
Show Output "Un-Checked"
HammerDB with "Show results" unchecked. SQL Server uses all the CPU (99%)
HammerDB with "Show results" checked. SQL Server is using ~20% of the CPU, while the HammerDB driver (identified as wish86t) is using 99% of one CPU.