Microsoft diskspd. Part 2: How to bypass the NTFS cache.

How to ensure performance testing stresses the underlying storage devices, not the OS filesystem cache.

Summary tl;dr

There are two ways to ensure that IO goes directly to the back-end storage (direct attach disk, SAN or HCI datastore). Before going further, see Part 1 – Preparing to test with diskspd.


  1. Use a “raw” or “physical” device (Use the pattern #<diskID> to specify a disk device)
  2. Use files on the filesystem with specific flags to bypass the filesystem cache (-Su or -Sh)

Be very careful about issuing WRITE workloads to a raw disk (using #<diskID>). If there is a filesystem mounted on the disk – you will corrupt the filesystem. My advice is to only use raw disks that have no formatted filesystem.

What’s the difference between “-Su” and “-Sh”?

For enterprise storage (SAN or HCI), -Su and -Sh should give the same behavior. -Sh additionally sends a hint to disable any caching on the hardware device itself. Enterprise storage usually does not use on-device caching anyway, due to the possibility of data loss in the event of power failure. When in doubt, use the -Sh switch.
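One way to picture the difference is in terms of the Win32 CreateFile flags that the caching switches correspond to. The sketch below is based on the diskspd documentation and the standard Win32 flag values; the dictionary is purely illustrative, not diskspd's actual source.

```python
# Sketch: how diskspd's cache switches map to Win32 CreateFile flags.
# Constants are the standard Win32 values; the mapping follows the diskspd docs.
FILE_FLAG_NO_BUFFERING = 0x20000000   # bypass the OS filesystem cache
FILE_FLAG_WRITE_THROUGH = 0x80000000  # hint the device to commit writes to media

SWITCH_TO_FLAGS = {
    "-Su": FILE_FLAG_NO_BUFFERING,                            # software cache off
    "-Sw": FILE_FLAG_WRITE_THROUGH,                           # write-through hint only
    "-Sh": FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,  # cache off + device hint
}
```

This makes the "-Sh is -Su plus a hardware hint" relationship concrete: -Sh sets everything -Su sets, plus the write-through bit.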

Below we will see just how different the results can be depending on whether caching is allowed, and how reading/writing directly to a device can look quite different from using a filesystem.

Using an existing filesystem.

1. Reading a file on filesystem with cache DISABLED

Often we want to test the storage performance of an existing filesystem. To do that, use the -Su flag to bypass the filesystem cache.

Although the specific IOPS figures reported will depend entirely on the underlying storage, the number reported by diskspd and the number reported by the storage device should be quite close.

Test 1 – Issue an 8KB random read test against the file F:\testfile4.dat and bypass the cache.

diskspd.exe -w0 -r -b8k -Su -d10 -o32 F:\testfile4.dat
Parameters: -w0 (0% writes, i.e. a 100% read workload), -r (random IO), -b8k (8KB IO size), -Su (bypass the software cache), -d10 (run for 10 seconds), -o32 (32 outstanding IOs).

The IO performance reported by diskspd is about 68,500 IOPS

The backend storage (in this case Nutanix HCI) reports 69,000 IOPS which is close enough. We can be fairly sure that we are exercising the storage.

Back-end Storage IOPS (Ops/s -> Read) matches the diskspd number
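A quick way to sanity-check that the front end and back end are seeing the same workload is to convert IOPS at a fixed IO size into throughput. A small sketch using the numbers above (the helper function is illustrative):

```python
def throughput_mbs(iops, io_bytes):
    """Convert an IOPS figure at a fixed IO size into MB/s."""
    return iops * io_bytes / 1_000_000

IO_SIZE = 8 * 1024                           # 8KB IOs, as specified with -b8k
front_end = throughput_mbs(68_500, IO_SIZE)  # diskspd's view: ~561 MB/s
back_end = throughput_mbs(69_000, IO_SIZE)   # storage's view: ~565 MB/s

# The two views agree to within about 1%, so the test is exercising the storage.
assert abs(front_end - back_end) / back_end < 0.02
```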

2. Reading a file on filesystem with cache ENABLED (default)

With caching enabled (the default) the result from diskspd is expected to be higher than the underlying storage.

Test 2 – Issue an 8KB random read test against the file F:\testfile4.dat and allow the filesystem to cache.

diskspd.exe -w0 -r -b8k -d10 -o32 F:\testfile4.dat
This test WILL use filesystem cache

With caching enabled, diskspd shows 215,000 IOPS (compared to 69,000 when bypassing the cache)

The storage shows ZERO IOPS since all the IO is satisfied by the Windows filesystem cache and no IOs are serviced by the back-end storage.

Backend IOPS are Zero when buffered IO is used.
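Conceptually, the filesystem cache acts as a read-through cache: once the working set is resident, every read is served from memory and the back end sees nothing. A toy model (the class and names are illustrative, not a model of NTFS internals):

```python
class ReadThroughCache:
    """Toy model of a filesystem read cache in front of slow storage."""

    def __init__(self, backend):
        self.backend = backend        # stands in for the storage device
        self.cache = {}               # stands in for the filesystem cache
        self.backend_reads = 0        # IOs the "storage" actually services

    def read(self, block):
        if block not in self.cache:   # miss: fetch from storage once
            self.backend_reads += 1
            self.cache[block] = self.backend[block]
        return self.cache[block]      # hit: served entirely from memory

storage = {blk: b"x" * 8192 for blk in range(128)}  # small 1MB "test file"
fs = ReadThroughCache(storage)

for blk in range(128):                # first pass warms the cache
    fs.read(blk)
for _ in range(10):                   # re-reads never touch the backend
    for blk in range(128):
        fs.read(blk)

# 1,408 reads were issued, but the backend serviced only the first 128.
assert fs.backend_reads == 128
```

Once the test file fits in cache, the benchmark is measuring memory, not storage, which is exactly what the zero back-end IOPS above shows.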

3. Writing to a file on filesystem with caching disabled (-Su)

With buffering/caching disabled, the IOPS reported by diskspd should be close to the value observed at the back-end storage.

diskspd.exe -w100 -r -b8k -Su -Zr -d10 -o32 F:\testfile4.dat

diskspd reports ~30,000 IOPS

unbuffered writes

The back-end storage shows figures similar to the front end: ~34,000 IOPS at an 8KB IO size.

So unbuffered writes look pretty similar when viewed from the Windows guest OS and from the back-end storage. The IO size is 8K and IOPS are around 34,000.

4. Writing to a file on filesystem with caching enabled (default)

When writing to a buffered filesystem we expect the IO rate to be much higher, and it is. diskspd reports 158,000 IOPS (compared to 34,000 IOPS unbuffered)

Storage traffic with buffered writes.

Unlike the buffered read case, the storage does still show some activity. In the buffered case, diskspd shows 158,000 IOPS at an 8KB IO size, which is what we specified on the CLI. However, the storage writes look nothing like that: the back-end storage reports just 39 IOPS with an IO size of ~244KB.

The reason for this is that a) the Windows filesystem cache is not infinite and b) the Windows OS would like to save the user from losing data. To account for both of these, the OS flushes the filesystem cache periodically (or when it is under memory pressure). However, the filesystem is at liberty to reorder or coalesce the many 8KB writes however it likes. So the storage system actually sees a ~256KB sequential write pattern – the write pattern of the NTFS flusher draining the filesystem cache, not the write pattern of the incoming user workload (8KB random).
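The coalescing effect can be sketched in a few lines: accumulate random 8KB dirty pages, then flush them as large aligned extents the way a filesystem flusher might. The extent size, file size, and function below are illustrative assumptions, not NTFS's actual algorithm:

```python
import random

PAGE = 8 * 1024                 # the 8KB IO size the benchmark issues
EXTENT = 256 * 1024             # the large flush unit seen at the storage
FILE_SIZE = 16 * 1024 * 1024    # a small 16MB test file

def flush_extents(dirty_pages):
    """Group dirty 8KB pages into aligned 256KB extents, as a flusher might."""
    return sorted({page * PAGE // EXTENT for page in dirty_pages})

random.seed(42)
# 4,000 random 8KB writes; overwrites of the same page are absorbed in cache.
writes = [random.randrange(FILE_SIZE // PAGE) for _ in range(4_000)]
extents = flush_extents(set(writes))

# 4,000 small random writes collapse into at most 64 large sequential flushes.
assert len(extents) <= FILE_SIZE // EXTENT
```

Two things fall out of the model: repeated writes to the same page cost the storage nothing (they are overwritten in cache before the flush), and what does reach the storage arrives as a handful of large sequential IOs.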

Using physical devices directly

Another option is to skip the filesystem entirely and use a physical device. When we use diskspd with a physical device we bypass the filesystem, so the expected behavior is that buffered (default) and unbuffered (-Su) runs will return about the same performance, because there is no filesystem buffer cache.

First of all, get a list of disks. I am using Disk 5 (2GB in size), which has no filesystem on it.

DISKPART > list disk

Before going further, I need to write data across the disk, otherwise I get the “optimized for NULL” performance number. See the previous article.

diskspd.exe -w100 -Zr #5

1. Reading directly from physical device. Cache DISABLED (-Su)

When using physical devices, we expect the results from diskspd to be similar to the underlying storage results REGARDLESS of whether caching is enabled or disabled. This is because when we access the disk directly there is no filesystem cache, so using -Su has no effect. The results should be very similar.

First I run with the unbuffered switch (-Su).

diskspd.exe -w0 -r -b8k -Su -d10 -o32 #5

The result is about 69,000 IOPS – about the same as the unbuffered test that bypassed the filesystem cache to a regular file. This is expected, since both experiments stress the same underlying storage.

2. Reading directly from physical device. Cache ENABLED (default)

diskspd.exe -w0 -r -b8k  -d10 -o32 #5

Even though diskspd reports that it is “using software cache” above, that is not the case when using disk devices directly. We see this in the IOPS reported by diskspd, which are about the same regardless of whether -Su is used. In both cases the number reported by diskspd is about the same as the number reported by the storage.

Writing to physical devices

Just like with the read workloads, when using raw disk devices buffered and unbuffered IO make no difference, because there is no filesystem cache to write into.

1. Writing to a disk device using the default (buffered)

diskspd.exe -w100 -r -b8k -Zr -d10 -o32 #5

Diskspd reports 29,000 IOPS

The storage shows about the same number of IOPS.

Storage shows same IOPS and IO Size

Using the default (buffered) mode returns about the same number from diskspd (29,000 IOPS). Remember that when we issued buffered IO to the filesystem, diskspd reported 158,000 IOPS. Just like the read use-case, buffering has no impact when using the disk device directly – because there is no filesystem buffer cache.

2. Writing to a disk device using -Su (unbuffered)

diskspd.exe -w100 -r -b8k -Su -Zr -d10 -o32 #5

Storage shows about the same result (30,000 IOPS)

Summary

When using a filesystem to test back-end storage, use the -Su parameter to bypass filesystem caching/buffering for both reads and writes. When bypassing the filesystem cache, the numbers reported by diskspd should be very similar to what is observed at the storage.

When using physical devices directly (using the format #<disk number>) there is no need to use the -Su flag since there is no cache anyway. The figures reported by diskspd should be very close to what is observed at the storage.
