With help from the Nutanix X-Ray team I have created an IO “benchmark” which simulates a “General Server Virtualization” workload. I call it the “Mixed Workload Simulator”.
Mixed Workload Simulator
The goals of this benchmark/generator are:
- Closely simulate tens to hundreds of VMs all acting independently
- Actually deploy hundreds or thousands of VMs to the cluster rather than having a single VM do 100x or 1000x the work.
- Incorporate realistic burst patterns as seen when server workloads write through a filesystem
- Incorporate realistic burst patterns seen in the kinds of databases commonly virtualized
- Make the benchmark easy to deploy
- Rate-limit the IO to model the fact that clusters of virtualized hosts are generally limited more by CPU/memory than by IOPS
- Create a workload that can be used in failure/failover tests. Most current “benchmarks” push the storage or CPU to the very limit which is not at all what we observe in customer environments.
End-user organizations want to evaluate the storage of newly emerging architectures and deployment styles, in particular hyper-converged and public cloud infrastructure. Top speed is often not the most important criterion, since SSD-based storage offers more than most traditional workloads can consume (real applications, not IO generators that just throw away the data that they read/write). For most end-users the more important criteria are:
- Multi-tenancy (noisy neighbor tolerance, isolation)
- Time to recover from component failure
- (Related) Performance of front-end IO during recovery.
- Time to re-silver mirror copies
- Time to failover to a remote site
- Ease/Efficiency of increasing cluster scale
End-users are forced to use tools developed either for vendor benchmarking (SPC-1, SFS) or as microbenchmarks (fio, vdbench, diskspd), written primarily by engineers to test specific elements of storage under development.
In other words there are very few, if any, benchmarks or workload generators that are aimed at end-users. Furthermore, these generators are often (mis)used to generate the maximum possible IO rates or throughput.
The maximum rate is important to know, but makes failure/fail-over testing almost impossible to evaluate because the storage is operating at its absolute maximum capacity. Under these conditions any failure in the system will tend to have exaggerated results. Most end-users do not operate storage at anywhere near 100% performance capacity.
Typical effect of IO Blender on Storage performance testing.
Since there are so many workloads hitting the SAN, the traditional approach is to observe the average IO characteristics and create a simple model using those. This reflects the IO Blender effect of many many different IO patterns hitting the centralized storage at once.
An example might be 50,000 IOPS, 16K 80% random 70/30 read:write.
A workload simulator is then set up with those parameters and run against the cloud storage. Because it is so painful to coordinate multiple microbenchmark clients, the temptation is to set up as few workers as possible and pump up the rate that each worker delivers. For example, in the 50,000 IOPS case I might want to avoid coordinating hundreds of VMs and instead run only 10 workers, each doing 5,000 IOPS. More often end-users understandably want to avoid coordination altogether and will test their cloud storage platform with a single worker pushing 50,000 IOPS to a single disk.
Often cloud/distributed storage gives less than impressive results when simulated this way. The reason is that the simulation is far too simplistic and does nothing to show the many advantages of cloud-style storage, which excels at handling multiple distinct workload streams. Cloud storage works very well when we simulate the individual workloads and present them just as they would appear in the real world. The problem is that this is very difficult for end-users to set up. Hence the need for a new approach.
Example of workload breakdown on a typical enterprise SAN.
The workload that hits the SAN is likely serving many hosts. My experience talking to enterprise architects is that the median SAN is supporting 10-20 physical hosts, with anywhere between 10-50 server VMs per host.
My own experience benchmarking SQL Server shows that in broad terms the IOPS needed to support a SQL Server instance are relatively modest compared to the IOPS capacity of modern SSD media. But these DB VMs are by far the most IO hungry and most critical to the business.
|Size of SQL DB|IOPS Required|
|---|---|
|4 vCPU|5,000 – 10,000|
|6 vCPU|10,000 – 15,000|
|8 vCPU|15,000 – 20,000|
|12 vCPU|20,000 – 30,000|
General Server Virt VMs
For the “General Virt” workloads, i.e. all the servers that are not specifically database servers, the pattern I work to is around 500 IOPS per server VM.
With 500 IOPS per VM, 10 VMs per host and 10 hosts per SAN array we get 50,000 IOPS, which is what we measured in the fictional example seen at the storage array.
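The arithmetic can be written out as a quick sanity check. The variable names below are illustrative only and are not taken from the simulator.

```shell
#!/usr/bin/env bash
# Back-of-envelope aggregate IOPS seen at the array, using the figures
# from the text. Variable names are hypothetical.
IOPS_PER_VM=500
VMS_PER_HOST=10
HOSTS_PER_ARRAY=10

TOTAL_IOPS=$((IOPS_PER_VM * VMS_PER_HOST * HOSTS_PER_ARRAY))
echo "Aggregate IOPS at the array: ${TOTAL_IOPS}"
```

Changing any one of the three inputs shows how quickly the aggregate moves while each individual VM stays modest.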
Public Cloud storage comparison (AWS – EBS)
We know that cloud storage works well enough for many customers despite the typical SSD backed volumes offering modest IO rates.
AWS GP3 “General Purpose SSD” (https://aws.amazon.com/ebs/features/)
“Lowest cost SSD volume that balances price performance for a wide variety of transactional workloads”
These volumes support up to 16,000 IOPS per volume, so right off the bat a single volume will not support the 50,000 IOPS observed at the SAN. A single volume is not meant to support hundreds of VMs. To see what sort of, and how much, cloud or HCI storage we really need, the IO simulation has to be more realistic.
Server IO Patterns
SANs serving multiple hosts with multiple VMs receive what looks like a steady stream of traffic due to the dilution/aggregation of the individual workloads.
Within the actual VMs the IO workload is almost always bursty, especially for traditional server workloads. This is due to two main factors: the incoming work is somewhat bursty (the application end-user is often a human), and the writes are normally batched by a filesystem (EXT/XFS/ZFS for Linux, NTFS for Windows).
Database IO Patterns
Interestingly database workloads have a similar pattern even though they bypass the filesystem. In database workloads the bursts come from the fact that ‘write’ transactions (creation & updates) are written to a log file – until the log fills at which point the updated data is flushed to the main data files. The intensity is higher, and the difference in IO rate between “background writes” and “burst writes” is quite large.
Simulating these patterns with existing benchmarks.
It turns out that none of the widely used IO generators (vdbench, fio, IOmeter) do burst patterns at all well, even though it’s the predominant pattern of enterprise workloads. One IO generator did valiantly try to change the game but never found much traction, probably because it felt too complicated for most casual users (filebench: https://github.com/filebench/filebench/wiki)
In the mixed workload simulator I use X-ray to deploy as many VMs as I need. On each deployed VM the workload very simply iterates over a series of calls to fio to emulate a “background” and a “burst” cycle. With some randomization and a range of Small, Medium, Large and Extra-Large (Database) types – it is possible to simulate many different types of virtualized workloads.
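To make the “background” and “burst” cycle concrete, here is a minimal sketch of one iteration. It is not the script from the simulator: the function names, the target device, the block size and the phase durations are all assumptions made for illustration, and the rates are borrowed from the Extra-Large example given later in this post. It only prints the fio commands unless `DRY_RUN` is set to 0.

```shell
#!/usr/bin/env bash
# Sketch of one background/burst cycle for a single VM.
# TARGET, the 8k block size and the 60s/10s durations are hypothetical.
TARGET=${TARGET:-/dev/sdb}   # hypothetical data disk inside the VM
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the fio commands

# Print (or run) a rate-limited, time-based random-write fio job.
run_phase() {
    name=$1; iops=$2; seconds=$3
    cmd="fio --name=${name} --filename=${TARGET} --ioengine=libaio --direct=1 --rw=randwrite --bs=8k --rate_iops=${iops} --time_based --runtime=${seconds}"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# One cycle: a long, quiet background phase followed by a short burst.
one_cycle() {
    background_iops=$((500 + RANDOM % 10))
    burst_iops=$((10000 + RANDOM % 6000))
    run_phase background "$background_iops" 60
    run_phase burst "$burst_iops" 10
}

one_cycle
```

Wrapping `one_cycle` in a loop gives the repeating pattern. Because every VM draws its own random rates, no two VMs produce identical traces.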
Simulating these patterns with X-Ray
We can use X-Ray to easily provision VMs with a variety of shapes and sizes (CPU count, memory size, disk count and size). For a given cluster we select the mix of VMs we want (Small, Medium, Large, Extra-Large). Then we let X-Ray set up all the VMs, and each VM does its own completely independent IO pattern, complete with bursts, across as many nodes as are available in the cluster.
Each VM type (Small, Medium, Large, Extra-Large) has its own constant base IO rate and a random component. Periodically there is a burst pattern which also has a base rate and a random component.
DBTYPE="X-Large"
BURSTIOPS=$((10000 + RANDOM % 6000))
READRATE=12000
BACKGROUNDIOPS=$((500 + RANDOM % 10))
LOGRATE=$((400 + RANDOM % 500))
CPUFIXED=40
CPURAND=20
For instance the Extra-Large VMs have a background write IO rate of roughly 500-510 IOPS and a burst write rate of roughly 10,000 – 16,000 IOPS. The read rate is a constant 12,000 IOPS and the log write rate is roughly 400-900 IOPS.
The total IOPS for an XL (Database) VM may be anything between (500+400+12,000 == 12,900 ) and (12,000+16,000+900 == 28,900)
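The same bounds can be derived directly from the X-Large parameters. This is a sketch with illustrative variable names. It uses the exact limits of `RANDOM % N` (which yields 0 to N-1), so the upper figure comes out slightly below the rounded 28,900 quoted above.

```shell
#!/usr/bin/env bash
# Bounds implied by the X-Large parameters. Variable names are hypothetical.
# RANDOM % N yields 0..N-1, so each maximum is base + N - 1.
READRATE=12000
BACKGROUND_MIN=500
BURST_MAX=$((10000 + 6000 - 1))
LOG_MIN=400
LOG_MAX=$((400 + 500 - 1))

# Quiet period: background writes + log writes + reads.
MIN_TOTAL=$((BACKGROUND_MIN + LOG_MIN + READRATE))
# Burst period: burst writes + log writes + reads (same terms as the text).
MAX_TOTAL=$((BURST_MAX + LOG_MAX + READRATE))

echo "X-Large VM total IOPS: ${MIN_TOTAL} to ${MAX_TOTAL}"
```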
I use the CPU-burn feature of fio to burn between 40%-60% of the allocated vCPU in the VM. This is particularly important in an HCI environment where user VMs may share the same CPU cores as the virtual storage controller.
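As a rough illustration, fio’s `cpuio` engine takes a `cpuload` percentage. How the simulator actually combines `CPUFIXED` and `CPURAND` is not shown in this post, so the formula below (fixed floor plus a random component) is an assumption, as are the job name and the 60-second runtime. The sketch only builds and prints the command.

```shell
#!/usr/bin/env bash
# Sketch: derive a CPU load percentage and build a fio cpuio command.
# The way CPUFIXED and CPURAND are combined here is an assumption.
CPUFIXED=40
CPURAND=20

# Fixed floor plus a random component: 40..59 percent.
CPULOAD=$((CPUFIXED + RANDOM % CPURAND))

# Job name and runtime are hypothetical; the command is printed, not run.
CPU_CMD="fio --name=cpuburn --ioengine=cpuio --cpuload=${CPULOAD} --time_based --runtime=60"
echo "$CPU_CMD"
```

A single cpuio job loads one core, so something like `--numjobs` set to the vCPU count would be needed to load the whole VM.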
Here are example traces of a Small VM workload and an XL VM workload.
If we zoom in on the write IO rate of the XL VM we see the burstiness.
At the cluster level, the IO rate seems smooth and quite uniform, a steady 88,000 IOPS. But in reality it is built from numerous IO patterns.
Using the mixed workload simulator (https://github.com/garyjlittle/xray) you can easily and accurately simulate mixed server workloads with tens, hundreds or thousands of VMs all acting independently. By design the workload will use a realistic amount of storage IO, which allows end-users to experiment with failure scenarios.