The art of HCI performance testing

At some point, potential hyper-converged infrastructure (HCI) users want to know: “How fast does this thing go?” The real question is “How do we measure that?”

The simplest test is to run a single VM with a single disk and issue a single IO at a time. We often see this sort of test in bake-offs, and it does answer an important question: “What’s the lowest possible response time I can expect from the storage?”

However, this test only gives a single data point. Since nobody purchases an HCI cluster to run a single VM, we also need to know what happens when multiple VMs are run at the same time. This is a much more difficult test to conduct, and many end-users lack access to, and experience with, tools that can give the full picture.

In the example below, the single VM, single vdisk, single IO result is at the very far left of the chart. Since it’s impossible to read, I will tell you that the result is about 2,500 IOPS at ~400 microseconds. (In fact, with only one IO outstanding at a time, we know that if the IOPS are 2,500 the response time MUST be 400 microseconds: 1/2,500 = 0.0004 seconds.)
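The arithmetic in that parenthetical is Little's Law with one IO in flight. As a rough sketch (the function name and the `outstanding_ios` parameter are mine, not anything from X-Ray), it can be written as:

```python
def latency_seconds(iops: float, outstanding_ios: int = 1) -> float:
    """Mean response time implied by Little's Law.

    latency = concurrency / throughput. With a single outstanding IO
    this reduces to 1 / IOPS, which is the calculation in the text.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios / iops


# Single VM, single vdisk, single IO: about 2,500 IOPS.
single_vm_latency = latency_seconds(2500)
print(f"{single_vm_latency * 1e6:.0f} microseconds")
```

The same relationship is why a queue-depth-1 test is really a latency test: the IOPS number carries no extra information beyond the response time.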

However, with a single VM the cluster is mostly idle and has capacity to do much more work. In this X-Ray test I add another worker VM, doing the exact same workload pattern, to every node in the cluster every 5 minutes.

By the time we reach the end of the test, the total IOPS have increased to around 600,000 and the response time has increased by only about 400 microseconds, to roughly 800 microseconds.

In other words, the cluster achieved 240X the work measured by the single VM on a single node, with only a 2X increase in response time, which is still less than 1 ms.
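For anyone who wants to check the ratios, here is a small sketch using the approximate figures quoted above. The function and field names are mine, and the "implied outstanding IOs" value is only a Little's Law estimate that assumes the quoted numbers are cluster-wide averages; it is not something read off the chart.

```python
def scaling_summary(base_iops, base_latency_us, final_iops, final_latency_us):
    """Compare the start and end of the ramp test.

    Returns the IOPS multiple, the latency multiple, and the average
    number of IOs in flight implied by Little's Law
    (concurrency = throughput * latency).
    """
    return {
        "iops_scaling": final_iops / base_iops,
        "latency_scaling": final_latency_us / base_latency_us,
        "implied_outstanding_ios": final_iops * final_latency_us / 1e6,
    }


# Approximate figures from the text: 2,500 IOPS at ~400 us for the
# single VM, ~600,000 IOPS at ~800 us at the end of the test.
summary = scaling_summary(2_500, 400, 600_000, 800)
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Seen this way, the IOPS grow because far more IOs are in flight across the cluster at once, while each individual IO takes only about twice as long.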


The overall result is counter-intuitive because the increase in IOPS (240X) is far out of proportion to the increase in response time (2X). The explanation is that the single VM test uses only a fraction of the cluster’s capacity.

When comparing HCI clusters to traditional storage arrays, you should expect the traditional array to outperform the cluster at the far left of the chart. But as work scales up, the latent capacity of the HCI cluster is able to provide huge amounts of IO with very low response times.

You can run this test yourself by adding this custom workload to X-Ray.

Author: gary

Performance hacker @ nutanix.com