HCI Performance testing made easy (Part 3)

Creating a HCI benchmark to simulate multi-tennent workloads



HCI deployments are typically multi-tennant and often different nodes will support different types of workloads. It is very common to have large resource-hungry databases separated across nodes using anti-affinity rules.  As with traditional storage, applications are writing to a shared storage environment which is necessary to support VM movement.  It is the shared storage that often causes performance issues for data bases which are otherwise separated across nodes.  We call this the noisy neighbor problem.  A particular problem occurs when a reporting / analytical workload shares storage with a transactional workload.

In such a case we have a Bandwidth heavy workload profile (reporting) sharing with a Latency Sensitive workload (transactional)

In the past it has been difficult to measure the noisy neighbor impact without going to the trouble of configuring the entire DB stack, and finding some way to drive it.  However in X-Ray we can do exactly this sort of workload.  We supply a pre-configured scenario which we call the  DB Colocation test.

The DB Colocation test utilizes two properties of X-Ray not found in other benchmarking tools

  • Time based benchmark actions
  • Distinct per-VM workload patterns
  • Ability to provision particular workloads, to particular hosts

In our example scenario X-Ray begins by starting a workload modeled after a transactional DB (we call this the OLTP workload) on one of the nodes.  This workload runs for 60 minutes.  Then after 30 minutes X-Ray starts workloads modeled after reporting/analytical workloads on two other nodes (we call this the DSS workload).

After 30 minutes we have three independent workloads running on three independent nodes, but sharing the same storage.  The key thing to observe is the impact on the latency sensitive (OLTP) workload.  In this experiment it is the DSS workloads which are the noisy neighbor, since they will tend to utilize a lot of the storage bandwidth.  An ideal result is one where there is very little interference with the running OLTP workload, even though we expect latency to increase.  We can compare the impact on the OLTP workload by comparing the IOPS/response time during the first 30 minutes (no interference) with the remaining 60 minutes (after the DSS workloads are started).  We should expect to see some increase in response time from the OLTP application because the other nodes in the cluster have gone from idle to under-load.  The key thing to observe is whether the OLTP IOP target rate (4,000 IOPS) is achieved when the reporting workload is applied.


X-Ray Scenario configuration

We specify the timing rules and workloads in the test.yml file.  You can modify this to contain whichever values suit your model.  I covered editing an existing workload in Part 1.

The overall scenario begins with the OLTP workload, which will run for 3600 seconds (1 hour).  The stagger_secs value is used if there are multiple OLTP sub-workloads.  In the simple case we do use a single OLTP workload.

The scenario pauses for 1800 seconds using the test.wait specification then immediately starts the DSS workload

Finally the scenario uses the workload.Wait specification to wait for the OLTP workload to finish (approx 1 hour) before the test is deemed completed.

X-Ray Workload specification

The DB Co-Location test uses two workload profiles that aim to simulate transactional (OLTP) and reporting/analytical (DSS) workloads.  The specifications for those workloads are contained in the two .fio files (oltp.fio and dss.fio)


The OLTP workload (oltp.fio) that we ship as  has the following characteristcs based on typical configurations that we see in the field (of course you can change these to whatever you like).

  • Target IOP rate of 4,000 IOPS
  • 4 “Data” Disks
    • 50/50 read/write ratio.
    • 90% 8KB, 10% 32KB bloc-ksize
    • 8 outstanding IO per disk
  • 2 “Log” Disks
    • 100% write
    • 90% sequential
    • 32k block-size
    • 1 outstanding IO per disk

The idea here is to simulate the two main storage workloads of a DB.  The “data” portion and the “log” portion.  Log writes are just used to commit transactions and so are 100% write.  The only time the logs are read is during DB recovery, which is not part of this scenario.  The “Data” disks are doing both reads (from DB cache misses) and writes committed transactions.  A 50/50 read/write mix might be considered too write intensive – but we wanted to stress the storage in this scenario.


The DSS workload is configured to have the following characteristics

  • Target IOP rate of 1400 IOPS
  • 4 “Data” Disks
    • 100% Read workload with 1MB blocksize
    • 10 Outstanding IOs
  • 2 “Log” Disk
    • 100% Write workload
    • 90% sequential
    •  32K block-size
    • 1 outstanding IO per disk

The idea here is to simulate a large database doing a lot of reads across a large workingset size.  The IO to the data disks is entirely read, and uses large blocks to simulate a database scanning a lot of records.  The “Log” disks have a very light workload, purely to simulate an active database which is probably updating a few tables used for housekeeping.



HCI Performance testing made easy (Part 2)

Screenshot of results in X-Ray

Today we will use the simplest workload that X-Ray provides, the “Four Corners” Benchmark.  This is the classic storage benchmark of Random Read/Write and Sequential Read/Write.  Most people understand that this workload tells us very little about how the storage will behave under real workloads, but most people also want to know “How Fast” will the storage go.

Here’s a video of the same process :

First select the Four Corners Microbenchmark” from the test list.  The “For Corners” test is supplied with X-Ray.  Of course you can edit the parameters if you wish.

Then select the target cluster to run the test upon, and add to the test queue for execution.

The results will update in realtime.  X-Ray first creates the test VM’s and powers them on…

If I want to compare different runs then X-Ray has the “Analyze” button.  In my case I am using an engineering build of the product and comparing the same platform with different tuning.  The compare/analyze can be useful for comparing different platforms, hypervisors or HCI vendors since X-Ray can run pretty-much on anything that presents a data store to vCenter as well as Nutanix AOS/Prism.

This result would seem to show that the tuning performed in experiment #2 had a large improvement in Random Write IOPS and did not negatively effect the other results (Random Read, Sequential Read, and Sequential Write).

I can also look at the particular parameters of this test, by selection the Actions->Test Logs

For instance I can look at the Random Read parameters (these are standard fio configuration files)

I can also look at the overall “Four Corners” test configuration which is specified as YAML


HCI Performance testing made easy (Part 1)

In this short series I will describe how to perform performance an resiliency tests on a HCI cluster using X-ray.

X-Ray can do the following for the performance tester.

  • Model IO workloads using standard fio format
  • Create VMs based on user-specified criteria (CPUs, Memory, Number & Size of disks)
  • Provision the VMs  to a HCI cluster (Nutanix AHV, ESXi, Hyper-V)
  • Execute the workloads
  • Display and store the results

In particular X-Ray give the additional benefits that most workload generators do not

  • Specify and deploy workloads with different IO patterns and characteristics
    • Most workload generators create a uniform workload on all workers
  • Execute and terminate sub-workloads on a user-specified timeline
    • e.g. Begin workload 1 then introduce workload 2 and measure the interference
  • Introduce failure scenarios and measure the impact to performance

Here’s a video of X-Ray in action, I export an existing X-Ray test, edit it to create a new test, upload and execute the test.


The files are in my X-ray GitHub