How to measure database scaling & density on the Nutanix HCI platform

Perfect Scaling vs. Achieved Scaling

So what does this data tell us about scaling on the cluster? Are the results good, bad, or indifferent? To understand how well the system was able to scale – i.e. continue to deliver the same performance for each new database added to the system – we compare the measured results to what we would expect from a "perfectly scaling" platform.

The column headed "Perfect Scaling" takes the measured value for a single database and multiplies it by the number of databases running. This shows that the measured and "perfect" values diverge as the workload begins to saturate the system. That divergence is expected, and is mostly due to the effects described by the Universal Scalability Law (USL).

At maximum load (all 160 cores busy) the measured value is 2194 compared to a "perfect" score of 2560, so we can say that our system scaled to >85% of the absolute theoretical maximum – which is pretty good. The sketch after the table reproduces this arithmetic.

DB Count   Total Cores   Fraction of Total Cores   Measured Transactions/s (K)   Perfect Scaling Transactions/s (K)
1          4             0.025                     64                            64
4          16            0.1                       256                           256
8          32            0.2                       540                           512
16         64            0.4                       1045                          1024
24         96            0.6                       1540                          1536
32         128           0.8                       1957                          2048
40         160           1.0                       2194                          2560

Measured performance vs. perfect scaling
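These numbers are easy to reproduce. Here is a minimal Python sketch that recomputes the "Perfect Scaling" column and the scaling efficiency from the measured values transcribed above (nothing here queries a live cluster):

```python
# Recompute the "perfect scaling" column and achieved efficiency
# from the measured data in the table above (values transcribed
# from this post; nothing here queries a live cluster).

measured_ktps = {1: 64, 4: 256, 8: 540, 16: 1045, 24: 1540, 32: 1957, 40: 2194}

single_db = measured_ktps[1]  # baseline: one database on the cluster

for db_count, measured in measured_ktps.items():
    perfect = single_db * db_count      # linear extrapolation of the 1-DB result
    efficiency = measured / perfect     # 1.0 == perfect scaling
    print(f"{db_count:>2} DBs: measured={measured:>4}K tps  "
          f"perfect={perfect:>4}K tps  efficiency={efficiency:.1%}")

# At 40 DBs: measured=2194K, perfect=2560K, efficiency ≈ 85.7% –
# the ">85% of theoretical maximum" figure quoted above.
```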

Storage CVM overhead measurement

As we scale the workload, CPU pressure begins to limit our ability to scale perfectly. Some of this is well understood from the USL: in our case, even though the database workloads are completely independent of one another, they all run on shared CPUs under a shared hypervisor/scheduler.
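For reference, the USL models relative capacity with a contention term (serialization on shared resources) and a coherency term (cross-core/cross-node coordination). A minimal sketch of the formula is below; the alpha/beta values shown are purely illustrative assumptions and would need to be fitted to real measurements, e.g. with scipy.optimize.curve_fit:

```python
def usl_capacity(n, alpha, beta):
    """Universal Scalability Law (Gunther): relative capacity at load n.

    alpha: contention penalty (serialization on shared resources)
    beta:  coherency penalty (e.g. scheduler/cache coordination)
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# Illustrative parameters only – fit alpha/beta to measured data before
# drawing conclusions about a real cluster.
print(usl_capacity(40, alpha=0.01, beta=0.001))
```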

We do still need to account for the CVM – which continues to run and provide services.

To estimate the overhead I chose to take the values at 80% of saturation. Below that point the measured value is essentially equal to the perfect value, showing that there is no resource contention. Past the 80% mark we are clearly into diminishing returns, as the shared resources have to manage contention and coherency.

At 80% saturation the difference between "perfect" and "measured" is about 5%: measured (1957) / perfect (2048) ≈ 0.95.

So with 160 cores we expect the CVM to be consuming 5% of 160 cores, which is 8 cores across the cluster, or 2 cores per host.
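As a sanity check, here is the same back-of-envelope arithmetic in Python. The 4-host count is an assumption inferred from the "2 cores per host" figure above:

```python
# Back-of-envelope CVM overhead from the 80%-saturation data point.
# hosts = 4 is an assumption inferred from "8 cores across the cluster,
# or 2 cores per host"; adjust for other cluster layouts.

measured_80 = 1957        # K transactions/s at 32 DBs (80% of cores)
perfect_80  = 64 * 32     # 1-DB result x 32 = 2048

efficiency = measured_80 / perfect_80   # ~0.955
overhead   = 1 - efficiency             # ~4.5%, rounded to ~5% in the text

total_cores = 160
hosts = 4
cvm_cores = overhead * total_cores      # ~7.1 cores (~8 with the 5% rounding)
print(f"overhead={overhead:.1%}, cluster={cvm_cores:.1f} cores, "
      f"per host={cvm_cores / hosts:.1f} cores")
```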

However, the CVM in our setup is configured to use 16 vCPU.

CVM: 12% of 16 vCPU – Database: 98% of 4 vCPU

The view from Prism confirms that although the CVM is configured with 16 vCPU, it is only using around 12% of that allocation on average – roughly 2 vCPUs/cores (0.12 × 16 ≈ 1.9).

Conclusion

  • Scaling is linear (essentially perfect) up to 80% of total CPU capacity; the divergence beyond that point is expected and mostly explained by the USL
  • Actual CVM utilization is around 2 vCPUs in this example, which is a mostly CPU-bound workload
  • The AHV Hypervisor is doing a nice job of prioritizing the database workloads over the CVM
