Nutanix AOS 5.10 ships with a feature called Autonomous Extent Store (AES). AES effectively provides metadata locality to complement the data locality that Nutanix has always had. For large datasets (e.g. a 10TB database with 20% hot data) we observe a 2X improvement in throughput for random access across the 2TB hot dataset.
In our experiment we deliberately size the active working set so that it does NOT fit into the metadata cache. We access the 2TB hot set with a 100% uniform random pattern and record the time taken to access all 2TB. On the same hardware with AES enabled, the time is cut in half. As the chart shows, throughput doubles, as expected.
It is the localization of metadata from AES that contributes to the 2X improvement. AES keeps most of the metadata local to the node, so there is no need to fetch it across the wire. Additionally, AES reduces the need to cache metadata in DRAM, since local access is so fast. For very large datasets, retrieving metadata can account for a large proportion of the access time. This is true for all storage, so speeding up metadata resolution can make a dramatic improvement to overall throughput, as we demonstrate.
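A back-of-envelope model shows why metadata locality matters so much for random access. The latency figures below are my own illustrative assumptions, not measured Nutanix numbers; the point is only that removing a network hop from the metadata lookup roughly halves the per-IO service time when the lookup and the read are of similar cost.

```python
# Illustrative model: per-IO service time = metadata lookup + data read.
# All latency values are assumptions chosen for illustration only.
def io_time_us(metadata_lookup_us, data_read_us):
    return metadata_lookup_us + data_read_us

DATA_READ_US = 100       # assumed random read from local SSD
REMOTE_LOOKUP_US = 100   # assumed metadata fetch over the network
LOCAL_LOOKUP_US = 5      # assumed local metadata lookup (AES-style)

remote = io_time_us(REMOTE_LOOKUP_US, DATA_READ_US)  # 200 us per IO
local = io_time_us(LOCAL_LOOKUP_US, DATA_READ_US)    # 105 us per IO
print(f"speedup: {remote / local:.2f}x")             # close to 2X
```

With these assumed numbers the speedup lands near 2X, which is consistent with the shape of the result above: the larger the share of access time spent resolving metadata, the bigger the win from making that resolution local.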
During .Next 2018 in London, Nutanix announced improvements in the core datapath said to deliver up to 2X better performance. Here's a real-world example of that improvement in practice.
I am using X-Ray to simulate a 1TB data restore into an existing database. The IO sizes are large, an even split of 64K, 128K, 256K, and 1MB, and the pattern is 100% random across the entire 1TB dataset.
Normally storage benchmarks using large IO sizes are performed sequentially, because that is easier on the storage back-end. Sequential access may be realistic for an initial load, but in this case we want to simulate a restore, where the pattern is 100% random.
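The access pattern described above can be sketched as a simple generator. This is not the actual X-Ray scenario definition, just a hypothetical illustration of the offset/size stream such a workload would produce: size-aligned random offsets across 1TB, with the four IO sizes drawn in an even split.

```python
import random

# Sketch of the workload's access pattern (illustrative only -- the
# real test is driven by X-Ray, this generator issues no actual IO).
DATASET_BYTES = 1 * 2**40                                  # 1TB
IO_SIZES = [64 * 2**10, 128 * 2**10, 256 * 2**10, 2**20]   # 64K..1MB

def random_restore_ops(n, seed=42):
    rng = random.Random(seed)
    for _ in range(n):
        size = rng.choice(IO_SIZES)   # even split across the four sizes
        # pick a random offset, aligned to the chosen IO size
        offset = rng.randrange(0, DATASET_BYTES // size) * size
        yield offset, size

ops = list(random_restore_ops(8))
```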
In this case the time to ingest 1TB drops by half when using Nutanix AOS 5.10 with the Autonomous Extent Store (AES) enabled, versus the previous traditional extent store.
This improvement is possible because with AES, inserting directly into the extent store is much faster.
For throughput-sensitive, random workloads, AES can detect that it will be faster to skip the oplog. Skipping the oplog eliminates a network round trip to a remote oplog; instead AES makes only the RF2 copy for the extent store. By contrast, when sustained, large random IO is funneled into the oplog, the 10Gbit network can become the bottleneck. Even with faster networks, AES is still a benefit because CPU and SSD resource usage is also lower. Unfortunately I only have 10Gbit networking in my lab!
The X-Ray files needed to run this test are on GitHub.
In a previous post I showed a chart which plots concurrency (X-axis) against throughput in IOPS (Y-axis). Here is that plot again:
Experienced performance-chart oglers will notice the familiar pattern of Little's Law, whereby throughput (X) rises quickly as concurrency (N) is increased. As we follow the chart to the right, the slope flattens and each equal increment of concurrency yields a smaller increase in throughput. The flattening of the curve is best understood through Amdahl's Law.
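Little's Law states N = X · R, so throughput is X = N / R. While response time R stays flat, doubling concurrency doubles IOPS; once the system saturates, extra concurrency mostly inflates R instead, which is exactly the flattening seen in the chart. The latency figures below are hypothetical, purely to show the shape:

```python
# Little's Law: N = X * R  =>  X = N / R
def throughput_iops(concurrency, response_time_ms):
    return concurrency / (response_time_ms / 1000.0)

# Hypothetical figures: at low load, response time holds at ~0.5 ms,
# so throughput scales linearly with concurrency.
low_1 = throughput_iops(36, 0.5)    # 72,000 IOPS
low_2 = throughput_iops(72, 0.5)    # 144,000 IOPS -- linear region
# Near saturation, added concurrency mostly shows up as added latency.
high = throughput_iops(576, 1.6)    # curve flattens
```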
Anyone who follows Dr. Neil Gunther and his Universal Scalability Law (USL) will also recognize this curve.
The USL states that, taking the measured values of concurrency and throughput as inputs, we can calculate the scalability of the system. Specifically, we can calculate the key factors of contention and crosstalk, which limit absolute linear scalability and eventually cause throughput to fall as additional load is applied to an already saturated system.
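For reference, the USL models throughput as X(N) = λN / (1 + σ(N−1) + κN(N−1)), where σ is the contention (Amdahl) factor and κ is the crosstalk/coherency factor. A minimal sketch, with λ, σ, and κ values that are hypothetical rather than taken from my fit:

```python
# Gunther's Universal Scalability Law:
#   X(N) = lam * N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
# sigma = contention (serialization), kappa = crosstalk (coherency).
# The parameter values here are hypothetical, for illustration only.
def usl_throughput(n, lam=1250.0, sigma=0.00074, kappa=2.8e-6):
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

for n in (1, 72, 288, 576, 1152):
    print(n, round(usl_throughput(n)))
```

With κ = 0 the curve is pure Amdahl and merely flattens; with κ > 0 the crosstalk term grows as N², so throughput eventually peaks and then declines as load keeps increasing.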
Using his Excel spreadsheet, I was able to input the numbers from my test and derive values that determine scalability.
Taking the largest number, the "contention value" of 0.074% (i.e. the impact we expect due to Amdahl's Law), as the limit to linear scaling, we can say that for this particular cluster, running this particular (simplistic/synthetic) workload, the Nutanix cluster scales 99.926% linearly. Although I did not crank up the concurrency beyond 576, the model shows that this cluster will start to lose performance if we push concurrency beyond 600 or so. Again, the USL model applies to this particular workload on this particular cluster. Doubling the concurrency of the offered load to 1200 will only net us 500,000 IOPS according to the model.
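The USL also gives a closed form for the peak: throughput is maximized at N* = sqrt((1−σ)/κ). Using the contention value from the fit and a crosstalk value that is my own assumption (chosen so the peak lands near 600, since the post does not quote κ), the arithmetic looks like this:

```python
import math

sigma = 0.00074   # contention value from the USL fit (0.074%)
kappa = 2.78e-6   # crosstalk value -- MY ASSUMPTION, not from the fit

# Concurrency at which the USL curve peaks
n_star = math.sqrt((1 - sigma) / kappa)
print(f"peak at N ~ {n_star:.0f}")   # near 600

def relative_x(n):
    # throughput in units of lambda (rate per unit of concurrency)
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Pushing from ~600 to 1200 concurrency yields LESS throughput.
degrades = relative_x(1200) < relative_x(600)
```

This matches the model's prediction above: beyond roughly 600 outstanding IOs, the N² crosstalk term dominates and extra offered load actively costs throughput.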
The high linearity (99.926%) is expected. The workload is 100% read, and with the data-locality feature of the Nutanix filesystem we expect close to 100% scalability.
We will return to these measures of scalability in the future to look at more realistic workloads.