When it comes to assessing fitness of purpose, even audited benchmarks are quite useless unless they incorporate failure testing alongside the load test. I helped develop the TPCx-HCI benchmark which mandates the simulation of a node failure.
We have started seeing misaligned partitions on Linux guests runnning certain HDFS distributions. How these partitions became mis-aligned is a bit of a mystery, because the only way I know how to do this on Linux is to create a partition using old DOS format like this (using -c=dos and -u=cylinders) Continue reading
One way of categorizing Hyperconverged filesystems (or any filesystem really) is by how data is distributed across the nodes, and the method used to track/retrieve that data. The following is based on knowledge of the internals of Nutanix and publicly available information for the other systems.
|Distributed||Distributed data & metadata||Nutanix||Hash||Random data distribution, hash-lookup (object store)||VSAN||Dedupe||Data stored in HA-Pairs, Lookup by fingerprint||Simplivity||Dedupe||Random data distribution, Lookup by fingerprint||Springpath/Hyperflex||Psuedo Distributed||Data stored in HA pairs, Unified namespace via redirection||NetApp C-Mode|
Today I used fio to create some compressible data to test on my Nutanix nodes. I ended up using the following fio params to get what I wanted.
buffer_compress_percentage=50 refill_buffers buffer_pattern=0xdeadbeef
Much of this is well explained in the README for latest version of fio.
Also NOTE Older versions of fio do not support many of the fancy data creation flags, but will not alert you to the fact that fio is ignoring them. I spent quite a bit of time wondering why my data was not compressed, until I downloaded and compiled the latest fio.
The question of why Nutanix uses SATA drive comes up sometimes, especially from customers who have experienced very poor performance using SATA on traditional arrays.
I can understand this anxiety. In my time at NetApp we exclusively used SAS or FC-AL drives in performance test work. At the time there was a huge difference in performance between SCSI and SATA. Even a few short years ago, FC typically spun at 15K RPM whereas SATA was stuck at about a 5K RPM, so experiencing 3X the rotational delay.
These days SAS and SATA are both available in 7200 RPM configurations, and these are the type we use in standard Nutanix nodes. In fact the SATA drives that we use are marketed by Seagate as “Nearline SAS” or NL-SAS. Mainly to differentiate them from the consumer grade SATA drives that are found in cheap laptops. There are hundreds of SAS Vs SATA articles on the web, so I won’t go over the theoretical/historical arguments.
In a Nutanix cluster the “heavy lifting” of IO is mainly done by the SSD’s – leaving the SATA drives to service the few remaining IO’s that miss the SSD tier. Under moderate load, the SATA spindles do pretty well, and since the SATA $/GB is only 60% of SAS. SATA seems like a good choice for mostly-cold data.
From a performance perspective, I decided to run a few experiments to see just how well SATA performs. In the test, the SATA drives are Nutanix standard drives “ST91000640NS” (Seagate, priced around $150). The comparable SAS drives are the same form-factor (2.5 Inch) “AL13SEB900” (Toshiba, priced at about $250 USD). These drives spin at 10K RPM. Both drives hold around 1TB.
There are three experiments per drive type to reveal the impact of seek-times. This is achieved using the “filesize” parameter of fio – which determines the LBA range to read. One thing to note, is that I use a queue-depth of one. Therefore IOPs can be calculated as simply 1/Response-Time (converted to seconds).
[global] bs=8k [randread] rw=randread iodepth=1 ioengine=libaio time_based runtime=10 direct=1 filesize=1g filename=/dev/sdf1
|Working Set Size||7.2K RPM SATA Response Time (ms)||10K RPM SAS Response Time (ms)|
|Working Set Size||Response Time (ms)|
Somewhat intuitively as the workingset (seek) gets larger, the difference between “Real SAS” and “NL-SAS/SATA” gets wider. This is intuitive because with a 1G working-set, the seek-time is close to zero, and so only the rotational delay (based on RPM) is a factor. In fact the difference in response time is the same as the difference in rotational speed (1:1.3).
Also (just for fun) I used the “random_distribution=zipf” function in fio to test the response time when reading across the entire range of the disk – but with a “hotspot” (zipf) rather than a uniform random read – which is pretty unrealistic.
In the “realistic” case – reading across the entire disk on the SATA drives shipped with Nutanix nodes is capable of 8.5 ms response time at 125 IOPS per spindle.
The performance difference between SAS and SATA is often over-stated. At moderate loads SATA performs well enough for most use-cases. Even when delivering fully random IO over the entirety of the disk – SATA can deliver 8K in less than 15ms. Using a more realistic (not 100% random) access pattern the response time is < 10ms.
For a properly sized Nutanix implementation, the intent is to service most IO from Flash. It’s OK to generate some work on HDD from time-to-time even on SATA.