Storage Bus Speeds 2018

Storage bus speeds with example storage endpoints.

Bus    | Lanes | End-Point                     | Theoretical Bandwidth (MB/s) | Note
SAS-3  | 1     | HBA <-> Single SATA Drive     | 600    | SAS-3 <-> SATA 6Gbit
SAS-3  | 1     | HBA <-> Single SAS Drive      | 1200   | SAS-3 <-> SAS-3 12Gbit
SAS-3  | 4     | HBA <-> SAS/SATA Fanout       | 4800   | 4-lane HBA to breakout (6 SSD) [2]
SAS-3  | 8     | HBA <-> SAS/SATA Fanout       | 8400   | 8-lane HBA to breakout (12 SSD) [1]
PCIe-3 | 1     | N/A                           | 1000   | Single-lane PCIe-3
PCIe-3 | 4     | PCIe <-> SAS HBA or NVMe      | 4000   | Enough for a single NVMe device
PCIe-3 | 8     | PCIe <-> SAS HBA or NVMe      | 8000   | Enough for 4 lanes of SAS-3
PCIe-3 | 40    | PCIe Bus <-> Processor Socket | 40000  | Xeon direct connect to PCIe bus

Notes


All figures here are theoretical maximums for the buses, using rough/easy conversions between bits/s and bytes/s. That is enough to figure out where the throughput bottlenecks are likely to be in a storage system; a short calculation sketch follows the numbered notes below.

  1. SATA devices contain a single SAS/SATA port (connection), and even when they are connected to a SAS3 HBA, the SATA protocol limits each SSD device to ~600MB/s (single port, 6Gbit)
  2. SAS devices may be dual ported (two connections to the device from the HBA(s)) – each with a 12Gbit connection giving a potential bandwidth of 2x12Gbit == 2.4Gbyte/s (roughly) per SSD device.
  3. An NVMe device directly attached to the PCIe bus has access to a bandwidth of 4GB/s by using 4 PCIe lanes – or 8GB/s using 8 PCIe lanes.  On current Xeon processors, a single socket attaches to 40 PCIe lanes directly (see diagram below) for a total bandwidth of 40GB/s per socket.
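
To make the rough bits-to-bytes arithmetic explicit, here is a minimal Python sketch of the conversions behind the table (an illustration of the rule of thumb, not a precise model): for SAS/SATA I count roughly 10 line bits per payload byte, which folds in the 8b/10b encoding overhead, and I simply round PCIe 3.0 to 1000 MB/s per lane.

# Rough theoretical bandwidth, as used in the table above (a sketch).
def sas_sata_mb_per_s(gbit_per_lane, lanes=1):
    # ~10 line bits per payload byte covers the 8b/10b encoding overhead
    return gbit_per_lane * lanes * 1000 / 10

def pcie3_mb_per_s(lanes):
    # PCIe 3.0 is roughly 1000 MB/s per lane
    return lanes * 1000

print(sas_sata_mb_per_s(6))        # SATA: 600 MB/s
print(sas_sata_mb_per_s(12))       # single-lane SAS-3: 1200 MB/s
print(sas_sata_mb_per_s(12, 4))    # 4-lane SAS-3 HBA: 4800 MB/s
print(pcie3_mb_per_s(4))           # x4 PCIe 3.0: 4000 MB/s (one NVMe device)
print(pcie3_mb_per_s(40))          # 40 lanes per Xeon socket: 40000 MB/s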

  • I first started down the road of finally coming to grips with all the different buses and lane types after reading this excellent LSI paper. I omitted the SAS-2 figures from this article since modern systems use SAS-3 exclusively.

LSI SAS PCI Bottlenecks (PDF): https://www.n0derunner.com/wp-content/uploads/2018/04/LSI-SAS-PCI-Bottlenecks.pdf

Intel Processor & PCI connections



SATA on Nutanix. Some experimental data.

The question of why Nutanix uses SATA drives comes up sometimes, especially from customers who have experienced very poor performance using SATA on traditional arrays.

I can understand this anxiety. In my time at NetApp we exclusively used SAS or FC-AL drives for performance test work. At the time there was a huge difference in performance between SCSI and SATA. Even a few short years ago, FC drives typically spun at 15K RPM whereas SATA was stuck at around 5K RPM, so SATA suffered roughly 3X the rotational delay.

These days SAS and SATA are both available in 7200 RPM configurations, and these are the type we use in standard Nutanix nodes. In fact, the SATA drives that we use are marketed by Seagate as “Nearline SAS” or NL-SAS, mainly to differentiate them from the consumer-grade SATA drives found in cheap laptops. There are hundreds of SAS vs. SATA articles on the web, so I won’t go over the theoretical/historical arguments.

SATA in Hybrid/Tiered Storage

In a Nutanix cluster the “heavy lifting” of IO is mainly done by the SSDs, leaving the SATA drives to service the few remaining IOs that miss the SSD tier. Under moderate load the SATA spindles do pretty well, and since SATA $/GB is only 60% of that of SAS, SATA seems like a good choice for mostly-cold data.

Let’s Experiment.

From a performance perspective, I decided to run a few experiments to see just how well SATA performs. In the test, the SATA drives are Nutanix standard drives, the Seagate “ST91000640NS” (priced around $150). The comparable SAS drives are the same form factor (2.5 inch), the Toshiba “AL13SEB900” (priced at about $250 USD), and spin at 10K RPM. Both drive types hold around 1TB.

There are three experiments per drive type, designed to reveal the impact of seek times. This is achieved using the “filesize” parameter of fio, which determines the LBA range to read. One thing to note is that I use a queue depth of one, so IOPS can be calculated as simply 1/response-time (with the response time converted to seconds).

[global]
bs=8k
rw=randread 
iodepth=1 
ioengine=libaio 
time_based 
runtime=10 
direct=1 
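# filesize is varied per experiment: 1g, 100g, 1000g (the working-set sizes in the tables below)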
filesize=1g 

[randread]
filename=/dev/sdf1 

Random Distribution. SATA vs. SAS

Working Set Size | 7.2K RPM SATA Response Time (ms) | 10K RPM SAS Response Time (ms)
1 GB             | 5.5                              | 4
100 GB           | 7.5                              | 4.5
1000 GB          | 12.5                             | 7
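
Since the queue depth is one, each response time above converts directly into per-spindle IOPS. A quick sketch of the conversion using the SATA column:

# With iodepth=1 there is only one outstanding request, so IOPS is
# simply the reciprocal of the average response time.
def iops_from_response_time_ms(rt_ms):
    return 1000.0 / rt_ms

for rt in (5.5, 7.5, 12.5):   # SATA response times from the table above
    print(rt, "ms ->", round(iops_from_response_time_ms(rt)), "IOPS")
# 5.5 ms -> 182 IOPS, 7.5 ms -> 133 IOPS, 12.5 ms -> 80 IOPS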

Zipf Distribution. SATA Only.

Working Set Size | Response Time (ms)
1000 GB          | 8.5

Somewhat intuitively, as the working set (and therefore the seek distance) gets larger, the difference between “Real SAS” and “NL-SAS/SATA” gets wider. With a 1GB working set the seek time is close to zero, so only the rotational delay (based on RPM) is a factor. In fact, for the 1GB case the ratio of response times (5.5 : 4 ≈ 1.4) closely matches the ratio of rotational speeds (10K : 7.2K RPM ≈ 1.4).
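
As a sanity check on that ratio, average rotational latency is simply the time for half a revolution, which a couple of lines of Python makes concrete:

# Average rotational latency = time for half a revolution.
def avg_rotational_latency_ms(rpm):
    return (60.0 / rpm) / 2.0 * 1000.0

print(avg_rotational_latency_ms(7200))   # ~4.2 ms (7.2K RPM SATA/NL-SAS)
print(avg_rotational_latency_ms(10000))  # 3.0 ms (10K RPM SAS)
# The ~1.4 ratio lines up with the measured 5.5 ms vs 4 ms on the 1 GB working set.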

Also (just for fun) I used the “random_distribution=zipf” option in fio to test the response time when reading across the entire range of the disk, but with a “hotspot” (zipf) access pattern rather than a uniform random read, since fully uniform random reads across an entire disk are pretty unrealistic.

In the more “realistic” zipf case, reading across the entire disk, the SATA drives shipped with Nutanix nodes are capable of an 8.5 ms response time at around 125 IOPS per spindle.

 Conclusion

The performance difference between SAS and SATA is often overstated. At moderate loads SATA performs well enough for most use cases. Even when delivering fully random IO over the entirety of the disk, SATA can deliver an 8K read in less than 15 ms. Using a more realistic (not 100% random) access pattern, the response time is under 10 ms.

For a properly sized Nutanix implementation, the intent is to service most IO from Flash. It’s OK to generate some work on HDD from time to time, even on SATA.