Running the ML-Perf Storage benchmark on Nutanix files.

Some technical notes on our submission to the benchmark committee.

Background

For the past few months engineers from Nutanix have been participating in the MLPerftm Storage benchmark which is designed to measure the storage performance required for ML training workloads.  

We are pleased with how well our general-purpose file-server has delivered against the demands of this high throughput workload.  

Benchmark throughput and dataset
  • 125,000 files
  • 16.7 TB of data around 30% of the usable capacity (no “short stroking”)
  • Filesize 57-213MB per file
  • NFS v4 over Ethernet
  • 5GB/s delivered per compute node from single NFSv4 mountpoint
  • 25GB/s delivered across 5 compute nodes from single NFSv4 mountpoint

The dataset was 125,000 files consuming 16.7 TB, the file sizes ranged from 57 MB to 213 MB.  There is no temporal or spatial hotspot (meaning that the entire dataset is read) so there is no opportunity to cache the data in DRAM – the data is being accessed from NVMe flash media. For our benchmark submission we used standard NFSv4, standard Ethernet using the same Nutanix file-serving software that already powers everything from VDI users home-directories, medical images and more. No infiniband or special purpose parallel filesystems were harmed in this benchmark submission.

Continue reading