Working with the fio “random_distribution=pareto” parameter

The fio Pareto parameter lets us create a workload that references a very large dataset but concentrates the access pattern on a hotspot. Here’s an example using the same setup as the ILM experiment, but with a Pareto value of 0.8. My fio file looks like this:

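[global]
# Linux native asynchronous I/O engine
ioengine=libaio
# Non-buffered I/O (O_DIRECT), bypassing the page cache
direct=1
# Keep running for the full runtime= even if the files have been fully covered
time_based
# Pick each random offset without tracking which blocks were already touched
norandommap
# Skew random offsets with a Pareto distribution (0.8 is the Pareto power)
random_distribution=pareto:0.8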
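As a rough model of what pareto:0.8 asks for, here is a minimal Python sketch of a Pareto-style block picker. It assumes the value h behaves like the classic Pareto split, so that h = 0.8 sends about 80% of accesses to 20% of the blocks. The exponent formula and the names pareto_block and hot_fraction are illustrative assumptions, not fio's source code, and the sketch keeps the hot blocks contiguous at the top of the range, whereas fio may place them differently within the file.

```python
import math
import random


def pareto_block(nblocks: int, h: float, rng: random.Random) -> int:
    """Return a block index in [0, nblocks) drawn from a power law.

    Assumption: with 0.5 < h < 1, about a fraction h of picks land in the
    top (1 - h) of the index range, e.g. h = 0.8 gives an 80/20 split.
    """
    # The exponent log(h)/log(1-h) is chosen so that, for u ~ Uniform[0, 1),
    # P(u ** power >= h) = 1 - h ** (1/power) = 1 - (1 - h) = h.
    power = math.log(h) / math.log(1.0 - h)
    u = rng.random()
    return min(int(nblocks * u ** power), nblocks - 1)


def hot_fraction(nblocks: int, h: float, samples: int, seed: int = 42) -> float:
    """Fraction of sampled accesses that fall in the top (1 - h) of the blocks."""
    rng = random.Random(seed)
    hot_start = round(nblocks * h)  # first block of the hypothetical hot region
    hits = sum(
        1 for _ in range(samples) if pareto_block(nblocks, h, rng) >= hot_start
    )
    return hits / samples


if __name__ == "__main__":
    # Hypothetical 1000-block dataset, using the 0.8 value from the job file.
    print(f"hot-region share of accesses: {hot_fraction(1000, 0.8, 100_000):.3f}")
```

With h = 0.5 the exponent is 1 and the picker degenerates to a uniform random pattern, which is a handy sanity check on the formula.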
The experiment shows that with a Pareto value of 0.8, meaning roughly 20% of the overall dataset is “hot”, the ILM process completes much faster: the hotspot is smaller, so it is identified sooner than with a uniform random access pattern spread across 100% of the dataset. We would expect a similar shape for any sort of caching mechanism.
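The caching intuition can be checked with a toy model. This minimal Python sketch, with hypothetical names and sizes, replays either Pareto-skewed or uniform random block numbers through an LRU cache that holds 20% of the blocks and compares the hit ratios. The block picker assumes the 80/20 reading of the 0.8 value; this simulates a generic cache, not the ILM process itself.

```python
import math
import random
from collections import OrderedDict


def pareto_block(nblocks: int, h: float, rng: random.Random) -> int:
    """Assumed power-law picker: about h of picks land in the top (1 - h) of blocks."""
    power = math.log(h) / math.log(1.0 - h)
    return min(int(nblocks * rng.random() ** power), nblocks - 1)


def lru_hit_ratio(accesses: list[int], capacity: int) -> float:
    """Replay block accesses through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses)


def compare(nblocks: int = 10_000, h: float = 0.8,
            samples: int = 200_000, seed: int = 1) -> tuple[float, float]:
    """Return (pareto_hit_ratio, uniform_hit_ratio) for a cache of (1 - h) * nblocks."""
    rng = random.Random(seed)
    capacity = round(nblocks * (1.0 - h))  # 2000 blocks, i.e. 20% of the dataset
    skewed = [pareto_block(nblocks, h, rng) for _ in range(samples)]
    uniform = [rng.randrange(nblocks) for _ in range(samples)]
    return lru_hit_ratio(skewed, capacity), lru_hit_ratio(uniform, capacity)


if __name__ == "__main__":
    pareto_hits, uniform_hits = compare()
    print(f"LRU hit ratio  pareto: {pareto_hits:.2f}  uniform: {uniform_hits:.2f}")
```

With the same cache size, the uniform stream can only hit about as often as the cached fraction of the dataset, while the skewed stream keeps its small hotspot resident. The exact numbers depend on the assumed picker and sizes.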

The return of misaligned IO

We have started seeing misaligned partitions on Linux guests running certain HDFS distributions. How these partitions became misaligned is a bit of a mystery, because the only way I know to do this on Linux is to create a partition in the old DOS-compatible format, using fdisk with -c=dos and -u=cylinders.
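A quick way to tell whether a partition is affected is to check whether its starting byte offset is a multiple of the alignment boundary. This minimal Python sketch assumes a 4 KiB boundary and 512-byte sectors, and the function names are hypothetical. The legacy DOS/cylinder layout typically starts the first partition at sector 63 (one track of 63 sectors), which is not 4 KiB aligned, whereas modern partitioning tools default to sector 2048 (1 MiB). On Linux, a partition's start sector can be read from sysfs at /sys/block/<disk>/<partition>/start.

```python
from pathlib import Path


def is_aligned(start_sector: int, sector_size: int = 512, alignment: int = 4096) -> bool:
    """True if a partition starting at start_sector begins on an alignment-byte boundary."""
    return (start_sector * sector_size) % alignment == 0


def sysfs_start_sector(disk: str, partition: str) -> int:
    """Read a partition's start sector (in 512-byte units) from Linux sysfs,
    e.g. sysfs_start_sector("sda", "sda1")."""
    return int(Path(f"/sys/block/{disk}/{partition}/start").read_text())


if __name__ == "__main__":
    # Legacy DOS/cylinder first-partition start vs. the modern 1 MiB default.
    for start in (63, 2048):
        status = "aligned" if is_aligned(start) else "MISALIGNED"
        print(f"start sector {start}: {status}")
```

Sector 63 puts the partition at byte 32256, which leaves a remainder of 3584 against a 4096-byte boundary, so every 4 KiB guest block straddles two underlying 4 KiB blocks.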