Working with the fio “random_distribution=pareto” parameter

The fio Pareto distribution lets us create a workload that references a very large dataset but concentrates its accesses on a hotspot. Here’s an example using the same setup as the ILM experiment, but with a Pareto value of 0.8. My fio job file looks like this:

[global]
ioengine=libaio
direct=1
time_based
norandommap
random_distribution=pareto:0.8
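To get a feel for what a Pareto value does to block selection, here’s a small Python sketch. It assumes fio’s generator works roughly like this: a uniform random number is raised to the power log(h) / log(1 - h), which sends a fraction h of the draws into the top (1 - h) of the block range. The helper names pareto_exponent and hot_share are hypothetical, and the sketch does not try to reproduce fio’s exact offset layout; it only checks the 80/20 split implied by pareto:0.8.

```python
import math
import random


def pareto_exponent(h: float) -> float:
    """Exponent that sends a fraction h of draws into the top (1 - h) of the range.

    Assumption: mirrors how fio's pareto generator is believed to work,
    raising a uniform random number to the power log(h) / log(1 - h).
    """
    return math.log(h) / math.log(1.0 - h)


def hot_share(n_blocks: int, h: float, n_ios: int, seed: int = 1) -> float:
    """Simulate n_ios block picks and return the share landing in the hot (1 - h) of blocks."""
    rng = random.Random(seed)
    p = pareto_exponent(h)
    # With this mapping the hot blocks sit at the top of the index range.
    hot_start = h * n_blocks
    hits = 0
    for _ in range(n_ios):
        block = int((n_blocks - 1) * rng.random() ** p)
        if block >= hot_start:
            hits += 1
    return hits / n_ios


if __name__ == "__main__":
    # pareto:0.8 -> the hottest 20% of blocks should absorb about 80% of the I/O.
    print(f"exponent for h=0.8: {pareto_exponent(0.8):.4f}")
    print(f"hot share for h=0.8: {hot_share(100_000, 0.8, 200_000):.3f}")
```

Running it should report a hot share close to 0.8. With h=0.5 the exponent is exactly 1 and the pattern degenerates to uniform random. For the real distribution, fio ships a helper, fio-genzipf, that shows the hit rates a given input value produces.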
The experiment shows that with a Pareto value of 0.8, meaning that 20% of the overall dataset is “hot” and receives roughly 80% of the I/O, the ILM process completes much faster: the hotspot is smaller, so it is identified sooner than with a 100% uniform random access pattern. We would expect a similar shape for any sort of caching mechanism.
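To illustrate why a smaller hotspot gets picked up faster, here’s a toy Python model that replays the same number of block reads through an LRU cache sized at 20% of the dataset, once with uniform random picks and once with a Pareto-style skew. The LRU cache is only a stand-in for the ILM tiering logic, which likely uses its own promotion policy. The dataset size, cache size, seed, and the helper names lru_hit_rates and compare are all hypothetical, and the Pareto mapping is an assumption about how fio skews offsets.

```python
import math
import random
from collections import OrderedDict


def lru_hit_rates(pick, cache_blocks, n_ios, window):
    """Replay n_ios block reads through an LRU cache; return the hit rate per window of I/Os."""
    cache = OrderedDict()
    rates, hits = [], 0
    for i in range(1, n_ios + 1):
        block = pick()
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
        if i % window == 0:
            rates.append(hits / window)
            hits = 0
    return rates


def compare(n_blocks=50_000, cache_frac=0.2, h=0.8, n_ios=200_000, window=20_000, seed=7):
    """Return (uniform_rates, pareto_rates) for the same hypothetical LRU cache size."""
    rng = random.Random(seed)
    p = math.log(h) / math.log(1.0 - h)  # assumed fio-style Pareto exponent
    cache_blocks = int(cache_frac * n_blocks)

    def uniform_pick():
        return rng.randrange(n_blocks)

    def pareto_pick():
        return int((n_blocks - 1) * rng.random() ** p)

    uniform = lru_hit_rates(uniform_pick, cache_blocks, n_ios, window)
    pareto = lru_hit_rates(pareto_pick, cache_blocks, n_ios, window)
    return uniform, pareto


if __name__ == "__main__":
    uniform, pareto = compare()
    for n, (u, pa) in enumerate(zip(uniform, pareto), start=1):
        print(f"window {n:2d}: uniform {u:.2f}  pareto:0.8 {pa:.2f}")
```

With uniform picks the steady-state hit rate settles at roughly the cache’s share of the dataset (about 0.2 here), while the Pareto run climbs well above that from the first window, which is the kind of shape we’d expect from the ILM results as well.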