When it comes to assessing fitness for purpose, even audited benchmarks are quite useless unless they incorporate failure testing alongside the load test. I helped develop the TPCx-HCI benchmark, which mandates the simulation of a node failure.
A downtime classic: for several months in 2013, the troubles of a very particular website were front page news across the US. Full Story from Time Magazine (PDF)
When benchmarking filesystems or storage, we need to understand the caching effects. Most often this involves filling the cache and reaching steady state. But how long will it take to fill a cache of a given size? The answer depends, of course, on the size of the cache, the IO size and the IO rate. So, to simplify, let’s just say that a cache consists of some number of entries. For instance, a 4GB cache would have about 1 million 4KB entries. In my example this is simply a 1M entry cache.
In terms of time to fill the cache, it’s simpler to think about how many entries will need to be read before the cache is filled.
For a random workload, it will be more than 1M “reads”. Let’s see why.
The first read will be inserted into the cache. The second read will probably be inserted into the cache too, but there is a small (1/1,000,000) chance that it is already in the cache, since the workload is random. As the cache gets fuller, the chance that a given read is already present in the cache increases. As a result it will take a lot more than 1 million reads to populate the entire cache with a random read workload.
The question is this: is it possible to predict how many “reads” it will take to fill the cache?
In this experiment, we create an array to represent the cache. It has 1M entries. Then, using a random number generator, we simulate the workload and measure how long it takes to populate the cache.
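The experiment described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the results in this post: the function name `simulate_cache_fill`, the `seed` parameter and the use of a `bytearray` as the "cache" are all assumptions made for the example. Each iteration issues as many random reads as there are cache entries and records how many entries hold data afterwards.

```python
import random


def simulate_cache_fill(num_entries, seed=None):
    """Return the number of populated entries after each batch of
    num_entries random reads, stopping once every entry is populated.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    cache = bytearray(num_entries)  # 0 = empty slot, 1 = entry holds data
    filled = 0
    history = []
    while filled < num_entries:
        for _ in range(num_entries):
            slot = rng.randrange(num_entries)  # a 100% random "read"
            if not cache[slot]:
                # Cache miss: the entry is inserted.
                cache[slot] = 1
                filled += 1
            # Otherwise the read is a hit on an existing entry.
        history.append(filled)
    return history


if __name__ == "__main__":
    # Assumption: a smaller cache than the 1M entries in the text,
    # purely to keep the run time short.
    entries = 100_000
    for iteration, filled in enumerate(simulate_cache_fill(entries, seed=1), start=1):
        print(iteration, filled, entries - filled)
```

Using a fixed seed makes a run repeatable. Changing `entries` to 1,000,000 matches the cache size used in this post, at the cost of a noticeably longer run in pure Python.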
After 1,000,000 “reads” there are 633,000 positive entries (entries that have data in them). So what happened to the other 367,000? The 367,000 represent cache “hits” on an existing entry. Since the read “workload” is 100% random, there is some chance that a subsequent read will be for an entry that is already cached. Over the life of 1,000,000 reads around 37% are for an entry that is already cached.
After 2,000,000 reads the cache contains 864,000 entries. Another 1,000,000 reads yields 950,000.
The fuller that the cache becomes, the fewer new entries are added. Intuitively this makes sense because as the cache becomes more full, more of the “random reads” are satisfied by an existing cache entry.
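The diminishing returns can also be put into numbers. Assuming every read picks one of the N entries uniformly at random, a given entry is missed by a single read with probability 1 - 1/N, so it is still empty after k reads with probability (1 - 1/N)^k. The sketch below (the function name `expected_filled` is made up for this example) turns that into an expected count of populated entries.

```python
def expected_filled(num_entries, reads):
    """Expected number of populated entries after `reads` uniformly
    random reads against a cache of `num_entries` entries.

    Hypothetical helper for illustration only.
    """
    # Probability that one particular entry has never been read.
    p_empty = (1.0 - 1.0 / num_entries) ** reads
    return num_entries * (1.0 - p_empty)


if __name__ == "__main__":
    n = 1_000_000
    for millions in (1, 2, 3):
        print(millions, round(expected_filled(n, millions * n)))
```

For a 1M entry cache this gives roughly 632,000, 865,000 and 950,000 entries after 1, 2 and 3 million reads, which is close to the measured figures.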
In my experiments it takes about 17,000,000 “reads” to ensure that every cache entry is filled in a 1M entry cache. Here are the data for 19 runs.
| Iteration | Positive Entries | Empty Slots |
|---|---|---|
| 1 | 631998 | 368002 |
| 2 | 864334 | 135666 |
| 3 | 950184 | 49816 |
| 4 | 981630 | 18370 |
| 5 | 993266 | 6734 |
| 6 | 997577 | 2423 |
| 7 | 999080 | 920 |
| 8 | 999660 | 340 |
| 9 | 999879 | 121 |
| 10 | 999951 | 49 |
| 11 | 999985 | 15 |
| 12 | 999996 | 4 |
| 13 | 999998 | 2 |
| 14 | 999998 | 2 |
| 15 | 999999 | 1 |
| 16 | 999999 | 1 |
| 17 | 1000000 | 0 |
| 18 | 1000000 | 0 |
Interestingly, the ratio of positive to empty entries after one iteration (1,000,000 reads) is always about 0.632:0.368.
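The 0.632:0.368 ratio is not a coincidence. A short derivation, assuming reads are independent and uniformly distributed over the N cache entries (here N is the number of entries and k the number of reads, both symbols introduced just for this note):

```latex
\[
P(\text{entry still empty after } k \text{ reads})
  = \left(1 - \frac{1}{N}\right)^{k} \approx e^{-k/N}
\]
\[
k = N:\qquad e^{-1} \approx 0.368 \text{ empty},\qquad 1 - e^{-1} \approx 0.632 \text{ filled}
\]
\[
E[\text{reads to fill every entry}]
  = N \sum_{i=1}^{N} \frac{1}{i} \approx N\,(\ln N + 0.5772)
  \approx 1.44 \times 10^{7} \quad \text{for } N = 10^{6}
\]
```

The last line is the classic coupon collector result and is only an average. The final few empty entries take a long time to hit, so an individual run can need a couple of million reads more than that, which is in line with the roughly 17,000,000 reads seen in the experiment.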