A 2007 paper that still has lots to say on the subject of benchmarking storage and filesystems. It is primarily aimed at researchers and developers, but is relevant to anyone about to embark on a benchmarking effort.
Understand what you are testing: cached results are fine, as long as that is what you intended to measure.
The authors are clear on why benchmarks remain important:
“Ideally, users could test performance in their own settings using real workloads. This transfers the responsibility of benchmarking from author to user. However, this is usually impractical because testing multiple systems is time consuming, especially in that exposing the system to real workloads implies learning how to configure the system properly, possibly migrating data and other settings to the new systems, as well as dealing with their respective bugs.”
We cannot expect end users to be experts in benchmarking. It is our duty as experts to provide the tools (benchmarks) that enable users to make purchasing decisions without requiring years of benchmarking expertise.
For this experiment I am using Postgres v11 on a Linux 3.10 kernel. The goal is to see what gains can be made from using hugepages. I use the “built in” benchmark pgbench to run a simple set of queries.
Since I am interested in only the gains from hugepages I chose to use the “-S” parameter to pgbench which means perform only the “select” statements. Obviously this masks any costs that might be seen when dirtying hugepages – but it kept the experiment from having to be concerned with writing to the filesystem.
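As a rough sketch of what such a select-only run could look like, the snippet below builds a pgbench command line in Python. Only the “-S” flag comes from the experiment described here; the database name, client count, thread count and duration are hypothetical values chosen for illustration, not the settings used in the test.

```python
import shlex


def select_only_run(db="bench", clients=8, threads=4, seconds=60):
    """Build a pgbench command for a read-only (SELECT-only) run.

    -S  select-only built-in script (the flag used in this experiment)
    -c  number of client connections   (hypothetical value)
    -j  number of worker threads       (hypothetical value)
    -T  run duration in seconds        (hypothetical value)
    """
    return ["pgbench", "-S",
            "-c", str(clients),
            "-j", str(threads),
            "-T", str(seconds),
            db]


command = select_only_run()
# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(command))
```

Building the command as a list makes it easy to hand to `subprocess.run` later, and keeps each assumption visible as a named argument.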
The workstation has 32GB of memory. Postgres is given 16GB of memory using the parameter
pgbench creates a ~7.4GB database using a scale factor of 500.
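To put those sizes in perspective, here is a small back-of-the-envelope calculation. It assumes the common x86_64 page sizes of 4KiB for regular pages and 2MiB for hugepages (the post does not state which hugepage size was used), and it relies on pgbench creating 100,000 rows in `pgbench_accounts` per unit of scale factor.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes, page_bytes):
    """Number of pages required to map a region (rounded up)."""
    return -(-region_bytes // page_bytes)


postgres_memory = 16 * GIB  # memory given to Postgres in this experiment

# Assumed page sizes: 4KiB regular pages, 2MiB hugepages (x86_64 defaults).
small_pages = pages_needed(postgres_memory, 4 * KIB)
huge_pages = pages_needed(postgres_memory, 2 * MIB)

# pgbench populates 100,000 account rows per unit of scale factor.
accounts_rows = 500 * 100_000

print(f"4KiB pages to map 16GiB: {small_pages:,}")
print(f"2MiB pages to map 16GiB: {huge_pages:,}")
print(f"reduction in page count: {small_pages // huge_pages}x")
print(f"rows in pgbench_accounts: {accounts_rows:,}")
```

Under these assumptions the same 16GiB region needs 512 times fewer page mappings with hugepages, which is where any gain would be expected to come from. The ~7.4GB database also fits comfortably inside the memory given to Postgres.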
What space savings should you expect when running databases with default compression in a Nutanix cluster? When we ran the TPCx-HCI benchmark on our cluster we realized about 2:1 savings from compression alone. The TPCx-HCI benchmark mimics a database consolidation setup, meaning that there are many databases per host. The uncompressed data size was about 45TB.
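The arithmetic behind that figure is simple enough to sketch. The helper below is purely illustrative and its name is made up for this example; it takes the logical size and the compression ratio quoted above and returns the approximate physical footprint.

```python
def stored_size_tb(logical_tb, ratio):
    """Approximate physical footprint for a given compression ratio.

    A ratio of 2.0 means 2:1, i.e. the data takes half the space.
    """
    return logical_tb / ratio


logical_tb = 45.0  # uncompressed data size from the benchmark run
ratio = 2.0        # roughly 2:1 savings from compression

physical_tb = stored_size_tb(logical_tb, ratio)
saved_tb = logical_tb - physical_tb

print(f"stored: ~{physical_tb} TB, saved: ~{saved_tb} TB "
      f"({saved_tb / logical_tb:.0%} of the logical size)")
```

Since the 2:1 ratio is itself approximate, the result should be read as "roughly 22.5TB on disk", not as an exact number.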
Additionally, we configured data-at-rest encryption (DARE). Using the cluster features allows us to both compress and encrypt (compress first, then encrypt). If the database engine itself handled encryption, the storage layer would only ever see ciphertext, which looks like random data and leaves little for the compressor to work with.
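The effect of ordering is easy to demonstrate with a small experiment. The sketch below is not how the cluster implements either feature: it uses zlib from the Python standard library as a stand-in compressor, and a toy XOR keystream built from SHA-256 as a stand-in for real encryption (it is not secure and is only there to produce random-looking output). The sample rows and the key are invented for the example.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only: this is NOT a secure cipher. It merely produces
    output that looks random, as real ciphertext does.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


# Made-up, repetitive "database-like" rows that compress well.
rows = b"".join(b"%08d,ACME Corp,NYSE,BUY,100\n" % i for i in range(20000))
key = b"hypothetical-key"

compress_then_encrypt = toy_encrypt(zlib.compress(rows), key)
encrypt_then_compress = zlib.compress(toy_encrypt(rows, key))

print(f"original size:         {len(rows):>8} bytes")
print(f"compress then encrypt: {len(compress_then_encrypt):>8} bytes")
print(f"encrypt then compress: {len(encrypt_then_compress):>8} bytes")
```

Compressing first keeps the savings, because encryption does not change the length of the already-compressed data. Encrypting first leaves the compressor with nothing to exploit, so the output is no smaller than the input.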
Like ZFS, the Nutanix filesystem uses LZ4 compression, and 2:1 is about in line with expectations for a realistic dataset. The TPCx-HCI benchmark uses the E-Gen data generation tool to populate the databases. E-Gen was developed for the TPC-E benchmark and uses sources such as census data and NYSE stock listings to generate realistic data rather than machine-generated strings.