For this experiment I am using Postgres v11 on Linux 3.10 kernel. The goal was to see what gains can be made from using hugepages. I use the “built in” benchmark pgbench to run a simple set of queries.
Since I am interested in only the gains from hugepages I chose to use the “-S” parameter to pgbench which means perform only the “select” statements. Obviously this masks any costs that might be seen when dirtying hugepages – but it kept the experiment from having to be concerned with writing to the filesystem.
The workstation has 32GB of memory Postgres is given 16GB of memory using the parameter
pgbench creates a ~7.4gb database using a scale-factor of 500
What space savings should you expect when running databases with default compression in a Nutanix cluster? When we ran the TPCx-HCI benchmark on our cluster we realized about 2:1 savings from compression alone. The TPCx-HCI benchmark mimics a database consolidation setup, meaning that there are many databases per host. The uncompressed data size was about 45TB.
Additionally, we configured data at rest encryption (DARE). Using the cluster features allows us to both compress and encrypt (compression first, then encrypt). If the database engine itself handled encryption, it would reduce the ability to compress.
Like ZFS, the Nutanix filesystem uses LZ4 compressiom and 2:1 is about in-line with expectations for a realistic dataset. The TPCx-HCI benchmark uses the E-Gen data generation tool to populate the databases. E-Gen was developed for the TPC-E benchmark and uses sources such as census data and NYSE stock listings to generate real data rather than machine generated strings.
Nutanix AOS 5.10 ships with a feature called Autonomous Extent Store (AES). AES effectively provides Metadata Locality to complement the existing data locality that has always existed. For large datasets (e.g. a 10TB database with 20% hot data) we observe a 2X improvement in throughput for random access across the 2TB hot dataset.
In our experiment we deliberately size the active working-set to NOT fit into the metadata cache. We uniformly access 2TB with a 100% random access pattern and record the time to access all 2TB. On the same hardware with AES enabled – the time is cut in half. As can be seen in the chart – the throughput is double, as expected.
It is the localization of metadata from AES that contributes to the 2X improvement. AES keeps most of the metadata local to the node – so there is no need to fetch data across-the-wire. Additionally AES reduces the need to cache metadata in DRAM since local access is so fast. For very large datasets, retrieving metadata can contribute a large proportion of the access time. This is true for all storage, so speeding up metadata resolution can make a dramatic improvement to overall throughput as we demonstrate.