How to use the “jobs” and “clients” parameters in pgbench without going crazy.Continue reading
Scale factor to workingset size lookup for tiny databasesContinue reading
How can database density be measured?
- How does database performance behave as more DBs are consolidated?
- What impact does running the CVM have on available host resources?
- The cluster was able to achieve ~90% of the theoretical maximum.
- CVM overhead was 5% for this workload.
An X-ray workload for measuring application densityContinue reading
Do database workloads benefit from data locality?Continue reading
- One is taking transaction log writes.
- The other is doing reads and writes from the main datafiles.
Since the database size is small (50% the size of the Linux RAM) – the data is mostly cached inside the guest, and so most reads do not hit storage. As a result we only see writes going to the DB files.
Additionally, we see that database datafile writes the arrive in a bursty fashion, and that these write bursts are more intense (~10x) than the log file writes.
Despite the database flushes ocurring in bursts with a decent amount of concurrency the Nutanix CVM provides an average of 1.5ms write response time.
From the Nutanix CVM port 2009 handler, we can access the individual vdisk statistics. In this particular case vDisk 45269 is the data file disk, and 40043 is the database transaction log disk.
The vdisk categorizer correctly identifies the database datafile write pattern as highly random.
As a result, the writes are passed into the replicated oplog
Meanwhile the log writes are categorized as mostly sequential, which is expected for a database log file workload.
Even though the log writes are sequential, they are low-concurrency and small size (looks like mostly 16K-32K). This write pattern is also a good candidate for oplog.
In this example we run pgbench with a scale factor of 1000 which equates to a database size of around 15GB. The linux VM has 32G RAM, so we don’t expect to see many reads.
Using prometheus with the Linux node exporter we can see the disk IO pattern from pgbench. As expected the write pattern to the log disk (sda) is quite constant, while the write pattern to the database files (sdb) is bursty.
I had to tune the parameter checkpoint_completion_target from 0.5 to 0.9 otherwise the SCSI stack became overwhelmed during checkpoints, and caused log-writes to stall.
In this example, we use Postgres and the pgbench workload generator to drive some load in a virtual machine. Assume a Linux virtual machine that has Postgres installed. Specifically using a Bitnami virtual appliance.
- Once the VM has been started, connect to the console
- Allow access to postgres port 5432 – which is the postgres DB port or allow ssh
$ sudo ufw allow 5432
- Note the postgres user password (cat ./bitnami_credentials)
- Login to psql from the console or ssh
psql -U postgres
- Optionally change password (the password prompted is the one from bitnami_credentials for the postgres database user).
psql -U postgres postgres=# alter user postgres with password 'NEW_PASSWORD'; postgresl=# \q
- Create a DB to run the pgbench workload. In this case I name the db pgbench-sf10 for “Scale Factor 10”. Scale Factors are how the size of the database is determined.
$ sudo -u postgres createdb pgbench-sf10
- Initialise the DB with data ready to run the benchmark. The “createdb” step just creates an empty schema.
- -i means “initialize”
- -s means “scale factor” e.g. 10
- pgbench-sf10 is the database schema to use. We use the one just created pgbench-sf10
$ sudo -u postgres pgbench -i -s 10 pgbench-sf10
- Noe run a workload against the DB schema called pgbench-sf10
$ sudo -u postgres pgbench pgbench-sf10
The workload pattern, and load on the system will vary greatly depending on the scale factor.
Scale-Factor Working Set Size
One of the nice things about using public cloud is the ability to use pre-canned application virtual appliances created by companies like Bitnami.
We can use these same appliance images on Nutanix AHV to easily do a Postgres database benchmark
Step 1. Get the bitnami image
Step 2. Unzip the file and convert the bitnami vmdk images to a single qcow2 file.
qemu-img convert *vmdk bitnami.qcow2
Put the bitnami.qcow2 image somewhere accessible to a browser, connected to the Prism service, then upload using the “Image Configuration”
Once the image is uploaded, it’s time to create a new VM based on that image
Once booted, you’ll see the bitnami logo and you can configure the bitnami passwords, enable ssh etc. using the console.
Enable/disable ssh in bitnami images
Connecting to Postgres in bitnami images
Note – when you “sudo -c postgres <some-psql-tool> the password it is asking for is the Postgres DB password (stored in ./bitnami-credentials) not any unix user password.
Once connected to the appliance we can use postgres and pgbench to generate simplistic database workload.
 Do this on a Linux box somewhere. For some reason the conversion failed using my qemu utilities installed via brew. Importing OVAs direct into AHV should be available in the future.
For this experiment I am using Postgres v11 on Linux 3.10 kernel. The goal was to see what gains can be made from using hugepages. I use the “built in” benchmark pgbench to run a simple set of queries.
Since I am interested in only the gains from hugepages I chose to use the “-S” parameter to pgbench which means perform only the “select” statements. Obviously this masks any costs that might be seen when dirtying hugepages – but it kept the experiment from having to be concerned with writing to the filesystem.
The workstation has 32GB of memory
Postgres is given 16GB of memory using the parameter
shared_buffers = 16384MB
pgbench creates a ~7.4gb database using a scale-factor of 500
pgbench -i -s 500
Run the experiment like this
$ pgbench -c 10 -S -T 600 -P 1 p gbench
Default : No hugepages :
tps = 62190.452850 (excluding connections establishing)
tps = 66864.410968 (excluding connections establishing)
+7.5% over default
tps = 69702.358303 (excluding connections establishing)
+12% over default
Getting the default hugepages is as easy as entering a value into /etc/sysctl.conf. To allow for 16GB of hugepages I used the value of 8400, followed by “sysctl -p”
[root@arches gary]# grep huge /etc/sysctl.conf
vm.nr_hugepages = 8400
[root@arches gary]# sysctl -p
To get 1GB hugepages, the kernel has to have it configured during boot e.g.
[root@arches boot]# grep CMDLINE /etc/default/grub
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=auto vconsole.keymap=us rhgb quiet rdblacklist=nouveau default_hugepagesz=1G hugepagesz=1G