Using cloud-init with AHV command line

TL;DR

  • Using cloud-init with AHV is conceptually identical to using KVM/QEMU- we need to use a few different tools with AHV
  • You will need a Linux image that is configured to use cloud-init. A good source is cloud-images.ubuntu.com
  • We will create a cloud-init textual file and create a mountable version using the cloud-localds tool on a Linux host
  • We will attach the cloud-init enabled ubuntu image and our cloud-init customization file to the VM at boot time
  • At boottime ubuntu will access the cloud-init data mounted as a CDROM and do the customization for us
Continue reading

Comparing RDS and Nutanix Cluster performance with HammerDB

tl;dr

In a recent experiment using Amazon RDS instance and a VM running in an on-prem Nutanix cluster, both using Skylake class processors with similar clock speeds and vCPU count. The SQLServer database on Nutanix delivered almost 2X the transaction rate as the same workload running on Amazon RDS.

It turns out that migrating an existing SQLServer VM to RDS using the same vCPU count as on-prem may yield only half the expected performance for CPU heavy database workloads. The root cause is how Amazon thinks about vCPU compared to on-prem.

Benchmark Results

HammerDB results from RDS and Nutanix
Continue reading

Single threaded DB performance on Nutanix HCI

tl;dr

A Nutanix cluster can persist a replicated write across two nodes in around 250 uSec which is critical for single-threaded DB write workloads. The performance compares very well with hosted cloud database instances using the same class of processor (db.r5.4xlarge in the figure below). The metrics below are for SQL insert transactions not the underlying IO.

Single threaded commit heavy insert rates. Latency as seen from SQL insert statement.
Continue reading

AHV Tip: Shutdown multiple VMs in parallel

Often in my lab I want to shutdown a large number of VMs quickly. In the example below I submit the power-off command for a maximum of 50 VMs in parallel. Be aware that we’re using the command line, and in line with true Unix philosophy the OS will assume we know what we are doing and obey us completely and immediately. In other words pasting the below commands to your CVM will immediately shutdown all powered on VMs.

 for i in $(acli  vm.list power_state=on | awk '{ print $(NF) }' |tail -50); do acli vm.off $i &  done

How to deploy Ubuntu cloud images to Nutanix AHV

In this example we use the KVM cloud image from the Canonical Ubuntu image repository. More information on Ubuntu cloud images is on the canonical cloud image page. More detail on the cloud image boot process and cloud-init here: Ubuntu UEC/Imanges.

We can use the Ubuntu cloud image catalog, and specifically use one that has been built to run on KVM. Since AHV is based on KVM/QEMU Nutanix can use that image format directly without any further conversion.

Using a cloud image can be a quicker way to stand up a particular version of Linux without having to go through the Linux installation process (choosing usernames, keyboard types, timezones etc.). However, you will need to pass in a public key so that you can login to the instance once it has booted.

Continue reading

Nutanix Performance for Database Workloads

We’ve come a long way, baby.

Full disclosure. I have worked for Nutanix in the performance engineering group since 2013. My opinions are likely biased, but that also gives me a decent amount of context when it comes to the performance of Nutanix storage over time. We already have a lot of customers running database workloads on Nutanix. But what about those high-performance databases still running on traditional storage?

I dug out a chart that I presented at .Next in 2017 and added to it the performance of a modern platform (AOS 6.0 and an NVME+SSD platform). For this random read microbenchmark performance has more than doubled. If you took a look at a HCI system even a few years back and decided that performance wasn’t where you needed it – there’s a good chance that the HW+SW systems shipping today could meet your needs.

Much more detail below.

Continue reading

How to run vdbench benchmark on any HCI with X-Ray

Many storage performance testers are familiar with vdbench, and wish to use it to test Hyper-Converged (HCI) performance. To accurately performance test HCI you need to deploy workloads on all HCI nodes. However, deploying multiple VMs and coordinating vdbench can be tricky, so with X-ray we provide an easy way to run vdbench at scale. Here’s how to do it.

Continue reading

View from Nutanix storage during Postgres DB benchmark

Following on from the previous [1] [2] experiments with Postgres & pgbench. A quick look at how the workload is seen from the Nutanix CVM.

The Linux VM running postgres has two virtual disks:

  • One is taking transaction log writes.
  • The other is doing reads and writes from the main datafiles.

Since the database size is small (50% the size of the Linux RAM) – the data is mostly cached inside the guest, and so most reads do not hit storage. As a result we only see writes going to the DB files.

Additionally, we see that database datafile writes the arrive in a bursty fashion, and that these write bursts are more intense (~10x) than the log file writes.

Charts from Prometheus/Grafana showing IO rates seen from the perspective of the Linux guest VM

Despite the database flushes ocurring in bursts with a decent amount of concurrency the Nutanix CVM provides an average of 1.5ms write response time.

From the Nutanix CVM port 2009 handler, we can access the individual vdisk statistics. In this particular case vDisk 45269 is the data file disk, and 40043 is the database transaction log disk.

Datafile writes completed in 1.5millisecond average – despite deep queues during burst

The vdisk categorizer correctly identifies the database datafile write pattern as highly random.

Writes to the datbase datafiles are almost entirely random

As a result, the writes are passed into the replicated oplog

The burst of writes hits the oplog as expected

Meanwhile the log writes are categorized as mostly sequential, which is expected for a database log file workload.

Meanwhile, log file writes are mostly categorized as sequential.

Even though the log writes are sequential, they are low-concurrency and small size (looks like mostly 16K-32K). This write pattern is also a good candidate for oplog.

These low-concurrency log writes also hit oplog

Install a bitnami image to Nutanix AHV cluster.

One of the nice things about using public cloud is the ability to use pre-canned application virtual appliances created by companies like Bitnami.

We can use these same appliance images on Nutanix AHV to easily do a Postgres database benchmark

Step 1. Get the bitnami image

wget  https://bitnami.com/redirect/to/587231/bitnami-postgresql-11.3-0-r56-linux-debian-9-x86_64.zip

Step 2. Unzip the file and convert the bitnami vmdk images to a single qcow2[1] file.

qemu-img convert *vmdk bitnami.qcow2

Put the bitnami.qcow2 image somewhere accessible to a browser, connected to the Prism service, then upload using the “Image Configuration”

Once the image is uploaded, it’s time to create a new VM based on that image

Once booted, you’ll see the bitnami logo and you can configure the bitnami passwords, enable ssh etc. using the console.

Enable/disable ssh in bitnami images
Connecting to Postgres in bitnami images
Note – when you “sudo -c postgres <some-psql-tool> the password it is asking for is the Postgres DB password (stored in ./bitnami-credentials) not any unix user password.

Once connected to the appliance we can use postgres and pgbench to generate simplistic database workload.

[1] Do this on a Linux box somewhere. For some reason the conversion failed using my qemu utilities installed via brew. Importing OVAs direct into AHV should be available in the future.

Nutanix AES: Performance By Example PT2

How to improve large DB read performance by 2X

Nutanix AOS 5.10 ships with a feature called Autonomous Extent Store (AES).  AES effectively provides Metadata Locality to complement the existing data locality that has always existed.  For large datasets (e.g. a 10TB database with 20% hot data) we observe a 2X improvement in throughput for random access across the 2TB hot dataset.

In our experiment we deliberately size the active working-set to NOT fit into the metadata cache.   We uniformly access 2TB with a 100% random access pattern and record the time to access all 2TB.  On the same hardware with AES enabled – the time is cut in half.  As can be seen in the chart – the throughput is double, as expected.

It is the localization of metadata from AES that contributes to the 2X improvement.  AES keeps most of the metadata local to the node – so there is no need to fetch data across-the-wire.  Additionally  AES reduces the need to cache metadata in DRAM since local access is so fast. For very large datasets, retrieving metadata can contribute a large proportion of the access time.  This is true for all storage, so speeding up metadata resolution can make a dramatic improvement to overall throughput as we demonstrate.