Using Prometheus and Grafana to monitor a Nutanix Cluster.

Posted on May 17, 2024 (updated June 10, 2024) by gary

Using a small Python script, we can liberate data from the “Analysis” page of Prism Element and send it to Prometheus, where we can combine cluster metrics with other data and view them all on some nice Grafana dashboards.

Continue reading →

A Nutanix / Prometheus exporter in bash

Posted on May 3, 2024 (updated May 6, 2024) by gary

Overview

For a fun afternoon project, how about a retro Prometheus exporter using Apache/nginx, cgi-bin and bash!?

About the Prometheus format

A Prometheus exporter simply has to return a page with metric names and metric values in a particular format, like the one below.

ntnx_bash{metric="cluster_read_iops"} 0
ntnx_bash{metric="cluster_write_iops"} 1

When you configure Prometheus via prometheus.yml, you are telling it to visit a particular IP:port over HTTP and ask for a page called metrics. If the “page” called metrics is a script, the script just has to print data in the expected format, and Prometheus will accept that as a basic “exporter”. The idea here is to write a very simple exporter in bash that connects to a Nutanix cluster, hits the stats API and returns IOPS data for a given container in the correct format.
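As a rough illustration of the idea, here is a minimal sketch of what such a cgi-bin script could look like. It is not the script from the full post: the function names (emit_metric, render_metrics_page) and the READ_IOPS/WRITE_IOPS variables are hypothetical, and the values are placeholders where a real exporter would query the cluster stats API.

```shell
#!/usr/bin/env bash
# Sketch of a CGI-style "metrics" page. Names and values are illustrative
# assumptions; a real exporter would fetch the numbers from the stats API.

# Print one sample in the ntnx_bash{metric="..."} <value> layout.
emit_metric() {
  local name="$1" value="$2"
  printf 'ntnx_bash{metric="%s"} %s\n' "$name" "$value"
}

# Print the whole page: CGI header, blank line, then one sample per line.
render_metrics_page() {
  printf 'Content-Type: text/plain\n\n'
  emit_metric cluster_read_iops "${READ_IOPS:-0}"
  emit_metric cluster_write_iops "${WRITE_IOPS:-0}"
}

render_metrics_page
```

Keeping the formatting in a small function makes it easy to add more metrics later without touching the header logic.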

Continue reading →

Viewing Nutanix cluster metrics in prometheus/grafana

Posted on July 31, 2023 (updated August 1, 2023) by gary

Using the Nutanix API with the Prometheus push-gateway.

Many customers would like to view their cluster metrics alongside existing performance data using Prometheus/Grafana.

Currently Nutanix does not provide a native exporter for Prometheus to use as a datasource. However, we can use the Prometheus push-gateway and a simple script which pulls from the native APIs to get data into Prometheus. From there we can use Grafana or anything else that can connect to Prometheus.

The goal is to be able to view cluster metrics alongside other Grafana dashboards, for example to show the current read/write IOPS that the cluster is delivering on a per-container basis. I’m hard-coding IPs and usernames/passwords in the script, which obviously is not production grade, so don’t do that.
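To make the push-gateway flow concrete, here is a minimal sketch in bash. It is not the script from the full post: the gateway address, the job name, the metric names and the helper functions are all assumptions for illustration, and the IOPS values would really come from the cluster's stats API.

```shell
#!/usr/bin/env bash
# Sketch: push per-container IOPS to a Prometheus Pushgateway.
# PUSHGATEWAY, JOB and the metric names are illustrative assumptions.

PUSHGATEWAY="${PUSHGATEWAY:-http://localhost:9091}"
JOB="${JOB:-nutanix_cluster}"

# Build the text-format payload for one container.
build_payload() {
  local container="$1" read_iops="$2" write_iops="$3"
  printf 'ntnx_container_read_iops{container="%s"} %s\n' "$container" "$read_iops"
  printf 'ntnx_container_write_iops{container="%s"} %s\n' "$container" "$write_iops"
}

# The Pushgateway groups pushed metrics by the job label in the URL path.
push_url() {
  printf '%s/metrics/job/%s\n' "$PUSHGATEWAY" "$JOB"
}

# POST the payload to the gateway; Prometheus then scrapes the gateway.
push_metrics() {
  build_payload "$@" | curl --silent --fail --data-binary @- "$(push_url)"
}

# Example (not run here): push_metrics default-container 1200 300
```

Run from cron or a loop, a script like this refreshes the values held by the gateway, and Prometheus picks them up on its next scrape.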

Continue reading →

Effects of CPU topology on SQL Server guests with AHV.

Posted on February 28, 2023 by gary

VM CPU Topology

The topology (layout) of virtual sockets and CPUs that AHV presents to the guest operating system will usually differ from the physical topology. This is expected because we typically present a subset of all cores to the guest VMs.

Usually it is the total number of vCPUs given to the VM that matters, not the specific topology. But in the case of SQL Server running an analytical workload (a TPC-H like workload from HammerDB), the topology passed to the VM does make a difference: between 10% and 20% when measured by the total runtime.

[I think that the reason we see a difference here is that (a) the analytical workloads use hardly any storage bandwidth (I sized the database to fit in memory) and (b) there is probably a lot of cross-talk between the cores/memory as the DB engine issues parallel queries.]

At any rate, we see that passing 20 cores as “20 sockets of 1 core” beats the performance of “1 socket with 20 cores” by a wide margin. The physical topology is two sockets with 20 cores each. Thankfully the better performing option is the default.

CPU Topology may make a difference for SQL server running analytical workloads.

Continue reading →

Using cloud-init with AHV command line

Posted on August 26, 2022 (updated September 7, 2022) by gary

TL;DR

Using cloud-init with AHV is conceptually identical to using KVM/QEMU; we just need to use a few different tools with AHV.
You will need a Linux image that is configured to use cloud-init. A good source is cloud-images.ubuntu.com.
We will create a cloud-init text file and create a mountable version using the cloud-localds tool on a Linux host.
We will attach the cloud-init enabled Ubuntu image and our cloud-init customization file to the VM at boot time.
At boot time Ubuntu will access the cloud-init data mounted as a CDROM and do the customization for us.
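
The step that creates the cloud-init file and its mountable version can be sketched roughly as follows. This is a minimal example under stated assumptions, not the exact procedure from the full post: the file names, hostname and package are placeholders, and write_user_data is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: write a cloud-init user-data file and wrap it in a seed image.
# The hostname, file names and package are illustrative placeholders.

write_user_data() {
  local out="$1" hostname="$2"
  cat > "$out" <<EOF
#cloud-config
hostname: ${hostname}
package_update: true
packages:
  - fio
EOF
}

write_user_data user-data.txt demo-vm

# cloud-localds (from the cloud-image-utils package on Ubuntu) builds the
# mountable image; skip quietly on hosts where it is not installed.
if command -v cloud-localds >/dev/null 2>&1; then
  cloud-localds seed.img user-data.txt || echo "cloud-localds failed" >&2
fi
```

The resulting seed.img is the file that gets attached to the VM as a CDROM alongside the Ubuntu cloud image.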

Continue reading →

Comparing RDS and Nutanix Cluster performance with HammerDB

Posted on June 14, 2022 (updated January 3, 2023) by gary

tl;dr

In a recent experiment comparing an Amazon RDS instance with a VM running in an on-prem Nutanix cluster, both using Skylake-class processors with similar clock speeds and vCPU counts, the SQL Server database on Nutanix delivered almost 2X the transaction rate of the same workload running on Amazon RDS.

It turns out that migrating an existing SQL Server VM to RDS using the same vCPU count as on-prem may yield only half the expected performance for CPU-heavy database workloads. The root cause is how Amazon thinks about vCPU compared to on-prem.

Benchmark Results

Continue reading →

Single threaded DB performance on Nutanix HCI

Posted on May 27, 2022 (updated January 3, 2023) by gary

tl;dr

A Nutanix cluster can persist a replicated write across two nodes in around 250 microseconds, which is critical for single-threaded DB write workloads. The performance compares very well with hosted cloud database instances using the same class of processor (db.r5.4xlarge in the figure below). The metrics below are for SQL insert transactions, not the underlying IO.

Single threaded commit heavy insert rates. Latency as seen from SQL insert statement.

Continue reading →

AHV Tip: Shutdown multiple VMs in parallel

Posted on May 6, 2022 (updated September 7, 2022) by gary

Often in my lab I want to shut down a large number of VMs quickly. In the example below I submit the power-off command for a maximum of 50 VMs in parallel. Be aware that we’re using the command line, and in line with true Unix philosophy the OS will assume we know what we are doing and obey us completely and immediately. In other words, pasting the commands below into your CVM will immediately power off the powered-on VMs (up to 50 of them).

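 # Take the last column of each vm.list row for up to 50 powered-on VMs, then power each off in the background.
 for i in $(acli vm.list power_state=on | awk '{ print $NF }' | tail -n 50); do acli vm.off "$i" & done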

AOS 6.1 Improvements for Day-2 database operations.

Posted on March 8, 2022 (updated January 3, 2023) by gary

AOS 6.1 greatly improved database performance on Nutanix, especially when the guest VM uses just a single disk for all the database files. The underlying change is known as vdisk sharding. Basically, it allows the Nutanix CVM to scale up the number of threads used to service a single virtual disk under heavy load.

Continue reading →

Nutanix Performance for Database Workloads

Posted on November 24, 2021 (updated September 14, 2022) by gary

We’ve come a long way, baby.

Full disclosure. I have worked for Nutanix in the performance engineering group since 2013. My opinions are likely biased, but that also gives me a decent amount of context when it comes to the performance of Nutanix storage over time. We already have a lot of customers running database workloads on Nutanix. But what about those high-performance databases still running on traditional storage?

I dug out a chart that I presented at .Next in 2017 and added to it the performance of a modern platform (AOS 6.0 and an NVMe+SSD platform). For this random read microbenchmark, performance has more than doubled. If you took a look at an HCI system even a few years back and decided that performance wasn’t where you needed it, there’s a good chance that the HW+SW systems shipping today could meet your needs.

Much more detail below.

Continue reading →

A Generalized workload generator for storage IO

Posted on December 22, 2020 (updated November 4, 2022) by gary

With help from the Nutanix X-Ray team I have created an IO “benchmark” which simulates a “General Server Virtualization” workload. I call it the “Mixed Workload Simulator”.

Continue reading →

Advanced X-Ray: reducing runtime by re-using VMs.

Posted on October 5, 2020 (updated July 14, 2021) by gary

How to speed up your X-ray benchmark development cycle by re-using/re-cycling benchmark VMs and more importantly data-sets.

Continue reading →

How to performance test Nutanix on AWS with X-ray

Posted on August 18, 2020 (updated April 2, 2021) by gary

End to End Creation of a Nutanix Cluster on AWS and Running X-Ray

Continue reading →

Nutanix X-Ray video Series

Posted on August 10, 2020 (updated January 3, 2023) by gary

A series of videos showing how to install, run, modify and analyze HCI clusters with the Nutanix X-Ray tool.

Continue reading →

How to download and Install Nutanix X-ray on an AHV cluster

Posted on August 10, 2020 (updated April 2, 2021) by gary

How to measure database scaling & density on Nutanix HCI platform.

Posted on March 27, 2020 (updated November 5, 2022) by gary

How can database density be measured?

How does database performance behave as more DBs are consolidated?
What impact does running the CVM have on available host resources?

tl;dr

The cluster was able to achieve ~90% of the theoretical maximum.
CVM overhead was 5% for this workload.

Continue reading →

How to run vdbench benchmark on any HCI with X-Ray

Posted on March 23, 2020 (updated October 8, 2020) by gary

Many storage performance testers are familiar with vdbench, and wish to use it to test Hyper-Converged (HCI) performance. To accurately performance test HCI you need to deploy workloads on all HCI nodes. However, deploying multiple VMs and coordinating vdbench can be tricky, so with X-Ray we provide an easy way to run vdbench at scale. Here’s how to do it.

Continue reading →