
How to use the “jobs” and “clients” parameters in pgbench without going crazy.
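As a quick sketch of the idea (database name hypothetical): -c sets the number of concurrent client sessions and -j the number of pgbench worker threads that drive them.
pgbench -c 64 -j 8 -T 120 pgbench_db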
With help from the Nutanix X-Ray team I have created an IO “benchmark” which simulates a “General Server Virtualization” workload. I call it the “Mixed Workload Simulator”.
How to speed up your X-ray benchmark development cycle by re-using/re-cycling benchmark VMs and, more importantly, data-sets.
I have VMs running on bare-metal instances. Each bare-metal instance is in a separate rack by design (for fault tolerance). The bandwidth is 25GbE; however, the response time between the hosts is high enough that I need multiple streams to consume that bandwidth.
Compared to my local on-prem lab, I need many more streams to get the observed throughput close to the theoretical 25GbE bandwidth, as the iperf results below show.
# iperf Streams | AWS Throughput | On-Prem Throughput |
1 | 4.8 Gbit | 21.4 Gbit |
2 | 9 Gbit | 22 Gbit |
4 | 18 Gbit | 22.5 Gbit |
8 | 23 Gbit | 23 Gbit |
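For reference, something like the following iperf invocation (server address hypothetical) is one way to generate the multi-stream numbers above; -P sets the number of parallel streams.
iperf -c 10.0.0.2 -P 8 -t 30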
A series of videos showing how to install, run and modify the Nutanix X-ray tool, and how to analyze HCI clusters with it.
How to identify Optane drives in the Linux OS using lspci.
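As a sketch of the approach: Optane devices show up as Intel NVMe controllers, so filtering the lspci output along these lines can surface them (the exact model strings vary by device).
lspci | grep -i 'non-volatile memory' | grep -i intel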
Use the following SQL to drop the tables and indexes in the HammerDB TPC-H schema, so that you can re-load it.
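The tables are the standard TPC-H set; as a sketch of the idea against SQL Server (server and database names hypothetical, child tables dropped before their parents):
sqlcmd -S localhost -d tpch -Q "DROP TABLE lineitem, orders, partsupp, part, customer, supplier, nation, region;"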
Tips and tricks for using diskspd, especially useful for those familiar with tools like fio.
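As a rough flavour of the mapping (test file path hypothetical): fio's --bs, --iodepth, --runtime and random-IO options correspond to diskspd's -b, -o, -d and -r, with -t setting the thread count and -Sh disabling caching.
diskspd -b8K -o32 -t4 -r -d60 -Sh C:\testfile.dat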
How to ensure performance testing with diskspd is stressing the underlying storage devices, not the OS filesystem.
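One way to take the filesystem out of the picture is to target a physical drive directly and disable both software and hardware caching with -Sh (the drive number is hypothetical; this example is read-only, since writing to a raw drive destroys its contents).
diskspd -b8K -d60 -o32 -t4 -r -w0 -Sh #1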
How to install and set up diskspd before starting your first performance tests, and how to avoid wrong results due to null-byte issues.
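The null-byte pitfall in short: diskspd's default write buffer is zero-filled, which storage that compresses or dedupes handles unrealistically well. A sketch that creates the test file and supplies a random-data write source buffer via -Z (path and sizes hypothetical):
diskspd -c10G -b64K -d30 -t2 -o8 -w100 -Z1M -Sh C:\testfile.dat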
Many storage performance testers are familiar with vdbench, and wish to use it to test Hyper-Converged (HCI) performance. To accurately performance test HCI you need to deploy workloads on all HCI nodes. However, deploying multiple VMs and coordinating vdbench can be tricky, so with X-ray we provide an easy way to run vdbench at scale. Here’s how to do it.
To achieve the maximum throughput on a storage device, we will usually use a large IO size to maximize the amount of data transferred per IO request. The idea is to make the ratio of data transferred to IO requests as large as possible, reducing the CPU overhead of each IO request so we can get as close to the device bandwidth as possible. To take advantage of pre-fetching, and to reduce the need for head movement on rotational devices, a sequential access pattern is used.
For historical reasons, many storage testers will use a 1MB IO size for sequential testing. A typical fio command line might look something like this.
fio --name=read --bs=1m --direct=1 --filename=/dev/sda
The real-world achievable SSD performance will vary depending on factors like IO size, queue depth and even CPU clock speed. It’s useful to know what the SSD is capable of delivering in the actual environment in which it’s used. I always start by looking at the performance claimed by the manufacturer. I use these figures to bound what is achievable. In other words, treat the manufacturer specs as “this device will go no faster than…”.
Start by identifying the exact SSD type using lsscsi. Note that the disks we are going to test are connected via the ATA transport type; therefore, the maximum queue depth each device will support is 32.
# lsscsi
[1:0:0:0]   cd/dvd  QEMU   QEMU DVD-ROM      2.5+  /dev/sr0
[2:0:0:0]   disk    ATA    SAMSUNG MZ7LM1T9  404Q  /dev/sda
[2:0:1:0]   disk    ATA    SAMSUNG MZ7LM1T9  404Q  /dev/sdb
[2:0:2:0]   disk    ATA    SAMSUNG MZ7LM1T9  404Q  /dev/sdc
[2:0:3:0]   disk    ATA    SAMSUNG MZ7LM1T9  404Q  /dev/
The marketing name for these Samsung SSDs is “SSD 850 EVO 2.5″ SATA III 1TB”.
The spec sheet for this SSD claims the following performance characteristics, shown alongside what I measured.
Workload (Max) | Spec | Measured |
Sequential Read (QD=8) | 540 MB/s | 534 MB/s |
Sequential Write (QD=8) | 520 MB/s | 515 MB/s |
Read IOPS 4KB (QD=32) | 98,000 | 80,000 |
Write IOPS 4KB (QD=32) | 90,000 | 67,000 |
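For the 4KB random-read row, a measurement along these lines (device path hypothetical; the queue depth of 32 matches the SATA NCQ limit noted above) is one way to arrive at such numbers:
fio --name=randread --filename=/dev/sdb --direct=1 --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --runtime=60 --time_based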
How to install Prometheus on OS-X
$ cd /Users/gary.little/Downloads/prometheus-2.16.0-rc.0.darwin-amd64
$ ./prometheus
Prometheus itself does not do much apart from monitor itself; to do anything useful we have to add an exporter for it to scrape. The easiest thing to do is add the exporter that monitors OS-X itself. As on Linux, the OS exporter is simply called “node exporter”.
Start by downloading the pre-compiled darwin node exporter from prometheus.io.
$ cd /Users/gary.little/Downloads/node_exporter-0.18.1.darwin-amd64
$ ./node_exporter
INFO[0000] Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e)  source="node_exporter.go:156"
INFO[0000] Build context (go=go1.11.10, user=root@4a30727bb68c, date=20190604-16:47:36)  source="node_exporter.go:157"
INFO[0000] Enabled collectors:  source="node_exporter.go:97"
INFO[0000]  - boottime  source="node_exporter.go:104"
INFO[0000]  - cpu  source="node_exporter.go:104"
INFO[0000]  - diskstats  source="node_exporter.go:104"
INFO[0000]  - filesystem  source="node_exporter.go:104"
INFO[0000]  - loadavg  source="node_exporter.go:104"
INFO[0000]  - meminfo  source="node_exporter.go:104"
INFO[0000]  - netdev  source="node_exporter.go:104"
INFO[0000]  - textfile  source="node_exporter.go:104"
INFO[0000]  - time  source="node_exporter.go:104"
INFO[0000] Listening on :9100  source="node_exporter.go:170"
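With node_exporter listening on :9100, Prometheus needs a scrape job pointing at it. A minimal sketch of the addition to prometheus.yml (the job name is arbitrary):
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']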
Some versions of HammerDB (e.g. 3.2) may induce imbalanced NUMA utilization with SQL Server.
This can easily be observed with Resource Monitor: when NUMA imbalance occurs, one of the NUMA nodes will show much higher utilization than the other.
The cause and fix are well documented on this blog. In short, HammerDB issues a short-lived connection for every persistent connection. This causes SQL Server’s round-robin allocation to send all the persistent worker threads to a single NUMA node! To resolve this issue, simply comment out line #212 in the driver script.
If successful, you will immediately see that the NUMA nodes are more evenly balanced. Whether this results in better performance will depend on exactly where the bottleneck is.