How to use the “jobs” and “clients” parameters in pgbench without going crazy.
How to speed up your X-ray benchmark development cycle by re-using/re-cycling benchmark VMs and more importantly data-sets.
I have VMs running on bare-metal instances. Each bare-metal instance is in a separate rack by design (for fault tolerance). The bandwidth is 25GbE; however, the response time between the hosts is so high that I need multiple streams to consume that bandwidth.
Compared to my local on-prem lab, I need many more streams to get the observed throughput close to the theoretical bandwidth of 25GbE.
| # iperf Streams | AWS Throughput | On-Prem Throughput |
|---|---|---|
| 1 | 4.8 Gbit/s | 21.4 Gbit/s |
| 2 | 9 Gbit/s | 22 Gbit/s |
| 8 | 23 Gbit/s | 23 Gbit/s |
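To put the table in perspective, the short sketch below expresses each measurement as a percentage of the 25 Gbit/s line rate. The figures are copied from the table; the names `LINE_RATE_GBIT`, `measurements` and `utilization` are illustrative only.

```python
# Line rate of the 25GbE link, in Gbit/s.
LINE_RATE_GBIT = 25.0

# Measured iperf throughput (Gbit/s) keyed by stream count, copied from the table.
measurements = {
    1: {"aws": 4.8, "on_prem": 21.4},
    2: {"aws": 9.0, "on_prem": 22.0},
    8: {"aws": 23.0, "on_prem": 23.0},
}


def utilization(throughput_gbit, line_rate_gbit=LINE_RATE_GBIT):
    """Return throughput as a percentage of the line rate."""
    return 100.0 * throughput_gbit / line_rate_gbit


for streams, result in measurements.items():
    print(
        f"{streams} stream(s): "
        f"AWS {utilization(result['aws']):.0f}% of line rate, "
        f"on-prem {utilization(result['on_prem']):.0f}% of line rate"
    )
```

A single stream on AWS reaches roughly a fifth of the line rate, while a single on-prem stream already gets to about 86%. Both environments converge at around 92% with eight streams.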
End to End Creation of a Nutanix Cluster on AWS and Running X-Ray
Scale factor to workingset size lookup for tiny databases
A series of videos showing how to install, run, modify and analyze HCI clusters with the Nutanix X-ray tool
How to identify Optane drives in Linux using lspci.
Use the following SQL to drop the tables and indexes in the HammerDB TPC-H schema, so that you can re-load it.
Tips and tricks for using diskspd, especially useful for those familiar with tools like fio.
How to ensure performance testing with diskspd is stressing the underlying storage devices, not the OS filesystem.
How to install and set up diskspd before starting your first performance tests, and how to avoid wrong results due to null byte issues.
How can database density be measured?
- How does database performance behave as more DBs are consolidated?
- What impact does running the CVM have on available host resources?
- The cluster was able to achieve ~90% of the theoretical maximum.
- CVM overhead was 5% for this workload.
Many storage performance testers are familiar with vdbench, and wish to use it to test Hyper-Converged (HCI) performance. To accurately performance test HCI you need to deploy workloads on all HCI nodes. However, deploying multiple VMs and coordinating vdbench can be tricky, so with X-ray we provide an easy way to run vdbench at scale. Here’s how to do it.
First things First
Why do we tend to use 1MB IO sizes for throughput benchmarking?
To achieve the maximum throughput on a storage device, we will usually use a large IO size to maximize the amount of data transferred per IO request. The idea is to make the ratio of data transferred to IO requests as large as possible, which reduces the CPU overhead of issuing the IO requests so we can get as close to the device bandwidth as possible. To take advantage of pre-fetching, and to reduce the need for head movement in rotational devices, a sequential pattern is used.
For historical reasons, many storage testers will use a 1MB IO size for sequential testing. A typical fio command line might look something like this.
fio --name=read --bs=1m --direct=1 --filename=/dev/sda
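The ratio argument can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a hypothetical device that sustains 500 MB/s (an illustrative figure, not a measurement from this post). It shows how the number of IO requests per second needed to reach that bandwidth falls as the IO size grows. The function name `ios_per_second` is made up for this example.

```python
KIB = 1024

# Assumed target bandwidth: 500 MB/s (illustrative, not a measured value).
TARGET_BYTES_PER_S = 500 * 10**6


def ios_per_second(bandwidth_bytes_per_s, io_size_bytes):
    """Number of IO requests per second needed to sustain the bandwidth."""
    return bandwidth_bytes_per_s / io_size_bytes


# Block sizes written the way fio's --bs option would express them.
for label, size in [("4k", 4 * KIB), ("64k", 64 * KIB), ("1m", 1024 * KIB)]:
    rate = ios_per_second(TARGET_BYTES_PER_S, size)
    print(f"bs={label}: {rate:,.0f} IO requests/s")
```

Under that assumption, a 4 KiB IO size needs about 122,000 requests per second to fill the device, while a 1 MiB IO size needs fewer than 500, so far less CPU time goes into submitting and completing requests.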
The real-world achievable SSD performance will vary depending on factors like IO size, queue depth and even CPU clock speed. It’s useful to know what the SSD is capable of delivering in the actual environment in which it’s used. I always start by looking at the performance claimed by the manufacturer. I use these figures to bound what is achievable. In other words, treat the manufacturer specs as “this device will go no faster than…”.
Start by identifying the exact SSD type by using lsscsi. Note that the disks we are going to test are connected by ATA transport type, therefore the maximum queue depth that each device will support is 32.
[1:0:0:0] cd/dvd QEMU QEMU DVD-ROM 2.5+ /dev/sr0
[2:0:0:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sda
[2:0:1:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sdb
[2:0:2:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sdc
[2:0:3:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/
The marketing name for these Samsung SSDs is “SSD 850 EVO 2.5″ SATA III 1TB”.
Identify device specs
The spec sheet for this SSD claims the following performance characteristics.
|Sequential Read (QD=8)||540 MB/s||534|
|Sequential Write (QD=8)||520 MB/s||515|
|Read IOPS 4KB (QD=32)||98,000||80,00|
|Write IOPS 4KB (QD=32)||90,000||67,000|
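The queue depth and IOPS figures can be related through Little's Law (outstanding IOs = IOPS × mean latency). The sketch below applies it to the spec-sheet values in the second column of the table. The function names are illustrative, and the results are implied averages derived from the spec, not measured latencies.

```python
def implied_latency_us(queue_depth, iops):
    """Mean per-IO latency (microseconds) implied by Little's Law."""
    return queue_depth / iops * 1_000_000


def bandwidth_mb_per_s(iops, io_size_bytes=4096):
    """Bandwidth in MB/s (decimal) for a given IOPS rate and IO size."""
    return iops * io_size_bytes / 10**6


QUEUE_DEPTH = 32  # maximum queue depth for the ATA-attached devices

# Spec-sheet 4KB random IOPS at QD=32, copied from the table.
for name, iops in [("read", 98_000), ("write", 90_000)]:
    print(
        f"{name}: {implied_latency_us(QUEUE_DEPTH, iops):.0f} us mean latency, "
        f"{bandwidth_mb_per_s(iops):.0f} MB/s at 4KB"
    )
```

At the claimed 98,000 read IOPS with 32 IOs outstanding, each IO must complete in roughly 327 microseconds on average, which corresponds to about 401 MB/s of 4KB traffic. That is a useful sanity bound when reading benchmark output from the device.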
How to install Prometheus on OS-X
- Download the compiled prometheus binaries from prometheus.io
- Unzip the binary and cd into the directory.
- Run the prometheus binary from the command line. It will listen on port 9090.
$ cd /Users/gary.little/Downloads/prometheus-2.16.0-rc.0.darwin-amd64
- From a local browser, point to localhost:9090
Add a collector/scraper to monitor the OS
Prometheus itself does not do much apart from monitor itself. To do anything useful we have to add a scraper/exporter module. The easiest thing to do is add the scraper to monitor OS-X itself. As in Linux, the OS exporter is simply called “node exporter”.
Start by downloading the pre-compiled darwin node exporter from prometheus.io
- Unzip the tar.gz
- cd into the directory
- run the node exporter
$ cd /Users/gary.little/Downloads/node_exporter-0.18.1.darwin-amd64
$ ./node_exporter
INFO Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e) source="node_exporter.go:156"
INFO Build context (go=go1.11.10, user=root@4a30727bb68c, date=20190604-16:47:36) source="node_exporter.go:157"
INFO Enabled collectors: source="node_exporter.go:97"
INFO - boottime source="node_exporter.go:104"
INFO - cpu source="node_exporter.go:104"
INFO - diskstats source="node_exporter.go:104"
INFO - filesystem source="node_exporter.go:104"
INFO - loadavg source="node_exporter.go:104"
INFO - meminfo source="node_exporter.go:104"
INFO - netdev source="node_exporter.go:104"
INFO - textfile source="node_exporter.go:104"
INFO - time source="node_exporter.go:104"
INFO Listening on :9100 source="node_exporter.go:170"
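Once the exporter is listening on port 9100, a quick way to confirm that it is serving data is to read its `/metrics` endpoint. The sketch below is a deliberately simplified parser for the Prometheus text exposition format: it skips comment lines and assumes samples carry no trailing timestamp. `parse_metrics`, `fetch_metrics` and the `SAMPLE` text are illustrative and not part of any Prometheus library. The URL assumes the exporter is running locally on its default port.

```python
import urllib.request

# Hand-written sample in the exposition format, used so the sketch runs offline.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 1.5
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""


def parse_metrics(text):
    """Parse exposition-format text into {"name{labels}": float_value}.

    Simplification: assumes each sample line is "<series> <value>" with no
    optional timestamp field at the end.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Fetch and parse metrics from a running exporter (assumed local URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


metrics = parse_metrics(SAMPLE)
print("load1 =", metrics["node_load1"])
```

With the exporter running, calling `fetch_metrics()` instead of parsing `SAMPLE` should return the live samples. An empty result or a connection error suggests the exporter is not listening where expected.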
Some versions of HammerDB (e.g. 3.2) may induce imbalanced NUMA utilization with SQL Server.
This can easily be observed with Resource Monitor. When NUMA imbalance occurs, one of the NUMA nodes will show much higher utilization than the other.
The cause and fix are well documented on this blog. In short, HammerDB issues a short-lived connection for every persistent connection. This causes the SQL Server round-robin allocation to send all the persistent worker threads to a single NUMA node! To resolve this issue, simply comment out line #212 in the driver script.
If successful you will immediately see that the NUMA nodes are more balanced. Whether this results in more/better performance will depend on exactly where the bottleneck is.