Quick & Dirty Prometheus on OS-X

Install prometheus

  • Download the compiled prometheus binaries from prometheus.io
  • Unzip the binary and cd into the directory.
  • Run the prometheus binary from the command line; it will listen on port 9090
$ cd /Users/gary.little/Downloads/prometheus-2.16.0-rc.0.darwin-amd64
$ ./prometheus
  • From a local browser, point to localhost:9090
[screenshot: prometheus web-ui]
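
(Optional) If you prefer the command line, you can also check that the server is up by hitting its health endpoint; in Prometheus 2.x a 200 response here means the server is serving:

$ curl -s -o /dev/null -w "%{http_code}\n" localhost:9090/-/healthy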

Add a collector/scraper to monitor the OS

Prometheus itself does not do much apart from monitor itself; to do anything useful we have to add a scraper/exporter module. The easiest thing to do is add an exporter that monitors OS-X itself. As in Linux, the OS exporter is simply called the “node exporter”.

Start by downloading the pre-compiled darwin node exporter from prometheus.io

  • Unzip the tar.gz
  • cd into the directory
  • run the node exporter
$ cd /Users/gary.little/Downloads/node_exporter-0.18.1.darwin-amd64
$ ./node_exporter
 INFO[0000] Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e)  source="node_exporter.go:156"
 INFO[0000] Build context (go=go1.11.10, user=root@4a30727bb68c, date=20190604-16:47:36)  source="node_exporter.go:157"
 INFO[0000] Enabled collectors:                           source="node_exporter.go:97"
 INFO[0000]  - boottime                                   source="node_exporter.go:104"
 INFO[0000]  - cpu                                        source="node_exporter.go:104"
 INFO[0000]  - diskstats                                  source="node_exporter.go:104"
 INFO[0000]  - filesystem                                 source="node_exporter.go:104"
 INFO[0000]  - loadavg                                    source="node_exporter.go:104"
 INFO[0000]  - meminfo                                    source="node_exporter.go:104"
 INFO[0000]  - netdev                                     source="node_exporter.go:104"
 INFO[0000]  - textfile                                   source="node_exporter.go:104"
 INFO[0000]  - time                                       source="node_exporter.go:104"
 INFO[0000] Listening on :9100                            source="node_exporter.go:170"

We now have both a prometheus server and a node exporter, but they are totally independent processes. To teach the prometheus server about the node exporter, we need to edit the prometheus.yml file and tell prometheus how to reach it.

Since the node exporter runs on a well-known port, we simply tell prometheus which port to talk to, then restart prometheus to pick up the new information.

Here is what my prometheus.yml file looks like after adding the node exporter. The change is a new “job_name” called “node”, whose target is “localhost:9100”, the default port for the node exporter.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
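
(Optional) Before restarting, the promtool binary that ships in the same tarball can sanity-check the edited file:

$ ./promtool check config prometheus.yml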

Now restart prometheus to pick up the new config

$ ./prometheus 
level=info ts=2020-02-07T17:26:49.042Z caller=main.go:295 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-02-07T17:26:49.043Z caller=main.go:331 msg="Starting Prometheus" version="(version=2.16.0-rc.0, branch=HEAD, revision=22a04239c937be61df95fdb60f0661684693cf3b)"
level=info ts=2020-02-07T17:26:49.043Z caller=main.go:332 build_context="(go=go1.13.7, user=root@c7d619905021, date=20200131-22:56:34)"
level=info ts=2020-02-07T17:26:49.043Z caller=main.go:333 host_details=(darwin)
level=info ts=2020-02-07T17:26:49.043Z caller=main.go:334 fd_limits="(soft=256, hard=unlimited)"
level=info ts=2020-02-07T17:26:49.043Z caller=main.go:335 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-02-07T17:26:49.051Z caller=main.go:661 msg="Starting TSDB ..."
level=info ts=2020-02-07T17:26:49.051Z caller=web.go:508 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-02-07T17:26:49.055Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1580947174039 maxt=1580947200000 ulid=01E0C9989RFMB89NGY09SVPSC0
level=info ts=2020-02-07T17:26:49.056Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1580947200000 maxt=1580954400000 ulid=01E0C99ACPYCVKDCS1BHA0N07C
level=info ts=2020-02-07T17:26:49.057Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1580954400000 maxt=1580961600000 ulid=01E0D87R60GZV9HPPFX7CSDWZM
level=info ts=2020-02-07T17:26:49.058Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1580990400000 maxt=1580997600000 ulid=01E0DG80PJKN99HQF01V1EKAH1
level=info ts=2020-02-07T17:26:49.075Z caller=head.go:577 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-02-07T17:26:49.081Z caller=head.go:601 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-02-07T17:26:49.082Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=10 maxSegment=14
level=info ts=2020-02-07T17:26:49.084Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=11 maxSegment=14
level=info ts=2020-02-07T17:26:49.087Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=12 maxSegment=14
level=info ts=2020-02-07T17:26:49.115Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=13 maxSegment=14
level=info ts=2020-02-07T17:26:49.116Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=14 maxSegment=14
level=info ts=2020-02-07T17:26:49.118Z caller=main.go:676 fs_type=19
level=info ts=2020-02-07T17:26:49.118Z caller=main.go:677 msg="TSDB started"
level=info ts=2020-02-07T17:26:49.119Z caller=main.go:747 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2020-02-07T17:26:49.143Z caller=main.go:775 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2020-02-07T17:26:49.143Z caller=main.go:630 msg="Server is ready to receive web requests."
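
As an aside, a full restart is not strictly required: prometheus also re-reads prometheus.yml when the running process receives a SIGHUP, so something like this should work too:

$ kill -HUP $(pgrep prometheus)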

Now the node exporter (localhost:9100) will show up as a “target” in the prometheus UI (localhost:9090/targets)

Let’s just see if we can get a pretty line chart of CPU usage. By hitting the link labeled “http://localhost:9100/metrics” we can see all of the things that the node exporter collects and exposes.
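
If you would rather grep than scroll, something along these lines (assuming the default port) narrows the output down to just the CPU counters:

$ curl -s localhost:9100/metrics | grep node_cpu_seconds_total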

There’s a likely looking metric labeled node_cpu_seconds_total{cpu="0",mode="user"} which seems like it might fit the bill.
Let’s try to create a chart using that metric. Simply hit the link labeled “Graph”

Then paste the metric into the box and hit execute

Then hit “Graph”

You will see a line that just rises, which seems odd for a CPU line chart. What’s happening here is that prometheus is just scraping the total CPU seconds spent in user-mode since boot and storing that value, so of course it always goes up.

What we are used to seeing in a CPU chart is the fraction of time the CPU spent in “user-mode” over time. For instance, if I sample every 10 seconds, and every 10 seconds the value of node_cpu_seconds_total{cpu="0",mode="user"} increases by, say, 5, then I can say that the rate is 5s/10s, or 50% (0.5).

So, to turn this line into something more familiar, I use the “irate” function to convert the ever-increasing value of node_cpu_seconds_total into a rate.
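
Concretely, the query looks something like this (using a 30-second range window, to match the sampling shown below):

irate(node_cpu_seconds_total{cpu="0",mode="user"}[30s])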

Now this chart is showing rate, as sampled every 30s

So, now I have a chart in prometheus that looks sort-of like my chart from Activity Monitor, albeit on a different timescale. The prometheus chart is showing the user-time ratio sampled every 30s, whereas Activity Monitor samples every 5 seconds. However, if I change the sampling in prometheus down to 5 seconds in the query box I get this.
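
In query terms, “changing the sampling down to 5 seconds” just means shrinking the range window, along the lines of:

irate(node_cpu_seconds_total{cpu="0",mode="user"}[5s])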

No Data points found.

So we need to bump up the scrape frequency

global:
  scrape_interval:     5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
  evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

Trying to go down to 5s still gives me no datapoints found, even though my scrape interval is 5 seconds. Let’s change the scrape interval to 1 second.

# my global config
global:
  scrape_interval:     1s # Set the scrape interval to every 1 second. Default is every 1 minute.
  evaluation_interval: 1s # Evaluate rules every 1 second. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

Even scraping at 1 second, I cannot use “1s” in the query. It looks like the range in the query has to be at least 2x the scrape interval.
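
That makes sense for irate, which needs at least two samples inside the range window to compute an instantaneous rate; so with scrape_interval: 1s, the smallest window that reliably returns data is something like:

irate(node_cpu_seconds_total{cpu="0",mode="user"}[2s])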

Where’s the data?

One of the great things about prometheus is that it’s more than just real-time monitoring. The scraped data is being stored in a database.

By default the database is stored in the “data” subdirectory. You can use the “tsdb” binary (in the same directory as the “prometheus” binary itself) to get information about it.

$ ./tsdb ls data
BLOCK ULID                  MIN TIME       MAX TIME       NUM SAMPLES  NUM CHUNKS  NUM SERIES
01E0GB1T9R7FPTD894DJPHKKM8  1580947174039  1580961600000  39281        1838        776
01E0GGBWD1E1GTN796EXRBRKKX  1580990400000  1581012000000  190991       2912        728
01E0GGBSXTYYQ2RR68J8VHS60M  1581091200000  1581098400000  100358       784         784



Paper: A Nine year study of filesystem and storage benchmarking

A 2007 paper that still has lots to say on the subject of benchmarking storage and filesystems. It is primarily aimed at researchers and developers, but it is relevant to anyone about to embark on a benchmarking effort.

  • Use a mix of macro and micro benchmarks
  • Understand what you are testing; cached results are fine, as long as that is what you intended.

The authors are clear on why benchmarks remain important:

“Ideally, users could test performance in their own settings using real workloads. This transfers the responsibility of benchmarking from author to user. However, this is usually impractical because testing multiple systems is time consuming, especially in that exposing the system to real workloads implies learning how to configure the system properly, possibly migrating data and other settings to the new systems, as well as dealing with their respective bugs.”

We cannot expect end-users to be experts in benchmarking. It is our duty as experts to provide the tools (benchmarks) that enable users to make purchasing decisions without requiring years of benchmarking expertise.

How scalable is my Nutanix cluster really?

In a previous post I showed a chart which plots concurrency (X-axis) against throughput in IOPS (Y-axis). Here is that plot again:

Experienced performance-chart oglers will notice the familiar pattern of Little’s Law, whereby throughput (X) rises quickly as concurrency (N) is increased. As we follow the chart to the right, the slope flattens out and we achieve a smaller increase in throughput, even as we increase concurrency by the same amount at each stage. The flattening of the curve is best understood via Amdahl’s Law.

Anyone who follows Dr. Neil Gunther and his Universal Scalability Law (USL) will also recognize this curve.

The USL states that, taking the values of concurrency and throughput as inputs, we can in fact calculate the scalability of the system. Specifically, we are able to calculate the key factors of contention and crosstalk, which limit absolute linear scalability and eventually result in less throughput as additional load is submitted once the capacity of the system is saturated.
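
For reference, Gunther’s model expresses throughput X at a concurrency of N roughly as

X(N) = λN / (1 + σ(N − 1) + κN(N − 1))

where λ is the throughput of a single stream, σ is the contention (Amdahl) term and κ is the crosstalk (coherency) term.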

I was fortunate to find both a very useful tool and an easy-to-read summary of the USL on the VividCortex site. Both were written by Baron Schwartz. I encourage anyone interested in scalability to check out his paper.

Using his Excel spreadsheet, I was able to input the numbers from my test and derive values that determine scalability.

Taking the largest number (0.074%), the “contention value” (i.e. the impact we expect due to Amdahl’s Law), as the limit to linear scaling, we can say that for this particular cluster, running this particular (simplistic/synthetic) workload, the Nutanix cluster scales 99.926% linearly. Although I did not crank up the concurrency beyond 576, the model shows us that this cluster will start to degrade in performance if we try to push concurrency beyond 600 or so. Again, the USL model is for this particular workload on this particular cluster. Doubling the concurrency of the offered load to 1200 will only net us 500,000 IOPS, according to the model.
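
(For the curious: the USL curve peaks at roughly N* = sqrt((1 − σ)/κ), which is presumably how the model arrives at a figure in the region of 600 for this data.)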

The high linearity (99.926%) is expected: the workload is 100% read, and with the data-locality feature of the Nutanix filesystem we expect close to 100% scalability.

We will return to these measures of scalability in the future to look at more realistic workloads.

Here is the Excel sheet with my data: VividCortex_USL_Worksheet_v1