Cross rack network latency in AWS

I have VMs running on bare-metal instances. Each bare-metal instance is in a separate rack by design (for fault tolerance). The link speed is 25GbE; however, the latency between the hosts is so high that I need multiple streams to consume that bandwidth.

Compared to my local on-prem lab, I need many more streams to get the observed throughput close to the theoretical bandwidth of 25GbE.

# iperf Streams    AWS Throughput    On-Prem Throughput
1                  4.8 Gbit          21.4 Gbit
2                  9 Gbit            22 Gbit
4                  18 Gbit           22.5 Gbit
8                  23 Gbit           23 Gbit
Difference in throughput for a 25GbE network: on-premises vs. AWS cloud (inter-rack)
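
One way to reason about why latency forces more streams: a single TCP stream can have at most one window of data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below is a back-of-the-envelope model only. The 256 KiB window and the two round-trip times are hypothetical values picked for illustration; they were not measured on the hosts above.

```python
import math


def single_stream_limit_gbit(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP stream: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e9


def streams_to_fill(link_gbit: float, per_stream_gbit: float) -> int:
    """Number of parallel streams needed to reach the link speed."""
    return math.ceil(link_gbit / per_stream_gbit)


# Hypothetical inputs (assumptions, not measurements).
WINDOW_BYTES = 256 * 1024
LINK_GBIT = 25.0

for label, rtt in (("low latency", 100e-6), ("high latency", 500e-6)):
    limit = single_stream_limit_gbit(WINDOW_BYTES, rtt)
    needed = streams_to_fill(LINK_GBIT, limit)
    print(f"{label}: RTT {rtt * 1e6:.0f} us, "
          f"per-stream limit {limit:.1f} Gbit/s, streams needed {needed}")
```

With a fixed window, a five-fold increase in round-trip time cuts the per-stream limit five-fold, which is the same shape as the gap between the two columns in the table.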

Quick & Dirty Prometheus on OS-X

How to install Prometheus on OS-X

Install prometheus

  • Download the compiled prometheus binaries from prometheus.io
  • Unzip the binary and cd into the directory.
  • Run the prometheus binary from the command line; it will listen on port 9090
$ cd /Users/gary.little/Downloads/prometheus-2.16.0-rc.0.darwin-amd64
$ ./prometheus
  • From a local browser, point to localhost:9090
prometheus web-ui

Add a collector/scraper to monitor the OS

Prometheus itself does not do much apart from monitor itself. To do anything useful we have to add a scraper/exporter module. The easiest thing to do is add an exporter to monitor OS-X itself. As on Linux, the OS exporter is simply called “node exporter”.

Start by downloading the pre-compiled darwin node exporter from prometheus.io

  • Unzip the tar.gz
  • cd into the directory
  • run the node exporter
$ cd /Users/gary.little/Downloads/node_exporter-0.18.1.darwin-amd64
$ ./node_exporter
 INFO[0000] Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e)  source="node_exporter.go:156"
 INFO[0000] Build context (go=go1.11.10, user=root@4a30727bb68c, date=20190604-16:47:36)  source="node_exporter.go:157"
 INFO[0000] Enabled collectors:                           source="node_exporter.go:97"
 INFO[0000]  - boottime                                   source="node_exporter.go:104"
 INFO[0000]  - cpu                                        source="node_exporter.go:104"
 INFO[0000]  - diskstats                                  source="node_exporter.go:104"
 INFO[0000]  - filesystem                                 source="node_exporter.go:104"
 INFO[0000]  - loadavg                                    source="node_exporter.go:104"
 INFO[0000]  - meminfo                                    source="node_exporter.go:104"
 INFO[0000]  - netdev                                     source="node_exporter.go:104"
 INFO[0000]  - textfile                                   source="node_exporter.go:104"
 INFO[0000]  - time                                       source="node_exporter.go:104"
 INFO[0000] Listening on :9100                            source="node_exporter.go:170"
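
Once the exporter is listening on port 9100, it serves its metrics as plain text over HTTP at the /metrics path. A minimal sketch for checking that it is returning data, without involving Prometheus yet, is below. The sample text and its values are made up for illustration, and the parser assumes simple "series value" lines with no trailing timestamp.

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:9100/metrics") -> dict:
    """Fetch and parse metrics from a running node exporter."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative sample only; real output is much longer and values differ.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 1.5
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

print(parse_metrics(SAMPLE))
```

With the exporter running, calling fetch_metrics() returns the same kind of dictionary built from the live endpoint.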

Meltdown, speculative execution and side-channels explainer.

There are a lot of explanations for the current Meltdown/Spectre crisis, but many did not do a good job of explaining the core issue of how information is leaked from the secret side to the attacker's side. This is my attempt to explain it (mostly to myself, to make sure I got it right).

What is going on here generally?

  • Generally speaking, an adversary would like to read pieces of memory he is not allowed to. This can be either reading from kernel memory, or reading memory in the same address space that should not be allowed, e.g. JavaScript from a random web page should not be able to read the passwords stored in your browser.
  • Users and the kernel are normally protected from bad actors via privileged modes, page tables and the MMU.
  • It turns out that code executed speculatively can read any mapped memory, even addresses that would not be readable in the normal program flow.
    • Thankfully, values read illegally by speculatively executed code are not directly accessible to the attacker.
    • So, although the speculatively executed code reads an illegal value at the micro-architectural level, it is not visible to user-written code (e.g. the JavaScript in the browser).
  • However, it turns out that we can execute a LOT of code in speculative mode if the pre-conditions are right.
    • In fact modern instruction pipelines (and slow memory) allow >100 instructions to be executed while memory reads are resolved.

How does it work?

  • The attacker reads the illegal memory using speculative execution, then uses the values read to set data in cache lines that ARE LEGITIMATELY VISIBLE to the attacker, thus creating a side channel between the speculatively executed code and the normal user-written code.
  • The values in the cache lines are not readable (by user code), but the fact that the cache lines were loaded (or not) *IS* detectable (via timing), since the L3 cache is shared across address spaces.
    • First I ensure the cache lines I want to use in this process are empty.
    • Then I set up some code that reads an illegal value (using the speculative execution technique), and depending on whether that value is 0 or !=0 I read some other specific address in the attacker's address space that I know will be cached in cache line 1. Pretend I execute the second read only if the illegal value is !=0.
    • Finally, back in normal user code, I attempt to read that same address in my “real” user space. If I get a quick response, I know that the illegal value was !=0, because the only way I get a quick response is if the cache line was loaded during the speculative execution phase.
    • It turns out we can encode an entire byte using this method.  See below.
  • The attacker reads a byte, then by using bit shifting etc. encodes all 8 bits in 8 separate cache lines that can subsequently be read.
  • At this point an attacker has read a memory address he was not allowed to, encoded that value in shared cache lines, and then tested via timing whether or not each cache line was loaded, and thus reconstructs the value encoded in them during the speculative phase.
    • This is known as “leakage”.
  • Broadly there are two phases in this technique
    • Reading the illegal memory in the speculative execution phase, then encoding the byte in shared cache lines.
    • Timing reads to those same cache lines to determine whether each was set (loaded, i.e. “1”) or unset (empty, “0”) by the attacker, and so decoding the byte from the cache lines.
  • Side channels have been a known phenomenon for years (at least since the 1990s). What’s different now is how easily, and with how low an error rate, attackers are able to read arbitrary memory addresses.
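
The two phases can be sketched as a toy simulation. This is not an exploit and performs no speculative execution: the cache is modelled as a Python set of loaded line numbers, and the hit/miss latencies and threshold are hypothetical constants chosen only to show the encode-then-time-the-reads logic.

```python
# Hypothetical latencies in nanoseconds (assumptions for illustration).
FAST_NS = 40        # read of a line that is already cached
SLOW_NS = 200       # read of a line that must come from memory
THRESHOLD_NS = 100  # anything faster than this counts as a cache hit


def flush(cache: set) -> None:
    """Step 1: make sure the cache lines we will use are empty."""
    cache.clear()


def speculative_encode(secret: int, cache: set) -> None:
    """Phase 1 (stands in for the speculative code): load cache line i
    only if bit i of the secret byte is 1."""
    for bit in range(8):
        if (secret >> bit) & 1:
            cache.add(bit)


def timed_read(line: int, cache: set) -> int:
    """Simulated latency of reading one cache line from normal user code."""
    return FAST_NS if line in cache else SLOW_NS


def decode(cache: set) -> int:
    """Phase 2: rebuild the byte from which lines respond quickly."""
    value = 0
    for bit in range(8):
        if timed_read(bit, cache) < THRESHOLD_NS:
            value |= 1 << bit
    return value


cache = set()
flush(cache)
speculative_encode(0xA5, cache)  # 0xA5 stands in for the illegal byte
print(hex(decode(cache)))
```

The decoder never sees the secret value itself, only which of the eight lines were fast to read, which is the point of the side channel.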

I found these papers to be informative and readable.

Simple statistics for performance analysts.

As performance analysts we often have to summarize large amounts of data in order to make engineering decisions or understand existing behavior.  This paper will help you do exactly that!  Many analysts know that using statistics can help, but statistical analysis is a huge field in itself and has its own complexity.  The article below distills the essential techniques that can help you with typical performance analysis tasks.

PDF Download.

Statistics for the performance analyst: https://www.n0derunner.com/wp-content/uploads/2018/01/Statistics-for-the-performance-analyst.pdf