What space savings should you expect when running databases with default compression in a Nutanix cluster? When we ran the TPCx-HCI benchmark on our cluster we realized about 2:1 savings from compression alone. The TPCx-HCI benchmark mimics a database consolidation setup, meaning that there are many databases per host. The uncompressed data size was about 45TB.
Additionally, we configured data at rest encryption (DARE). Using the cluster features allows us to both compress and encrypt (compression first, then encrypt). If the database engine itself handled encryption, it would reduce the ability to compress.
Like ZFS, the Nutanix filesystem uses LZ4 compressiom and 2:1 is about in-line with expectations for a realistic dataset. The TPCx-HCI benchmark uses the E-Gen data generation tool to populate the databases. E-Gen was developed for the TPC-E benchmark and uses sources such as census data and NYSE stock listings to generate real data rather than machine generated strings.
Storage bus speeds with example storage endpoints.
Theoretical Bandwidth (MB/s)
HBA <-> Single SATA Drive
HBA <-> Single SAS Drive
HBA <-> SAS/SATA Fanout
4 Lane HBA to Breakout (6 SSD)
HBA <-> SAS/SATA Fanout
8 Lane HBA to Breakout (12 SSD)
Single Lane PCIe3
PCIe <-> SAS HBA or NVMe
Enough for Single NVMe
PICe <-> SAS HBA or NVMe
Enough for SAS-3 4 Lanes
PCIe Bus <-> Processor Socket
Xeon Direct conect to PCIe Bus
All figures here are the theoretical maximums for the busses using rough/easy calculations for bits/s<->bytes/s. Enough to figure out where the throughput bottlenecks are likely to be in a storage system.
SATA devices contain a single SAS/SATA port (connection), and even when they are connected to a SAS3 HBA, the SATA protocol limits each SSD device to ~600MB/s (single port, 6Gbit)
SAS devices may be dual ported (two connections to the device from the HBA(s)) – each with a 12Gbit connection giving a potential bandwidth of 2x12Gbit == 2.4Gbyte/s (roughly) per SSD device.
An NVMe device directly attached to the PCIe bus has access to a bandwidth of 4GB/s by using 4 PCIe lanes – or 8GB/s using 8 PCIe lanes. On current Xeon processors, a single socket attaches to 40 PCIe lanes directly (see diagram below) for a total bandwidth of 40GB/s per socket.
I first started down the road of finally coming to grips with all the different busses and lane types after reading this excellent LSI paper. I omitted the SAS-2 figures from this article since modern systems use SAS-3 exclusively.
There are a lot of explanations for the current Meltdown/Spectre crisis but many did not do a good job of explaining the core issue if how information is leaked from the secret side, to the attackers side. This is my attempt to explain it (mostly to myself to make sure I got it right).
What is going on here generally?
Users and the kernel are normally protected from bad-actors via privileged modes, address page tables and the MMU.
It turns out that code executed speculatively can read any mapped memory. Even addresses/address that would not be readable in the normal program flow.
Thankfully illegal reads from speculatively executed code are not accessible to the attacker.
However, it turns out that we can execute a LOT of code in speculative mode if the pre-conditions are right.
In fact modern instruction pipelines (and slow memory) allow >100 instructions to be executed while memory reads are resolved.
How does it work?
The attacker reads the illegal memory using speculative execution, then uses the values read – to set data in cache lines that ARE LEGITIMATELY VISIBLE to the attacker. Thus creating a side channel between the speculatively executed code and the normal user written code.
The values in the cache lines are not readable (by user code) – but the fact that the cache lines were loaded (or not) *IS* detectable (via timing) since the L3 cache is shared across address-space.
First I ensure the cache lines I want to use in this process are empty.
Then I setup some code that reads an illegal value (using speculative execution technique), and depending on whether that value is 0 or !=0 I would read some other (specific address in the attackers address space) that I know will be cached in cache-line 1. Pretend I execute the second read only if the illegal value is !=0
Finally back in normal user code I attempt to read that same address in my “real” user space. And if I get a quick response – I know that the illegal value was !=0, because the only way I get a quick response is if the cache line was loaded during the speculative execution phase.
It turns out we can encode an entire byte using this method. See below.
The attacker reads a byte – then by using bit shifting etc. – the attacker encodes all 8 bits in 8 separate cache lines that can then be subsequently read.
At this point an attacker has read a memory address he was not allowed to, encoded that value in shared cache-lines and then tested the existence or not of values in the cache lines via timing, and thus re-constructs the value encoded in them during the speculative phase.
This is known as “leakage“.
Broadly there are two phases in this technique
The reading of illegal memory in speculative execution phase then encoding the byte in shared cache lines.
Using timing of reads to those same cache lines to determine if they were “set” (loaded e.g.”1″) or unset (empty “0”) by the attacker to decode the byte from the (set/unset 1/0) cache lines.
Side channels have been a known phenomena for years (at least since the 1990s) what’s different now if how easy, and with such little error rate – attackers are able to read arbitrary memory addresses.
I found these papers to be informative and readable.