modern-ops: Python on OS-X

modern-ops requires a lot of automation. There are too many moving parts to operate at cloud scale without tools. In the old days of a few persistent Unix servers it was possible to get by with shell scripts, but those days are gone. Python is a great language to move to; I never really got on with Perl.

The only downside to Python is that it’s easy to get into a mess of conflicting packages and versions.

Of course there’s a solution, and the one I have chosen is pyenv + pyenv-virtualenv.  This solution was suggested by George Dowding – who is my personal Python guru.

Basically, pyenv allows multiple versions of Python to co-exist on the same computer (e.g. 2.7, 3.4, 3.6), and virtualenv allows each Python application to have its own dependencies – e.g. specific versions of libraries – even within the same Python version family.
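In practice pyenv exposes all of this through a handful of subcommands. A minimal sketch of what "coexisting interpreters" looks like (the version numbers are just examples):

```shell
# List every Python interpreter pyenv manages on this machine.
pyenv versions

# List every virtualenv; each one is tied to a specific interpreter.
# (requires the pyenv-virtualenv plugin)
pyenv virtualenvs

# Pin the current directory to a specific interpreter,
# recorded in a .python-version file.
pyenv local 3.4.9
```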

I am using Mac OS-X to run Python, so I need to manage the OS side of things too. For that I use the Homebrew package manager.

Generally: anything installed on OS-X from a disk image or via pip install will use the standard OS-X Python; anything installed from git or via brew will use pyenv.
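A quick way to check which interpreter a given shell will actually pick up (the paths shown in the comments are typical pyenv defaults, not guaranteed):

```shell
# With pyenv active, `which python` resolves to a pyenv shim
# (e.g. ~/.pyenv/shims/python) rather than the OS-X /usr/bin/python.
which python
python --version

# Show what pyenv has selected for this directory, and why
# (shell, local .python-version file, or global default).
pyenv version
```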

To get all this in place we do the following. In this example, I am installing the AWS CLI package, which requires Python 3.4.

  1.  Install Homebrew via the usual OS-X install process.
    • $ brew update
    • $ brew install pyenv
  2. Install virtualenv as part of pyenv
    • $ brew install pyenv-virtualenv
  3. Install some Python versions (currently these are modern versions of Python 2.x and 3.x).
    • $ pyenv install 3.4.9
    • $ pyenv install 2.7.15
  4. Create a specific virtualenv for, e.g., the AWS CLI package with version 3.4.9
    • In this case we call our virtualenv aws-cli-pyenv-ve (-ve for virtualenv)
      • $ pyenv virtualenv 3.4.9 aws-cli-pyenv-ve
  5. Activate the virtualenv and “jump into it”.  After activation the prompt will change.
    • $ pyenv activate 3.4.9/envs/aws-cli-pyenv-ve

Note the name of the virtualenv prefixed to the shell prompt after activation.

Running the pyenv activate command above is all it takes to get into the virtualenv from a normal shell.

Now we can run commands installed into the virtualenv – in this case, aws.
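The post doesn't show installing the AWS CLI itself; the usual route is pip from inside the activated virtualenv. A sketch (the awscli package name is the standard PyPI one):

```shell
# Activate the virtualenv created earlier (short name also works
# with pyenv-virtualenv).
pyenv activate aws-cli-pyenv-ve

# Install the AWS CLI into this virtualenv only - it will not
# touch the system Python or any other virtualenv.
pip install awscli

# Confirm the command runs against the virtualenv's Python 3.4.
aws --version
```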

Next we'll set up the AWS CLI so we can start manipulating the AWS environment from the command line.
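That setup is a one-time interactive step – a sketch of what it looks like (the s3 listing is just a convenient smoke test, assuming the credentials can see at least one bucket):

```shell
# Prompts interactively for access key ID, secret access key,
# default region, and default output format, and stores them
# under ~/.aws/.
aws configure

# Smoke test: list the S3 buckets visible to the configured
# credentials.
aws s3 ls
```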

Nutanix AES: Performance By Example PT2

How to improve large DB read performance by 2X

Nutanix AOS 5.10 ships with a feature called Autonomous Extent Store (AES). AES provides metadata locality to complement the data locality that has always existed. For large datasets (e.g. a 10TB database with 20% hot data) we observe a 2X improvement in throughput for random access across the 2TB hot dataset.

In our experiment we deliberately size the active working set to NOT fit into the metadata cache. We uniformly access 2TB with a 100% random access pattern and record the time to access all 2TB. On the same hardware with AES enabled, the time is cut in half. As can be seen in the chart, the throughput is doubled, as expected.

It is the localization of metadata from AES that contributes to the 2X improvement. AES keeps most of the metadata local to the node, so there is no need to fetch metadata across the wire. Additionally, AES reduces the need to cache metadata in DRAM, since local access is so fast. For very large datasets, retrieving metadata can contribute a large proportion of the access time. This is true for all storage, so speeding up metadata resolution can make a dramatic improvement to overall throughput, as we demonstrate.

Nutanix AES: Performance By Example

How to reduce database restore time by 50%

During .Next 2018 in London, Nutanix announced improvements in the core datapath said to give up to 2X performance gains. Here's a real-world example of that improvement in practice.

I am using X-Ray to simulate a 1TB data restore into an existing database. Specifically, the IO sizes are large – an even split of 64K, 128K, 256K, and 1MB – and the pattern is 100% random across the entire 1TB dataset.
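X-Ray drives its workloads with fio under the covers; a roughly equivalent standalone fio invocation for this pattern might look like the following. The job name, file path, and queue-depth settings are illustrative – they are not taken from the actual X-Ray scenario:

```shell
# Approximation of the restore workload: 100% random writes
# across a 1TB file, with IO sizes evenly split across
# 64K / 128K / 256K / 1M via bssplit.
fio --name=db-restore-sim \
    --filename=/mnt/testvol/restore.dat \
    --size=1T \
    --rw=randwrite \
    --bssplit=64k/25:128k/25:256k/25:1m/25 \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=32 \
    --numjobs=4
```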

Normally storage benchmarks using large IO sizes are performed sequentially, because it's easier on the storage back-end. That may be realistic for an initial load, but in this case we want to simulate a restore, where the pattern is 100% random.

In this case the time to ingest 1TB drops by half when using Nutanix AOS 5.10 with the Autonomous Extent Store (AES) enabled vs. the previous traditional extent store.

This improvement is possible because with AES, inserting directly into the extent store is much faster.

For throughput-sensitive, random workloads, AES can detect that it will be faster to skip the oplog. Skipping the oplog allows AES to eliminate a network round trip to a remote oplog and instead only make an RF2 copy for the extent store. By contrast, when sustained, large random IO is funneled into the oplog, the 10Gbit network can become the bottleneck. Even with faster networks, AES will still be a benefit because CPU and SSD resource usage is also lower. Unfortunately I only have 10Gbit networking in my lab!

The X-Ray files needed to run this test are on GitHub.