Lamentations of the OSS consumer: I'd like to read the $@#%ing manual but no one has written it by Matt Wrock

I've been an active open source consumer and contributor for the past few years, and being involved in both roles has been one of the peak experiences of my career; I only wish I had discovered open source much, much sooner. However, it's not all roses, and things can be rough on both sides of the pull request, especially for those new to these ecosystems, and even more so if you come from a heritage of tools and culture not originally friendly to open source.

Yesterday I received a great email from someone asking how to navigate a very popular infrastructure testing project where the documentation can be sparse in some respects. The project is ServerSpec, a really fantastic tool that many gain value from every day. The question came from someone new to the Chef community, and ServerSpec is a key tool used in the Chef ecosystem. The questioner immediately won my respect. They were curious but not bitter (or at least did not admit to being so) and wanted to know how to learn more, start contributing, and get to the point of writing more creative tests.

This inspired me because I love interacting with people who are passionate about this craft and who, like myself, want to learn and improve themselves. It also struck a nerve, since I have a lot of opinions about approaching OSS projects and empathy for those new to the playing field and perhaps feeling a bit awkward. This individual, like myself, comes from a Windows background, so I think I have some insight into where he is coming from.

I thought it might be interesting to transform my responses into a blog post. Here are some modified excerpts from my replies.

When Windows is the edge case

I think one issue that Windows suffers from in this ecosystem is that it is the "edge case". The vast majority of those using, testing, and contributing to this project are Linux users. So when a minor version bump occurs, the PR notes explicitly call out that it won't break current tests, and yet things break on Windows, that's pretty clear evidence that Windows was not tested. Although one can argue that Windows is just not much of a player (that is, of course, changing).

I'd look at this differently if this were code in a Chef-owned project, where I would expect more attention to be paid to Windows. Regarding ServerSpec, a wholly open source project with no funding, shepherded by a community member whose full-time job is not ServerSpec, I tend to be more forgiving, but it can definitely make for a frustrating development experience at times.

I'm really hoping that more Windows folks get involved and contribute to this ecosystem, with code and documentation as well as simply filing issues, and I hope their employers support them in these efforts. They stand to gain a lot by doing so.

There may be no manual but there is always the code

One thing I have found in the Ruby world, and much of the OSS world outside of Ruby, is that sometimes the best way to figure something out is to read the source code. The obvious downside is that it's hard to read a language we may not be familiar with when we just want to write our test and move on with our lives.

So I find myself going back and forth. Some days I may just do some quick "code spelunking," not find anything that clearly points out how to do what I want, and take an uglier "brute force" approach. On other days, depending on mood and the barometric pressure in the office, I might be inclined to spend the time and dig deeper. It would be awesome if the authors took the time to spell out how to write custom matchers and resource types, but many OSS projects lack this level of detailed documentation, I'm guessing because no one is paying for it and, in the end, the authors, like us, have a problem that needs solving and lack the time to document.
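For what it's worth, once you do dig into the source, the pattern behind something like a custom RSpec matcher turns out to be pretty small. Here is a purely illustrative sketch; the names are invented and nothing here is ServerSpec's own code, but the shape is what you eventually piece together from reading it:

require 'socket'
require 'rspec/expectations'

# Illustrative only: a custom matcher that checks whether a TCP port accepts
# connections. The matcher name and logic are made up for this example.
RSpec::Matchers.define :be_reachable_on do |port|
  match do |host|
    begin
      Socket.tcp(host, port, connect_timeout: 2) { true }
    rescue StandardError
      false
    end
  end
end

# usage: expect('localhost').to be_reachable_on(8091)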

One consolation is that these Ruby libraries tend to be relatively small. Compare the ServerSpec code base, including its sister project Specinfra, to something like xUnit in C#. It's a lot less code. Of course, it may take 3x longer to grok if you are a Ruby beginner. What I often find is that, given the motivation to learn and become more proficient, you eventually reach a point of minimal comfort with the codebase, where you understand it just enough to see what needs to be added to make it do what you want, and that's when you start making contributions.

Heh. I totally have a weird love-hate relationship with this stuff. There are days when I curse these libraries because I just want to do something that seems so simple and I have no desire or time to make an investment. Then there are other times when I am totally into the code, loving the sense that I am gaining an understanding of new patterns and coding constructs, and realizing I'm gaining knowledge that lets me make the code better not only for myself but for others as well.

In the end it's all just a constant slog through the marshes of learning, and as software engineers, that's our sweet spot: the ability to live in a state of learning rather than bask in what we have learned.

Multi node testing with Test-Kitchen and Docker containers by Matt Wrock

Two docker containers created and tested with Kitchen-Docker

My last post provided a walkthrough of some of the new Windows functionality available in the latest Test-Kitchen RC and demonstrated those features by creating and testing a Windows Active Directory domain controller pair. This post also looks at testing multiple nodes, but instead of Windows, I'll be spinning up multiple docker containers, using a Couchbase cluster as my example. Note that while I am using docker containers, there is nothing special happening here that prevents one from running the same tests on multiple Linux or Windows VMs using the Kitchen-Vagrant driver. Couchbase runs on Windows too.

Why run tests with containers when my production nodes are VMs?

There are some really interesting things being done with containers in production environments, but even if you are not using containers in production, there are clear benefits to using them for testing during infrastructure development. The biggest value is faster provisioning. Using the kitchen-docker driver instead of vagrant or a cloud-based driver can save several minutes per test. You might wonder, "What's a couple of minutes?" But when you are iterating on a problem and need to reprovision several times, a couple of minutes or more adds up quickly.

You still want to test provisioning to VMs if that is what your production infrastructure runs, but that can sit later in your testing pipeline. You will save a lot of time, money, and tears (you'll need those later) by keeping your feedback cycles short early in your development process.

Setting things up

To get started you will need to have the docker engine installed and the latest RC of test-kitchen.

Docker Install

There are a few approaches one can take to installing docker. Some are more complicated than others and really depend on your host operating system. I'm using an Ubuntu 14.04 desktop OS on my laptop. Ubuntu 14.04 has no prerequisites; you simply run:

wget -qO- https://get.docker.com/ | sh

Ubuntu 12.04 requires a kernel upgrade and several packages before the above install will work. The docker installation documentation provides instructions for most operating systems. If you are running Windows or a Mac, you will want to run the docker engine inside a Linux VM. You can either set up a VM of your favorite Linux distro and install docker following the instructions on the docker site, or you can install Boot2Docker, which installs a local docker CLI, VirtualBox, and a stripped-down Tiny Core Linux image.

This post is not aimed at exploring the different ways of installing docker. If you do not already have docker, or a VM setup from which you can install it friction-free, take a look at my chef_workstation repo, which includes a Vagrantfile that provisions a workable Chef-enabled workstation environment with docker installed. It should work with VirtualBox, Hyper-V, or Parallels on a Mac. I believe it also works for VMware Fusion users, but I have not validated that in a while.
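If you would rather roll your own, the docker-ready part of such a Vagrantfile can be quite small. The sketch below is only an approximation that leans on Vagrant's built-in docker provisioner; the actual chef_workstation Vagrantfile does considerably more than this:

# Minimal sketch of a docker-capable Linux VM; not the chef_workstation
# Vagrantfile, which provisions a full workstation environment.
Vagrant.configure('2') do |config|
  config.vm.box = 'ubuntu/trusty64'
  # Vagrant's built-in docker provisioner installs the docker engine on the guest
  config.vm.provision 'docker'
end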

A multi-node enabled cookbook to test

To demonstrate multi-node testing with test-kitchen, I have forked the community couchbase cookbook. I'll be sending a PR with these changes:

  • Compatibility with docker (the current version uses netstat to validate a listening port, and that's not installed on the default ubuntu container)
  • Extends the couchbase-cluster resource to allow other nodes to be joined to a cluster
  • Fixes the cookbook on Windows, which is unrelated to this post but aligns well with one of my personal missions in life

Clone my fork and checkout the multi-node branch:

git clone -b multi-node https://github.com/mwrock/couchbase

If you are using the vagrant box in my chef_workstation repo, cd to the cookbooks directory just below the directory you land in from vagrant ssh and clone from there.

Using the right gems

To facilitate testing multiple nodes, this cookbook uses a custom test-kitchen provisioner plugin that relies on functionality exposed in the latest test-kitchen RC. The cookbook therefore includes a Gemfile that references both of these gems and other important dependencies. To ensure that you are testing with all of the correct gems, cd into the root of the couchbase cookbook and run:

bundle install
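For a rough sense of what that Gemfile pulls in, it amounts to something like the sketch below. Treat it as illustrative only; the fork's actual Gemfile is authoritative and pins the versions that matter:

source 'https://rubygems.org'

# The test-kitchen RC plus the two plugins used in this walkthrough.
# Version constraints are intentionally omitted here; use the ones in the fork.
gem 'test-kitchen'
gem 'kitchen-docker'
gem 'kitchen-nodes'

# Cookbook dependency resolution
gem 'berkshelf'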

Converge and test the first node

We are now ready to create, converge and test the first node of our couchbase cluster. Make sure to run with bundle exec so that we use all of the correct gem versions:

bundle exec kitchen verify server-community-ubuntu

This will start a new container running ubuntu 12.04, install Couchbase and initialize a new cluster. Then a serverspec test will ensure that the service is running and configured the way we want it.
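The assertions in that test look roughly like the sketch below (see the fork's integration tests for the real spec). The service name and port are what a stock Couchbase community install uses:

require 'serverspec'
set :backend, :exec

# A sketch of the kind of checks the suite's serverspec tests make;
# the real spec in the cookbook covers more than this.
describe service('couchbase-server') do
  it { should be_running }
end

describe port(8091) do
  it { should be_listening }
end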

Joining an additional node to the cluster

To get the full multi-node effect, let's now ask test-kitchen to run our second-node suite:

bundle exec kitchen converge second-node-ubuntu

This brings up a new container that will post to the couchbase REST endpoint of our first node, asking to join the cluster. Then its serverspec test will pull the list of cluster nodes exposed by the original node and check that our second node is included in the list.
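Under the hood, that join is just an authenticated POST to the first node's REST API. Here is a rough sketch of one way to make such a request with Couchbase's documented /controller/addNode endpoint; the IPs are hypothetical, and the cookbook's couchbase-cluster resource is what actually issues the call, so check its implementation for the exact requests it makes:

require 'net/http'
require 'uri'

# Illustrative only: ask an existing cluster node to add this node.
cluster_ip = '172.28.128.3'   # the first node (hypothetical value)
my_ip      = '172.17.128.4'   # this node (hypothetical value)

uri = URI("http://#{cluster_ip}:8091/controller/addNode")
req = Net::HTTP::Post.new(uri)
req.basic_auth 'Administrator', 'whatever'
req.set_form_data('hostname' => my_ip,
                  'user'     => 'Administrator',
                  'password' => 'whatever')

res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
raise "join failed: #{res.body}" unless res.is_a?(Net::HTTPSuccess)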

Discovering the original node

One possible strategy could be to set an attribute specifying the IP or host name of the initiating couchbase node. However, this assumes it is a known and constant value. You may prefer your infrastructure to query dynamically for an existing couchbase node. In our test scenario, we really can't predict the IP or host name, since we are getting IPs from DHCP and docker hands out a unique hash for a host name.

Note that we could tweak the driver configuration in our .kitchen.yml to expose predictable hostnames that can link to other containers. Here is an example of a possible config for our node suites:

suites:
- name: server-community
  driver:
    publish_all: true
    instance_name: first_cluster
  run_list:
  - recipe[couchbase::server]
  attributes:
    couchbase:
      server:
        password: "whatever"

- name: second-node
  driver:
    links: "first_cluster:first_cluster"
  run_list:
  - recipe[couchbase-tests::default]
  attributes:
    couchbase:
      server:
        password: "whatever"
        cluster_to_join: first_cluster

Here the first node uses the kitchen-docker configuration to ask the docker engine to expose its container with a specific name, "first_cluster." The second node is asked to link the name "first_cluster" with the "first_cluster" instance. This way any requests from the second container to the DNS name first_cluster will resolve to our first container. Finally, we would create a node attribute named cluster_to_join holding the cluster our second node would ask to join.

This may work for your scenario, and that's great. However, it may break down for others. First, it's not very portable. This cookbook supports Windows, and locking in docker-specific options will run into problems for Windows tests that leverage vagrant here:

- name: windows-2012R2
  driver:
    name: vagrant
    network:
      - ["private_network", { type: "dhcp" }]
  transport:
    name: winrm
  driver_config:
    gui: true
    box: mwrock/Windows2012R2Full
    customize:
      memory: 1024

Furthermore, our test logic needs to match production logic. If production nodes will be querying the chef server for a node to send cluster join requests to, our tests must validate that this strategy works.

The kitchen-nodes provisioner plugin

In my last post I demonstrated a strategy that uses chef search to find a chef node based on a run list recipe. It used my kitchen-nodes provisioner plugin to create mock chef nodes for each kitchen suite so that a chef search can find the other suites' test instances during convergence. Since that example created a Windows Active Directory controller pair, the plugin included some Windows-specific functionality. I have since extended the plugin to support most *nix scenarios, including docker.

First we tell test-kitchen to use the kitchen-nodes plugin as a provisioner for the suites that test our couchbase servers:

suites:
- name: server-community
  provisioner:
    name: nodes
  run_list:
  - recipe[couchbase-tests::ipaddress]
  - recipe[couchbase::server]
  - recipe[export-node]
  attributes:
    couchbase:
      server:
        password: "whatever"

- name: second-node
  provisioner:
    name: nodes
  run_list:
  - recipe[couchbase-tests::ipaddress]
  - recipe[couchbase-tests::default]
  - recipe[export-node]
  attributes:
    couchbase:
      server:
        password: "whatever"

The default recipe of the couchbase-tests cookbook used by our second node can now find the first node using chef search:

primary = search_for_nodes("run_list:*couchbase??server* AND platform:#{node['platform']}")
node.normal["couchbase-tests"]["primary_ip"] = primary[0]['ipaddress']

The search_for_nodes method is defined in our couchbase-tests library:

require 'timeout'

def search_for_nodes(query, timeout = 120)
  nodes = []
  Timeout::timeout(timeout) do
    nodes = search(:node, query)
    until nodes.count > 0 && nodes[0].has_key?('ipaddress')
      sleep 5
      nodes = search(:node, query)
    end
  end

  if nodes.count == 0 || !nodes[0].has_key?('ipaddress')
    raise "Unable to find nodes!"
  end

  nodes
end

Here we are using a chef search to find a node that includes the couchbase server recipe and has the same OS platform as the current node. Matching on platform is important if your .kitchen.yml is designed to test more than one platform, as ours is.

Chef-zero and chef search

The kitchen-nodes plugin derives from the chef-zero test-kitchen provisioner. Using chef-zero, we can issue a chef search for nodes without being hooked up to a real chef server. Chef-zero accomplishes this by storing information about each node in a json file in its nodes folder. The test-kitchen chef-zero provisioner wires all of this up by copying all files under tests/integration/nodes to {test-kitchen temp folder on test instance}/nodes. So you can create a json file for each test suite in your local nodes folder, and chef search calls will effectively treat those node files as the master chef server database.

The kitchen-nodes plugin automatically generates a node file when a test instance is provisioned by test-kitchen. Provisioning occurs at the very beginning of the converge operation. kitchen-nodes populates the node's json file with its IP address, platform, and run list. Here are the two node json files generated in my tests:

{
  "id": "server-community-ubuntu-1204",
  "automatic": {
    "ipaddress": "172.28.128.3",
    "platform": "ubuntu"
  },
  "run_list": [
    "recipe[couchbase-tests::ipaddress]",
    "recipe[couchbase::server]",
    "recipe[export-node]"
  ]
}

{
  "id": "second-node-ubuntu-1204",
  "automatic": {
    "ipaddress": "172.17.128.4",
    "platform": "ubuntu"
  },
  "run_list": [
    "recipe[apt]",
    "recipe[couchbase-tests::ipaddress]",
    "recipe[couchbase-tests::default]",
    "recipe[export-node]"
  ]
}

During provisioning, kitchen-nodes uses either SSH or WinRM, depending on the test instance platform, to interrogate the instance's interfaces for an IP that is accessible to the host. On Windows, this information is retrieved using a few PowerShell cmdlets; on *nix instances, either ifconfig or ip addr show is used, depending on what is available on that distro. There may be several interfaces, but kitchen-nodes will only choose an IPv4 address that can be pinged from the host.
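Conceptually, the *nix side of that interrogation boils down to something like the sketch below. This is not the plugin's actual code, just the idea: parse the interface listing fetched over the transport and keep the first IPv4 address the host can ping:

# Rough sketch of the idea, not the kitchen-nodes implementation.
# `raw_output` would be the text returned by running `ip addr show`
# (or `ifconfig`) on the instance over its SSH transport.
def candidate_ips(raw_output)
  raw_output.scan(/inet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})/)
            .flatten
            .reject { |ip| ip.start_with?('127.') }   # skip loopback
end

def first_reachable_ip(ips)
  # Return the first IPv4 address that answers a ping from the host
  ips.find { |ip| system("ping -c 1 -W 1 #{ip} > /dev/null 2>&1") }
end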

Testing that we joined the correct cluster

So how do we test that we actually found the correct node? We can't write a serverspec test using a hard-coded IP. Instead we use a testing recipe, export-node, that dumps the entire node object to a json file. The test recipe run by the second node stores the primary node's IP in a node attribute, as we saw above.

Here is an instant replay:

node.normal["couchbase-tests"]["primary_ip"] = primary[0]['ipaddress']

So when the export-node cookbook dumps the node data, that IP address will be included. Here is the test that validates the node join:

describe "cluster" do
  let(:node) { JSON.parse(IO.read(File.join(ENV["TEMP"] || "/tmp", "kitchen/chef_node.json"))) }
  let(:response) do
    resp = Net::HTTP.start node["normal"]["couchbase-tests"]["primary_ip"], 8091 do |http|
      request = Net::HTTP::Get.new "/pools/default"
      request.basic_auth "Administrator", "whatever"
      http.request request
    end
    JSON.parse(resp.body)
  end

  it "has found the priary node and it is not itself" do
    expect(node["normal"]["couchbase-tests"]["primary_ip"]).not_to eq(node['automatic']['ipaddress'])
  end

  it "has joined the primary cluster" do
    joined = false
    response['nodes'].each do |cluster_node|
      if cluster_node['hostname'] == "#{node['automatic']['ipaddress']}:8091"
        joined = true
      end
    end

    expect(joined).to be true
  end
end

The export-node cookbook dumps the node json to a file named chef_node.json in the kitchen temp folder. So our test pulls the IP that was returned by the chef search from there, makes sure that it is in fact different from its own IP, and then issues a couchbase API request to that node to return all the nodes in its cluster. The test passes as long as the second node is included in the returned node list.

Testing all the things

I find it helpful and reassuring that I can include node interactions in my tests. Test-Kitchen's coverage can indeed extend well beyond the boundaries of a single node.