Using git to version stamp chef artifacts / by Matt Wrock

This post is not about using git for source control. It assumes that you are already doing that. What I am going to discuss is a version numbering strategy that leverages the git log. The benefit here is the guarantee that any change in the artifact (cookbook, environment, data bag) will result in a unique version number that will not conflict with versions produced by your teammates. It ensures that deciding what version to stamp on your change is one thing you don’t need to think about. I'll close the post by demonstrating how this can be automated as part of your build process.

The strategy explained

You can use the git log command to list all commits applied to a directory or individual file in your repository:

git log --pretty=oneline some_directory/

This lists every commit that touched some_directory, one line per commit, showing the sha1 and the commit message. To turn this into a version, count the lines:

powershell:
(git log --pretty=oneline some_directory/).count

bash:
git log --pretty=oneline some_directory/ | wc -l

Semantic versioning

If you are using semantic versioning to express version numbers, the commit count can be used to produce the build number – the third element of a version. So what about the major and minor numbers? One argument you can pass to git log is a starting ref from which to list commits. When you decide to increment the major or minor build number, you want to tag your repository with those numbers:

git tag 2.3

So now you want your build numbers to reset to 0 starting from the commit being tagged. You can do this by telling git’s log command to list all commits from that tag forward like so:

git log 2.3.. --pretty=oneline some_directory/ 

If you were to run this just after tagging your repo with the major and minor versions, you would get 0 commits and thus the semantic version would be 2.3.0. So you still need to decide when to increment the major and minor numbers, but the final element takes care of itself on every commit.
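The tag-plus-count idea can be sketched in Ruby as follows. This is only an illustration: version_from_log and git_version are hypothetical helper names that do not come from any gem, and the wrapper assumes the git executable is on your PATH and that the tag name is itself the "major.minor" prefix.

```ruby
require 'open3'

# Hypothetical helper: build "<tag>.<commit count>" from the text that
# `git log --pretty=oneline` prints (one line per commit).
def version_from_log(tag, log_output)
  commit_count = log_output.lines.map(&:strip).reject(&:empty?).size
  "#{tag}.#{commit_count}"
end

# Hypothetical wrapper: ask git for the commits made to `path` since `tag`.
# Assumes it is run from inside the repository.
def git_version(tag, path)
  out, status = Open3.capture2('git', 'log', "#{tag}..", '--pretty=oneline', '--', path)
  raise "git log failed for #{path}" unless status.success?
  version_from_log(tag, out)
end

# No commits since the tag gives build number 0.
puts version_from_log('2.3', '')                             # prints 2.3.0
puts version_from_log('2.3', "abc123 first\ndef456 second\n") # prints 2.3.2
```

Keeping the counting separate from the shell call makes the version logic easy to exercise without a repository at hand.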

Benefits and downsides to this strategy

Before getting into the details of applying this to chef artifacts, let's briefly examine some pros and cons of this technique.

Upsides

Any change to a versionable artifact will result in a unique build number and if two builds have the same contents, their build numbers will be the same

This is crucial, especially if you need to communicate with customers or fellow team members regarding features or bugs. It helps remove confusion and ensures you are discussing the same build. If you are using a bug tracking system, you will want to include this version in the bug report so other team members reviewing the bug can check out that version from source control or review all changes made since that version was committed.

Builds can be produced independently of a separate build server

Especially for solo/side projects where you may not even have a build server, this can help you create deterministic build numbers. However even if your project’s authoritative builds are produced by a system like Jenkins or TeamCity, individual team members can produce their own builds and produce the same build numbers generated by your build server (assuming the build server is using this strategy). Of course the number may vary slightly if other team members have produced commits and have not yet pushed to your shared remote or if the build is performed without pulling the latest changes. That’s why you also want to include the current sha1 somewhere in your artifact. More on that later.

Allows you to separately version different artifacts in your repository

Especially if your chef repository houses multiple cookbooks and you freeze your cookbook versions or use version constraints in your environments, this can be very important. I want to know that any change to a cookbook will increment the version and if the cookbook has remained unchanged, its version should be the same.

Downsides

There will be gaps in your build numbers

You will likely commit several times between builds. So two subsequent builds with, say, 5 commits in between will increment the build number by 5. This should not be an issue as long as your team is aware of this. However, if you consider sequential build numbers important as a customer-facing means to communicate change, this could be an issue. I have used this technique on a couple of fairly popular OSS projects and I never had an issue with users or contributors stumbling on this.

Build numbers can get big

If you rarely increment the major or minor build numbers, this will surely happen over time. I try to increment the minor number on any feature enhancing release in which case this is not usually an issue.

If build agents cannot talk to git

If you are using a centralized build server (and if this is a collaborative project, you certainly should be), you definitely want the builds produced by your build server to follow this same strategy. To do that, configure your build server to delegate the git pull to the build agents. Otherwise, the git log commands will not work. The build agent must have an actual git repo, with the .git folder available, to see the commit counts.

Applying this to chef artifacts

First, what do I mean by “chef artifacts?” Don’t I really mean cookbooks? No. While cookbooks are certainly included and are the most important artifact to version, I also want to version environment and data_bag files. If I used roles, I would version those too. Regardless of the fact that cookbooks are the only entity that has first class versioning support on a chef server, I should be able to pin these artifacts to their specific git commit. Also, I may change environment or data_bag files several times before uploading to the server and I may want to choose a specific version to upload. If you add cookbook version constraints to your environments, any dependency change will result in a version bump to your environment and your environment version may serve as a top level repository version.

Stamping the artifact

So what gets stamped where? For cookbooks this is obvious. The version string in metadata.rb will have the generated version applied. For environment and data_bag files, we create a new json element in the document:

{
  "name": "test",
  "chef_type": "environment",
  "json_class": "Chef::Environment",
  "override_attributes": {
    "environment_parent": "QA",
    "version": "1.0.24",
    "sha1": "c53bdaa92d67bea151928cdff10a8d5e634ec880"
  },
  "cookbook_versions": {
    "apt": "2.6.0",
    "build-essential": "2.0.6",
    "chef-client": "3.7.0",
    "chef_handler": "1.1.6",
    "clc_library": "1.0.20",
    "cron": "1.5.0",
    "curl": "2.0.0",
    "dmg": "2.2.0",
    "git": "4.0.2",
    "java": "1.28.0",
    "logrotate": "1.7.0",
    "ms_dotnet4": "1.0.2",
    "newrelic": "2.0.0",
    "platform_couchbase": "1.0.31",
    "platform_elasticsearch": "1.0.40",
    "platform_environment": "1.0.1",
    "platform_haproxy": "1.0.36",
    "platform_keepalived": "1.0.4",
    "platform_octopus": "1.0.13",
    "platform_rabbitmq": "1.0.33",
    "platform_win": "1.0.71",
    "provisioner": "1.0.209",
    "queryme": "1.0.2",
    "runit": "1.5.10",
    "windows": "1.34.2",
    "yum": "3.3.2",
    "yum-epel": "0.5.1"
  }
}

I add the version as an override attribute since you cannot add new top level keys to environment files. However for data_bag files I do insert the version as a top level json key.
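A minimal sketch of the stamping step for these JSON files, using only Ruby's standard json library. The function names stamp_environment and stamp_data_bag_item are hypothetical, and writing the sha1 as a top-level key in data bag items is an assumption that mirrors how the version is handled there.

```ruby
require 'json'

# Hypothetical helper: put version and sha1 under override_attributes,
# since environment files cannot take new top-level keys.
def stamp_environment(json_text, version, sha1)
  # create_additions: false keeps "json_class" as a plain string
  doc = JSON.parse(json_text, create_additions: false)
  overrides = (doc['override_attributes'] ||= {})
  overrides['version'] = version
  overrides['sha1'] = sha1
  JSON.pretty_generate(doc)
end

# Hypothetical helper: data bag items take the version as a top-level key.
# Storing sha1 next to it is an assumption made for this sketch.
def stamp_data_bag_item(json_text, version, sha1)
  doc = JSON.parse(json_text, create_additions: false)
  doc['version'] = version
  doc['sha1'] = sha1
  JSON.pretty_generate(doc)
end

environment = '{"name": "test", "override_attributes": {"environment_parent": "QA"}}'
puts stamp_environment(environment, '1.0.24', 'c53bdaa92d67bea151928cdff10a8d5e634ec880')
```

Existing override attributes such as environment_parent are left in place; only the two stamp keys are added or overwritten.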

Including the sha1

You may have noticed that the environment file displayed above has a sha1 attribute just below the version. Every commit in git is identified by a sha1 hash that uniquely identifies it. While the version number is a human readable form of expressing changes and can still be used to find the specific commit in git that produced the version, having the sha1 included with the version makes it much easier to track down the specific git commit. I can simply do a:

git checkout <sha1>

This will update my working directory to match all code exactly as it was when that version was committed. If you report problems with a cookbook and can give me this sha1, I can bring up its exact code in seconds.

As we have already seen, the sha1 is stored in a separate json attribute for environment and data_bag files. For a cookbook's metadata.rb file, I add it as a comment at the end of the file:

name        'platform_haproxy'
maintainer  'CenturyLink Cloud'
license     'All rights reserved'
description 'Installs/Configures haproxy for platform'
version     '1.0.36'

depends     'platform_keepalived'
depends     'newrelic'
#sha1 'c53bdaa92d67bea151928cdff10a8d5e634ec880'
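Going the other way, a small sketch of how the stamp could be read back out of a metadata.rb laid out like the one above, for example to find the commit to check out. read_stamp is a hypothetical helper, and it assumes the sha1 comment has exactly the "#sha1 '<hash>'" form shown.

```ruby
# Hypothetical helper: pull the stamped version and sha1 out of the text
# of a metadata.rb file. Either value is nil if its line is missing.
def read_stamp(metadata_content)
  version = metadata_content[/^\s*version\s+['"]([^'"]+)['"]/, 1]
  sha1 = metadata_content[/^#sha1\s+'([0-9a-f]+)'/, 1]
  { version: version, sha1: sha1 }
end

metadata = <<~METADATA
  name        'platform_haproxy'
  version     '1.0.36'

  depends     'newrelic'
  #sha1 'c53bdaa92d67bea151928cdff10a8d5e634ec880'
METADATA

stamp = read_stamp(metadata)
puts "version #{stamp[:version]} was built from commit #{stamp[:sha1]}"
```

The returned sha1 is what you would hand to git checkout to reproduce that build's code.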

Bringing all of this together with automation

At CenturyLink Cloud, we are using this strategy for our own chef versioning. I have been working on a separate “promote” gem that oversees our delivery pipeline of chef artifacts. This gem exposes rake tasks that handle the versioning discussed in this post as well as the process of constraining cookbook versions in various qa and production environments and uploading these artifacts to the correct chef server. The rake tasks tie in to our CI server so that the entire rollout is automated and auditable. I’ll likely share different aspects of this gem in separate posts. It is not currently open source, but I can certainly share snippets here to give you an idea of how this generally works.

Our Rakefile loads in the tasks from this gem like so:

config = Promote::Config.new({
  :repo_root => TOPDIR,
  :node_name => 'versioner',
  :client_key => File.join(TOPDIR, ENV['versioner_key']),
  :chef_server_url => ENV['server_url']
  })
Promote::RakeTasks.new(config)
task :version_chef => [
  'Promote:version_cookbooks', 
  'Promote:version_environments', 
  'Promote:version_data_bags'
]

Running rake version_chef will then stamp all of the necessary artifacts with their appropriate version and sha1. The code for versioning an individual cookbook looks like this:

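# Stamps the generated version and the current sha1 into a cookbook's
# metadata.rb. Returns a hash describing the change, or nil when the
# version is already up to date.
# config, current_tag and sha1 are assumed to be defined elsewhere in
# the gem (not shown here).
def version_cookbook(cookbook_name)
  dir = File.join(config.cookbook_directory, cookbook_name)
  cookbook_name = File.basename(dir)
  version = version_number(current_tag, dir)
  metadata_file = File.join(dir, "metadata.rb")
  metadata_content = File.read(metadata_file)
  version_line = metadata_content[/^\s*version\s.*$/]
  current_version = version_line[/('|").*("|')/].gsub(/('|")/,"")

  if current_version != version
    # Rewrite only the version line so that the same string appearing
    # elsewhere (for example in a depends constraint) is left alone
    new_version_line = version_line.sub(current_version) { version }
    metadata_content = metadata_content.sub(version_line) { new_version_line }
    outdata = metadata_content.gsub(/#sha1.*$/, "#sha1 '#{sha1}'")
    if outdata[/#sha1.*$/].nil?
      # No sha1 comment yet: append one on its own line
      outdata += "\n" unless outdata.end_with?("\n")
      outdata += "#sha1 '#{sha1}'"
    end
    File.open(metadata_file, 'w') do |out|
      out << outdata
    end
    return { 
      :cookbook => cookbook_name, 
      :version => version, 
      :sha1 => sha1}
  end
end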

def version_number(current_tag, ref)
  all = git.log(10000).object(ref).between(current_tag.sha).size
  bumps = git.log(10000).object(ref).between(current_tag.sha).grep(
    "CI:versioning chef artifacts").size
  commit_count = all - bumps
  "#{current_tag.name}.#{commit_count}"
end

This uses the git ruby gem to interact with git and plops the version and sha1 into metadata.rb. Note that we exclude all commits labeled “CI:versioning chef artifacts.” After our CI server runs this task, it commits and pushes the changes back to git, and we don’t want to include that commit in our versioning. We also adjust our CI version control trigger to filter out this commit from the commits that can initiate a build. Otherwise we would end up in an infinite loop of builds.

Adding a Berkshelf sync

After we generate the new versions, but before we push them back to git, we want to sync up our Berksfile.lock files, so we run this:

cookbooks = Dir.glob(File.join(config.cookbook_directory, "*"))
cookbooks.each do |cookbook|
  berks_name = File.join(
    config.cookbook_directory, 
    File.basename(cookbook), 
    "Berksfile")
  if File.exist?(berks_name)
    Berkshelf.set_format :null
    berksfile = Berkshelf::Berksfile.from_file(berks_name)
    berksfile.install
  end
end

This ensures that the CI commit includes up to date Berksfile.lock files that may very well have changed due to the version changes in cookbooks that depend on one another. This will also be necessary in generating the environment cookbook constraints but that will be covered in a future post.

Thoughts?

I realize this is not how most people version their chef artifacts, or non-chef artifacts for that matter. I know many folks use knife spork bump. You can certainly use spork with this strategy as well; just provide the git-generated version instead of letting spork auto-increment. This versioning strategy has proven itself to be very convenient for me on non-chef projects. I’d be curious to get feedback from others on this technique. Any obvious or subtle pitfalls you see?