Dude, you got some Chef in my StarCluster!

10 Mar 2011

Dude, who put the Chef in my StarCluster?

First things first …

Everything I’m writing about here was done by BioTeam’s own Adam Kraut, our infrastructure orchestration, cloud, HPC and Ruby ninja. As I slowly degrade into a pile of useless management sludge, it’s basically Adam who is shouldering the load of doing cool and clever stuff, for which I am ever grateful.

Integrating Chef with MIT’s StarCluster

Huh?

In this post we plan to demo how we use MIT StarCluster to deploy self-organizing Grid Engine clusters in the cloud in a way that also lets us use a different system (“Chef”) to customize, control and install additional software.

OpsCode Chef

Chef is a really cool system for doing “systems orchestration” that BioTeam has been evangelizing & using for a while now. We use it to bootstrap cloud server nodes and turn them into whatever we need at the time – anything from installing a simple scientific software package up through assembling and controlling a complex scientific workflow setup. It has a nice Web interface as well as a fantastic command line client (“knife”) which, as nerds, we wholeheartedly approve of.

Tools like Chef remove the need to hand-craft cloud server nodes stuffed with unique software, configurations and settings. If we ignore ALL the other features and advantages of Chef we still come out ahead with this feature alone.

Related blog post: “Coffeshop Cloud Orchestration” — another video showing Chef in action bootstrapping an Amazon cloud node.

MIT StarCluster

StarCluster does what BioTeam used to do, only better, slicker and with more committed developer resources, heh. It’s an open source stack written in Python that builds fully functional and ready-to-use Grid Engine clusters inside the Amazon Web Services Elastic Compute Cloud. StarCluster goes way beyond just installing Grid Engine – it comes preloaded with working MPI, scientific computing applications and NFS filesharing between cloud nodes. It also has nice features for automatically creating, attaching and sharing EBS volumes that might contain your applications, home directories and input/output data. It’s a great system and getting better all the time.

The StarCluster team took an idea that we had been fooling around with since 2008 and actually turned it into something that lots of people can use, customize and extend. In my 2008 and earlier experiments & talks about using SGE on the cloud I never really got beyond developing simple methods and scripts that suited only my needs.

Why combine Chef & StarCluster?

It’s really the best of both worlds for us. We love StarCluster because they’ve done all the hard work in sorting out how to launch, control and deal with self-organizing Grid Engine clusters on the cloud.

At the same time, however, we are working on so many projects and doing so many things that we really can’t bake “our stuff” into static EC2 AMIs or even EBS volumes. Too much of an operational and maintenance headache.

Another example — Customer A may only want to deal with CentOS Linux while Customer B might be a Debian or Ubuntu fan. Chef makes us agnostic when it comes to most OS platforms — we simply don’t care and don’t have to manage lots of different OS and distro variants.

We prefer to write reusable Chef “recipes” and “cookbooks” for what we need and store those cookbook files under a source code management system like git or SVN so we have change-control and an audit history.

Our “infrastructure” is managed and treated like source code. It’s a thing of beauty.

Whatever our needs are — from installing the Illumina analysis pipeline to uploading a selective set of SSH keys for a client to login to a certain node — Chef can handle it with ease.

Screencast Video: Bootstrapping StarCluster with Chef

What does the video actually show?

  • Adam Kraut using StarCluster on his laptop to fire up a single-node Grid Engine system on EC2
  • StarCluster does all the hard work of standing up and configuring the cluster
  • Once the Grid Engine system is up, Adam uses the Chef “knife” command to remotely install the Chef-client software and appropriate configuration files and encryption certificates.
  • The end result is a Grid Engine cluster assembled by StarCluster whose nodes are also fully managed Chef clients that we can make dance to our tune
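
For anyone who would rather read than watch, the two steps in the video can be sketched roughly as follows. This is not Adam's actual script: the helper names, the cluster name, the hostname and the key path are all made up for illustration. The flags (`starcluster start -s`, and `knife bootstrap` with `-x`, `-i`, `-N` and `-r`) are the ones we know from the tools of this era, so check them against your installed versions. The sketch only builds the command lines and prints them; it does not launch anything.

```python
import shlex


def starcluster_start_cmd(cluster_name, size=1):
    """Build the argv that asks StarCluster to launch a cluster.

    `-s` sets the cluster size; 1 gives a single-node Grid Engine system.
    """
    return ["starcluster", "start", "-s", str(size), cluster_name]


def knife_bootstrap_cmd(host, ssh_user, identity_file, run_list=(), node_name=None):
    """Build the argv that installs chef-client on a running node over SSH."""
    cmd = ["knife", "bootstrap", host, "-x", ssh_user, "-i", identity_file]
    if node_name:
        # Name under which the node registers with the Chef server
        cmd += ["-N", node_name]
    if run_list:
        # Comma-separated run list, e.g. recipe[foo],role[bar]
        cmd += ["-r", ",".join(run_list)]
    return cmd


if __name__ == "__main__":
    # Hypothetical values: substitute your own cluster name, host and key.
    steps = [
        starcluster_start_cmd("chefdemo", size=1),
        knife_bootstrap_cmd(
            "ec2-host.example.com",
            ssh_user="root",
            identity_file="/path/to/ec2-key.pem",
            run_list=["recipe[example]"],
            node_name="chefdemo-master",
        ),
    ]
    for argv in steps:
        # Print a copy-and-paste friendly command line instead of running it
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping each command as a list of arguments means it can be handed straight to `subprocess.run` later without worrying about shell quoting of run-list items like `recipe[example]`.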

A couple of things about this recording:

  • It’s hard to get the terminal, recording and streaming settings correct for a terminal window that needs a wide area in which to output text. For the embedded youtube.com video below we recommend adjusting the settings to display at 720p “HD” resolution. Running it full-screen or expanded will also make the text more readable.
  • If the text is still unreadable, you can directly download the original 30MB QuickTime .MOV file at this link

Below the video…

I’ll upload some additional static screenshots showing “knife” etc. in action…

Additional Screenshots

Chef 1: Partial view of the Chef web interface
Chef 2: Some simple Chef command-line interaction using the “knife” utility
Chef 3: Knife & Chef understand “the cloud”; in this case we are asking the Chef server to list all EC2 nodes under management
Chef 4: Running the chef-client software on a cloud node. This node has already processed its “run list” so there are no more activities it needs to do
Chef 5: StarCluster command line client example