Cluster building

23 Jun 2010

With all of Bioteam’s talk about cloud computing, I was a bit surprised to find myself building an honest-to-goodness non-cloud, non-virtual compute cluster a couple of weeks ago. There were wires, blinking lights, whirring fans, circuit breakers, and all sorts of messy real-world details to contend with. The servers were heavy and unwieldy, and I made use of the roll of athletic tape in my cluster-build bag to cover the inevitable nicks and cuts in my fingers that accumulate over a couple of days of slinging metal.

We deployed the first incarnation of this system in 2004. At that time, it was a homogeneous system built from single-CPU G4 Xserve machines from Apple. I believe that it ran OS X 10.2 Server, installed via NetBoot from a portal that was configured by copying a bootable OS image from a USB drive.

In the ensuing six years, it has been in near-constant use as a BLAST farm for the department of environmental engineering at MIT. We've upgraded it with each incarnation of the Xserve as it came out, and when we ran out of physical space in the co-location facility (a hard limit of three racks), we started rolling in new machines by ousting the oldest ones. These old servers moved to a corner of a wet lab to serve as a development cluster.

The system is, to some extent, a crazy quilt. It's cobbled together, running three different major versions of OS X. We've been wanting to upgrade the Ethernet backplane for about four years now, but somehow it has never been important enough to actually do. There are separate NFS servers for BLAST databases and home directories. A small pile of scripts integrates with Sun Grid Engine to ensure that data staging and software updates do not collide with running jobs (a minimal sketch of the pattern follows below). On the other hand, this system has cranked out a ridiculous amount of scientific analysis.
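The scripts themselves are site-specific and unpublished, but the basic pattern is simple enough to sketch. The following is a hypothetical minimal version, not our actual code: it disables an SGE queue with qmod -d (running jobs finish, but nothing new gets scheduled), waits for the work to drain, stages data, and then re-enables the queue with qmod -e. The queue name and rsync paths are placeholders.

```python
#!/usr/bin/env python
"""Drain an SGE queue, stage data, re-enable -- a minimal sketch.

Assumes SGE's qmod/qstat are on the PATH. The queue name and the
staging paths below are placeholders, not this cluster's values.
"""
import subprocess
import time

QUEUE = "all.q"  # hypothetical queue name


def run(*cmd):
    """Run a command, raising if it fails."""
    subprocess.run(cmd, check=True)


def jobs_in_flight():
    """True if qstat reports any pending or running jobs."""
    out = subprocess.run(["qstat", "-u", "*"],
                         capture_output=True, text=True, check=True)
    return bool(out.stdout.strip())


def main():
    run("qmod", "-d", QUEUE)          # stop scheduling new jobs
    while jobs_in_flight():           # let running jobs finish
        time.sleep(60)
    try:
        # Stage the updated BLAST databases (placeholder paths).
        run("rsync", "-a", "dbserver::blastdb/", "/local/blastdb/")
    finally:
        run("qmod", "-e", QUEUE)      # re-enable the queue either way


if __name__ == "__main__":
    main()
```

The try/finally matters: if staging fails partway, the queue still comes back up, which is the behavior you want on a system that is expected to keep cranking out analysis unattended.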

I had a blast. At the end of two days of work, I re-enabled the queues and had the satisfaction of watching all the little blue lights spin up almost immediately, indicating that user jobs were flowing out onto the hardware.

It reminds me, to some extent, of Admiral Hyman G. Rickover's famous 1953 quote to a congressional hearing about the difference between “paper” and “real” nuclear reactors:

An academic reactor or reactor plant almost always has the following basic characteristics: (1) It is simple. (2) It is small. (3) It is cheap. (4) It is light. (5) It can be built very quickly. (6) It is very flexible in purpose. (7) Very little development will be required. It will use off-the-shelf components. (8) The reactor is in the study phase. It is not being built now.

On the other hand a practical reactor can be distinguished by the following characteristics: (1) It is being built now. (2) It is behind schedule. (3) It requires an immense amount of development on apparently trivial items. (4) It is very expensive. (5) It takes a long time to build because of its engineering development problems. (6) It is large. (7) It is heavy. (8) It is complicated.

The systems engineering required to keep pace with biology these days still falls in the “heavy and complicated” category. Under the hood of the virtual, there must always be the real. It’s therefore important that someone on the team break out the athletic tape and the power driver from time to time. It keeps us honest.
