Why you should never build a Backblaze pod

BioTeam’s Backblaze 2.0 Project – 135 Terabytes for $12,000

Backblaze? WTF is that?

If you have never heard of the Backblaze storage pod, stop reading now and go to this link: blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/. In short, the clever and cool people at online backup company Backblaze have effectively “open sourced” their custom hardware design, which lets them acquire and deploy monstrous storage capacity at a price point low enough to sustain their “$3.96/month unlimited online backups” business model.

The Backblaze pod is essentially a stripped-down 4U storage server with internal space for 45 data drives (plus a single OS disk). At 3TB per drive you can get 135 terabytes of raw capacity into a single storage “pod”. Backblaze says their hardware costs are a little over $7,000 for this configuration but, as we will show in a future post, that figure is not realistic for others who wish to follow in their footsteps. As a low-volume customer, our actual cost for building a single 135TB pod was roughly $12,000 USD.
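To put those numbers side by side, here is a quick back-of-the-envelope sketch of cost per raw terabyte. It uses only the figures quoted above; the function and variable names are ours, and since Backblaze says "a little over" $7,000, treat their number as a floor rather than an exact price.

```python
# Figures quoted in this post; raw capacity only, before any RAID or filesystem overhead.
DRIVES = 45
TB_PER_DRIVE = 3
RAW_TB = DRIVES * TB_PER_DRIVE  # 135 TB raw


def cost_per_tb(total_cost_usd: float, raw_tb: float = RAW_TB) -> float:
    """Return hardware cost in USD per raw terabyte."""
    return total_cost_usd / raw_tb


# Backblaze quotes "a little over" $7,000, so this is a lower bound on their cost.
backblaze_quoted = cost_per_tb(7_000)
# Our rough, low-volume build cost.
our_build = cost_per_tb(12_000)

print(f"Raw capacity:        {RAW_TB} TB")
print(f"Backblaze (quoted):  ${backblaze_quoted:.2f} per raw TB")
print(f"Our build (approx.): ${our_build:.2f} per raw TB")
```

Usable capacity will be lower than raw capacity once you pick a RAID layout and filesystem, so the real cost per usable terabyte is somewhat higher for both builds.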

Anyone familiar with the cost of 100TB+ storage arrays from Tier 1 or Tier 2 storage vendors will realize what a shockingly low price this is. Even at $12,000 for 135TB the cost is low enough that the system deserves (and has gained) significant attention from the community.
Kudos to the Backblaze crew for making this information available; most other companies would treat it as confidential “secret sauce” critical to their business plans and profitability.

Backblaze has apparently realized, however, that the actual “secret sauce” in their infrastructure is not the hardware … it’s the internally developed “cloud storage” software layer they use to tie all their pods into a highly-available and easy to manage system. You will note that Backblaze does not speak in detail about their software practices!

Why you should never build a Backblaze pod

This section is meant to be tongue-in-cheek since it should be clear by now that we have actually gone out and built one of these things (and had a blast doing it, pictures to follow …).
That said, however, we do need to issue a sober warning here:

  • You could lose valuable data
  • You could screw coworkers, projects, and clients
  • You could lose your job

I’ve been in the HPC/informatics world for quite some time now and am lucky enough to be friends with people who are way smarter and more accomplished than me. When I told them I was beginning this project, quite a few of them tried to warn me off. In particular:

  • A fantastic guy who happens to be in a very senior IT position at a major top-tier research university has promised epic tales of storage and scientific data-loss disasters caused by labs and students trying the “DIY” storage pod approach
  • An HPC vendor who has been called in more than once to clean up after backblaze-related disasters. He has verifiable tales of folks losing their jobs over projects like this

What are the risks?

The Backblaze storage pod was designed for a very specific use case and is not a great fit for general-purpose use. A quick glance at the design plans will tell you:

  • The system uses a single disk for hosting the operating system
  • The system requires 2 power supplies to operate; both must be active and there is no redundancy, spare, or failover unit
  • The system has no hardware RAID capability
  • The system only has 2 GigE network interfaces
  • To access/replace a disk drive, you need to remove 12 screws
  • To access/replace a disk drive, you need to remove the top cover
  • If you build this yourself, you will be required to create custom wiring harnesses
  • Any monitoring or health status reporting tools will have to be built, installed and configured by hand
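To give a feel for what "built by hand" means in the last bullet, here is a minimal sketch of a drive health check. It assumes the smartmontools package is installed and that the drives are SATA disks, for which `smartctl -H` prints an "overall-health self-assessment test result" line. The function names are our own invention, and this is nowhere near a complete monitoring system.

```python
import subprocess


def smart_health_passed(smartctl_output: str) -> bool:
    """Return True if the overall-health line in `smartctl -H` output says PASSED."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.strip().endswith("PASSED")
    # No health line at all (missing drive, unsupported device): treat as unhealthy.
    return False


def check_drive(device: str) -> bool:
    """Run `smartctl -H` against one device and parse the result.

    Usually needs root. The device path is whatever your OS assigns.
    """
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return smart_health_passed(result.stdout)


# Demonstrate the parser on sample text instead of touching real hardware.
sample = "SMART overall-health self-assessment test result: PASSED\n"
print("healthy" if smart_health_passed(sample) else "needs attention")
```

On a real pod you would call `check_drive` for each of the 45 data drives on a schedule and wire the results into whatever alerting you already use. All of that is work that an enterprise array does for you out of the box.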

Simply put, this box has no “highly available” features, and any significant maintenance on it will almost certainly require the system to be taken offline and possibly even powered down. You also need to mount this unit on extremely heavy-duty rack rails OR put it on a shelf and leave about 12 inches of top clearance free if you want to be able to easily pop the top cover off to get at the drives.
This is cheap storage, not fast storage, and certainly not highly available storage. It carries a far higher operational and administrative burden than storage arrays traditionally sold into the enterprise. Scary, huh? My main goal with this blog post is to ensure that readers considering this approach are fully aware of the potential risks.

Why the folks at Backblaze don’t care about the “risks”

Short answer: They solve all reliability, availability, and operational concerns by operating a bunch of pods simultaneously with a proprietary cloud software layer that handles data movement and multi-pod data replication. To them, a storage pod is a single FRU (field replaceable unit) and they don’t really need to spend a significant amount of time and attention on any single pod.
Long answer:

  • Backblaze does not care about reliability of single pods. They engineer around hardware and data reliability concerns by using many pods and custom software
  • Backblaze does not care about downtime for single pods. Customer data is stored on multiple pods, allowing individual pods to break or otherwise be taken offline for maintenance or replacement
  • Backblaze does not care about performance of single pods. They have openly stated that their only performance metric is “can we saturate the GigE link as we load a pod with data”
  • Backblaze has an unusual duty cycle. A normal Backblaze pod is only “active” for the first few weeks of its life as it slowly fills to capacity with customer backup data. After a pod is “full” the system sits essentially idle while it waits for (much less frequent) client restore requests.
  • Backblaze does not care about operational burden. Via their custom software and use of many pods at once, Backblaze has built an infrastructure that requires very little effort in the datacenter. It looks like a few days a week are spent deploying new pods, and I’m guessing that failing pods are “drained” of data and then pulled out to be totally rebuilt or refreshed. Backblaze does not have to dink around trying to debug single-drive failures within individual pods.
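The "first few weeks" duty cycle and the GigE saturation metric fit together if you do the arithmetic. Here is a rough sketch of how long it takes to fill a pod over a single GigE link. The assumptions are ours: decimal terabytes, the full 135TB of raw capacity, and a hypothetical `efficiency` knob for protocol overhead and real-world ingest rates.

```python
RAW_BYTES = 135 * 10**12       # 135 TB raw, decimal terabytes (assumption)
LINK_BITS_PER_SEC = 10**9      # one GigE link at full line rate
SECONDS_PER_DAY = 86_400


def fill_days(capacity_bytes: float, link_bits_per_sec: float,
              efficiency: float = 1.0) -> float:
    """Days needed to write capacity_bytes over a link running at a given efficiency."""
    seconds = capacity_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / SECONDS_PER_DAY


best_case = fill_days(RAW_BYTES, LINK_BITS_PER_SEC)
half_rate = fill_days(RAW_BYTES, LINK_BITS_PER_SEC, efficiency=0.5)

print(f"Saturated GigE link: {best_case:.1f} days")
print(f"Half line rate:      {half_rate:.1f} days")
```

Even with the link fully saturated, a pod takes the better part of two weeks to fill, which is why a box that is “only” fast enough to saturate GigE is perfectly adequate for this workload and may be a poor fit for yours.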

OK, enough scare tactics!

Hopefully we’ve covered enough “IT sobriety” to make sure that people reading these posts understand the pros and cons of this approach. If you are considering trying something like this it is essential to do your own research and due diligence.

In our next few blog posts we’ll talk about why we DID decide to build one of these pods and what our experiences were.

