Why you should never build a Backblaze pod

August 24, 2011
chrisdag
Article

BioTeam’s Backblaze 2.0 Project – 135 Terabytes for $12,000

Part I – Why you should never build a Backblaze pod (this post)
Part II – Why we built a Backblaze pod
Part III – Our real-world Backblaze pod costs
Part IV – Backblaze pod assembly & integration pictures
Part V – Backblaze Initial Performance Data
Part VI – Backblaze pod software & configuration (future post)
Part VII – Backblaze pod ongoing impressions (future post)

Backblaze? WTF is that?

If you have never heard of the Backblaze storage pod, stop reading now and go to this link: blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/. In short, the clever and cool people at online backup company Backblaze have effectively “open sourced” their custom hardware design, which lets them acquire and deploy monstrous storage capacity at a price point low enough to sustain their “$3.96/month unlimited online backups” business model.

The Backblaze pod is essentially a stripped-down 4U storage server with internal space for 45 disk drives (plus a single OS disk). At 3TB per drive you can get 135 terabytes of raw capacity into a single storage “pod”. Backblaze says their hardware costs are a little over $7,000 for this configuration but, as we will show in a future post, this is not exactly realistic for others who wish to follow in their footsteps. As a low-volume customer, our actual cost for building a single 135TB pod was roughly $12,000 USD.
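To put those numbers side by side, here is a quick back-of-the-envelope sketch of cost per raw terabyte. The drive count, drive size and the two price tags come straight from the paragraph above; the function names are just for illustration, and “a little over $7,000” is rounded down to $7,000, so treat the Backblaze figure as a lower bound.

```python
# Back-of-the-envelope cost per raw terabyte for a single pod.
# Figures are the ones quoted in the post; Backblaze's "a little over
# $7,000" is rounded down to 7000 here (an assumption).

DRIVES_PER_POD = 45   # data drives, not counting the OS disk
TB_PER_DRIVE = 3


def raw_capacity_tb(drives=DRIVES_PER_POD, tb_per_drive=TB_PER_DRIVE):
    """Raw (pre-RAID, pre-filesystem) capacity of one pod in TB."""
    return drives * tb_per_drive


def cost_per_tb(total_cost_usd, capacity_tb):
    """Hardware cost in USD per raw terabyte."""
    return total_cost_usd / capacity_tb


capacity = raw_capacity_tb()
backblaze_per_tb = cost_per_tb(7_000, capacity)
bioteam_per_tb = cost_per_tb(12_000, capacity)

print(f"Raw capacity: {capacity} TB")
print(f"Backblaze (volume pricing): ${backblaze_per_tb:.2f} per TB")
print(f"BioTeam (low volume):       ${bioteam_per_tb:.2f} per TB")
```

That works out to roughly $52 per raw TB at Backblaze's quoted cost and roughly $89 per raw TB at ours. Usable capacity after RAID and filesystem overhead will be lower, so the real per-TB cost is somewhat higher in both cases.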

Anyone familiar with the cost of 100TB+ storage arrays from Tier 1 or Tier 2 storage vendors will realize what a shockingly low price this is. Even at $12,000 for 135TB the cost is low enough that the system deserves (and has gained) significant attention from the community.
Kudos to the Backblaze crew for making this information available – most other companies would treat this as confidential “secret sauce” information that was critical to business plans and profitability.

Backblaze has apparently realized, however, that the actual “secret sauce” in their infrastructure is not the hardware … it’s the internally developed “cloud storage” software layer they use to tie all their pods into a highly-available and easy to manage system. You will note that Backblaze does not speak in detail about their software practices!

Why you should never build a Backblaze pod

This section is meant to be tongue-in-cheek since it should be clear by now that we have actually gone out and built one of these things (and had a blast doing it, pictures to follow …).
That said, however, we do need to issue a sober warning here:

You could lose valuable data
You could screw coworkers, projects, and clients
You could lose your job

I’ve been in the HPC/informatics world for quite some time now and am lucky enough to be friends with people who are way smarter and more accomplished than me. Quite a few of them, when I told them I was beginning this project, tried to warn me off. In particular:

A fantastic guy who happens to be in a very senior IT position at a major top-tier research university has promised epic tales of storage and scientific data-loss disasters caused by labs and students trying the “DIY” storage pod approach
An HPC vendor who has been called in more than once to clean up after Backblaze-related disasters. He has verifiable tales of folks losing their jobs over projects like this

What are the risks?

The Backblaze storage pod was designed for a very specific use case that is not a great fit for more generic usage. A quick glance at the design plans will tell you:

The system uses a single disk for hosting the operating system
The system requires 2 power supplies to operate; both must be active and there is no redundancy, spare, or failover unit
The system has no hardware RAID capability
The system only has 2 GigE network interfaces
To access/replace a disk drive, you need to remove 12 screws
To access/replace a disk drive, you need to remove the top cover
If you build this yourself, you will be required to create custom wiring harnesses
Any monitoring or health status reporting tools will have to be built, installed and configured by hand
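To give a feel for what “built by hand” means in practice, here is a minimal sketch of a health check. It assumes the pod runs Linux software RAID (md), which the list above does not state; it only notes the lack of hardware RAID. The function name and the sample text are made up for illustration. The check looks for the member-status field that the kernel prints in /proc/mdstat, where `U` marks a healthy member and `_` a missing one.

```python
import re

# Matches the member-status field of an md array,
# e.g. "[4/3] [UUU_]": 4 expected members, 3 active, last one missing.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches the header line of an array, e.g. "md0 : active raid6 ..."
ARRAY_RE = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array_name: missing_member_count} for degraded arrays."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            missing = status.group(3).count("_")
            if missing:
                degraded[current] = missing
            current = None
    return degraded


# Made-up sample in the general shape of /proc/mdstat output.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[0] sdc[1] sdd[2] sde[3]
      5860270080 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid6 sdf[0] sdg[1] sdh[2]
      5860270080 blocks level 6, 64k chunk, algorithm 2 [4/3] [UUU_]
unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a real system you would pass in the contents of /proc/mdstat instead of `SAMPLE` and wire the result into whatever alerting you already have. Drive temperatures, SMART data, fan and power supply status would each need similar hand-rolled checks.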

Simply put, this box has no “highly available” features and any sort of significant maintenance on it will almost certainly require the system to be taken offline and possibly even powered down. You also need to mount this unit on extremely heavy-duty rack rails OR put it on a shelf and leave about 12 inches of top clearance free if you want to easily be able to pop the top cover off to get at the drives.
This is cheap storage, not fast storage and certainly not highly-available storage. It carries a far higher operational and administrative burden than storage arrays traditionally sold into the enterprise. Scary, huh? My main goal with this blog post is to ensure that readers considering this approach are fully aware of the potential risks.

Why the folks at Backblaze don’t care about the “risks”

Short answer: They solve all reliability, availability, and operational concerns by operating a bunch of pods simultaneously with a proprietary cloud software layer that handles data movement and multi-pod data replication. To them, a storage pod is a single FRU (field replaceable unit) and they don’t really need to spend a significant amount of time and attention on any single pod.
Long answer:

Backblaze does not care about reliability of single pods. They engineer around hardware and data reliability concerns by using many pods and custom software
Backblaze does not care about downtime for single pods. Customer data is stored on multiple pods, allowing individual pods to break or otherwise be taken offline for maintenance or replacement
Backblaze does not care about performance of single pods. They have openly stated that their only performance metric is “can we saturate the GigE link as we load a pod with data”
Backblaze has an unusual duty cycle. A normal Backblaze pod is only “active” for the first few weeks of its life as it slowly fills to capacity with customer backup data. After a pod is “full” the system sits essentially idle while it waits for (much less frequent) client restore requests.
Backblaze does not care about operational burden. Via their custom software and use of many pods at once, Backblaze has built an infrastructure that requires very little effort in the datacenter. It looks like a few days a week are spent deploying new pods and I’m guessing that failing pods are “drained” of data and then pulled out to be totally rebuilt or refreshed. Backblaze does not have to dink around trying to debug single-drive failures within individual pods.
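The “first few weeks” duty cycle lines up with simple arithmetic on the GigE link. The sketch below estimates how long it takes to fill a pod over the network. It assumes the full 135TB raw capacity (usable space is lower), decimal units, and a hypothetical `efficiency` factor for how much of the line rate is actually sustained; Backblaze's real ingest rate is not stated here.

```python
def fill_time_days(capacity_tb, link_gbit_per_s, efficiency=1.0):
    """Days needed to write capacity_tb over a network link.

    efficiency is the fraction of line rate actually sustained
    (1.0 = perfectly saturated link, an idealized assumption).
    """
    capacity_bits = capacity_tb * 1e12 * 8
    seconds = capacity_bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 86_400


saturated = fill_time_days(135, 1.0)         # one fully saturated GigE link
half_rate = fill_time_days(135, 1.0, 0.5)    # same link at 50% of line rate

print(f"Saturated GigE link: {saturated:.1f} days")
print(f"At 50% of line rate: {half_rate:.1f} days")
```

Even with a perfectly saturated link, a pod needs about 12.5 days of continuous writes to fill, and about 25 days at half of line rate. For a fill-once, read-rarely backup workload that is fine; for general-purpose storage it is a useful reminder of how little network bandwidth sits in front of all those disks.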

Ok enough scare tactics!

Hopefully we’ve covered enough “IT sobriety” to make sure that people reading these posts understand the pros and cons of this approach. If you are considering trying something like this, it is essential to do your own research and due diligence.

In our next few blog posts we’ll talk about why we DID decide to build one of these pods and what our experiences were.
