Why you should never build a Backblaze pod

[Photo: Backblaze 2.0 pod with 45 disks installed]

BioTeam’s Backblaze 2.0 Project – 135 Terabytes for $12,000

Backblaze? WTF is that?

If you have never heard of the Backblaze storage pod, stop reading now and go to this link:

http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

… in short, the clever & cool people at online backup company Backblaze have effectively “open sourced” their custom hardware design that allows them to acquire & deploy monstrous storage capacity at a low enough price point to sustain their “$3.96/month unlimited online backups” business model.

The Backblaze pod is essentially a stripped-down 4U storage server with internal space for 45 data drives (plus a single OS disk). At 3TB per drive you can get 135 terabytes of raw capacity into a single storage “pod”. Backblaze says their hardware costs are a little over $7,000 for this configuration, but as we will show in a future post this is not exactly realistic for others who wish to follow in their footsteps. As a low-volume customer, our actual cost for building a single 135TB pod was roughly $12,000 USD.
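
For readers who want the unit economics spelled out, here is the quick math behind those figures – a back-of-the-envelope sketch using only the numbers quoted above:

```python
# Back-of-the-envelope math for the capacity and cost figures quoted above.
DRIVES = 45
TB_PER_DRIVE = 3
OUR_BUILD_COST_USD = 12_000       # our actual low-volume cost
BACKBLAZE_COST_USD = 7_000        # Backblaze's published figure (approximate)

raw_tb = DRIVES * TB_PER_DRIVE
print(f"Raw capacity: {raw_tb} TB")                                         # 135 TB
print(f"Our cost per raw TB:        ${OUR_BUILD_COST_USD / raw_tb:,.2f}")   # ~$88.89
print(f"Backblaze cost per raw TB:  ${BACKBLAZE_COST_USD / raw_tb:,.2f}")   # ~$51.85
```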

Anyone familiar with the cost of 100TB+ storage arrays from Tier 1 or Tier 2 storage vendors will realize what a shockingly low price this is. Even at $12,000 for 135TB the cost is low enough that the system deserves (and has gained) significant attention from the community.

Kudos to the Backblaze crew for making this information available – most other companies would treat this as confidential “secret sauce” information critical to their business plans and profitability.

Backblaze has apparently realized, however, that the actual “secret sauce” in their infrastructure is not the hardware … it’s the internally developed “cloud storage” software layer they use to tie all their pods into a highly-available and easy to manage system. You will note that Backblaze does not speak in detail about their software practices!

Why you should never build a Backblaze pod

This section is meant to be tongue-in-cheek since it should be clear by now that we have actually gone out and built one of these things (and had a blast doing it, pictures to follow …).

That said, however, we do need to issue a sober warning here:

  • You could lose valuable data
  • You could screw coworkers, projects and clients
  • You could lose your job

I’ve been in the HPC/informatics world for quite some time now and am lucky enough to be friends with people who are way smarter and more accomplished than I am. Quite a few of them, when I told them I was beginning this project, tried to warn me off… In particular:

  • A fantastic guy who happens to be in a very senior IT position at a major top-tier research university has promised epic tales of storage and scientific data-loss disasters caused by labs and students trying the “DIY” storage pod approach
  • An HPC vendor who has been called in more than once to clean up after Backblaze-related disasters. He has verifiable tales of folks losing their jobs over projects like this

What are the risks?

The backblaze storage pod was designed for a very specific use case that is not a great fit for more generic usage. A quick glance at the design plans will tell you:

  • The system uses a single disk for hosting the operating system
  • The system requires 2 power supplies to operate; both must be active, and there is no redundant, spare or failover unit
  • The system has no hardware RAID capability
  • The system only has 2 GigE network interfaces
  • To access/replace a disk drive you need to remove 12 screws
  • To access/replace a disk drive you need to remove the top cover
  • If you build this yourself totally DIY you will be required to create custom wiring harnesses
  • Any monitoring or health status reporting tools will have to be built, installed and configured by hand (a minimal example of what that might look like follows this list)
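
To make that last point concrete, here is a minimal sketch of the sort of drive-health check you would have to write and maintain yourself. It assumes smartmontools (smartctl) is installed and that the data drives show up as ordinary /dev/sd? devices; it is purely illustrative and is not Backblaze's tooling.

```python
#!/usr/bin/env python3
"""Minimal DIY drive-health check for a pod-style chassis.

Assumes smartmontools (smartctl) is installed and that the drives
appear as /dev/sda ... /dev/sdz; adjust for your own layout.
Needs root to query SMART data. Illustrative only.
"""
import glob
import subprocess

def drive_health(device):
    """Return the PASSED/FAILED verdict reported by 'smartctl -H'."""
    try:
        out = subprocess.run(["smartctl", "-H", device],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        return "UNKNOWN (smartctl not installed)"
    for line in out.splitlines():
        if "overall-health" in line:
            return line.split(":")[-1].strip()
    return "UNKNOWN"

if __name__ == "__main__":
    for dev in sorted(glob.glob("/dev/sd?")):
        print(f"{dev}: {drive_health(dev)}")
```

In practice you would run something like this from cron and wire the output into email or whatever alerting you already use – all of which a traditional enterprise array gives you out of the box.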

Simply put, this box has no “highly available” features, and any sort of significant maintenance on it will almost certainly require the system to be taken offline and possibly even powered down. You also need to mount this unit on extremely heavy-duty rack rails OR put it on a shelf and leave about 12 inches of clearance above it if you want to be able to pop the top cover off easily to get at the drives.

This is cheap storage, not fast storage and certainly not highly-available storage. It carries a far higher operational and administrative burden than storage arrays traditionally sold into the enterprise.

Scary, huh? My main goal with this blog post is to ensure that readers considering this approach are fully aware of the potential risks.

Why the folks at Backblaze don’t care about the “risks”

Short answer: They solve all reliability, availability and operational concerns by operating a bunch of pods simultaneously, with a proprietary cloud software layer that handles data movement and multi-pod data replication. To them, a storage pod is a single FRU (field replaceable unit) and they don’t really need to spend a significant amount of time or attention on any single pod.

Long answer:

  • Backblaze does not care about reliability of single pods. They engineer around hardware and data reliability concerns by using many pods and custom software
  • Backblaze does not care about downtime for single pods. Customer data is stored on multiple pods, allowing individual pods to break or otherwise be taken offline for maintenance or replacement (see the conceptual sketch after this list)
  • Backblaze does not care about performance of single pods. They have openly stated that their only performance metric is “can we saturate the GigE link as we load a pod with data”
  • Backblaze has an unusual duty cycle. A normal Backblaze pod is only “active” for the first few weeks of its life as it slowly fills to capacity with customer backup data. After a pod is “full” the system sits essentially idle while it waits for (much less frequent) client restore requests.
  • Backblaze does not care about operational burden. Via their custom software and use of many pods at once Backblaze has built an infrastructure that requires very little effort in the datacenter. It looks like a few days a week are spent deploying new pods and I’m guessing that failing pods are “drained” of data and then pulled out to be totally rebuilt or refreshed. Backblaze does not have to dink around trying to debug single-drive failures within individual pods.
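
To illustrate the pattern (and only the pattern), here is a conceptual sketch of how a thin software layer can make individually fragile pods dependable by replicating every object across several of them. The pod names, replica count and quorum rule below are hypothetical; Backblaze has not published its software, so treat this as a sketch of the general idea, not their implementation.

```python
import hashlib
import random

# Hypothetical pod endpoints and policy -- illustrative only.
PODS = ["pod-01", "pod-02", "pod-03", "pod-04", "pod-05"]
REPLICAS = 3        # keep three copies of every object
WRITE_QUORUM = 2    # consider a write durable once two copies land

def pick_pods(object_key):
    """Deterministically choose REPLICAS distinct pods for an object."""
    digest = int(hashlib.sha256(object_key.encode()).hexdigest(), 16)
    start = digest % len(PODS)
    return [PODS[(start + i) % len(PODS)] for i in range(REPLICAS)]

def write_to_pod(pod, key, data):
    """Stand-in for a real network call; fails randomly to mimic a flaky pod."""
    return random.random() > 0.1

def store(object_key, data):
    """Write to several pods so any single pod can die without data loss."""
    successes = sum(1 for pod in pick_pods(object_key)
                    if write_to_pod(pod, object_key, data))
    return successes >= WRITE_QUORUM

if __name__ == "__main__":
    print(store("customer-42/backup-0001", b"example payload"))
```

With enough pods behind a layer like this, losing (or powering down) any one pod is a non-event, which is exactly why the per-pod shortcomings listed above don’t bother Backblaze.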

Ok enough scare tactics!

Hopefully we’ve covered enough “IT sobriety” to make sure that people reading these posts understand the pros and cons of this approach. If you are considering trying something like this it is essential to do your own research and due diligence.

In our next few blog posts we’ll talk about why we DID decide to build one of these pods and what our experiences were.

http://farm7.static.flickr.com/6183/6074469950_21cda1897f.jpg

Filed Under: Employee Posts, Tech Notes


About the Author

Chris is an infrastructure geek specializing in the applied use of IT to enable and enhance scientific research in life science informatics environments.

Comments (29)


  1. Angel says:

    Loving the series of posts. Hadn’t seen Openfiler before. Good lead.

  2. ddorian says:

    What about installing OpenStack Swift, using Backblaze pods as the storage nodes, and keeping 3 copies of the data?

  3. Apps says:

    I am just wondering if this monster is as reliable as each one of these hard disks?

    • chrisdag says:

      So far our client is very happy with it and it’s been online and operating for a while now. I’ll get a chance to visit it in person later on this week. The client using this thing is well aware of the risks (I gave them the big scary story at the beginning to set their expectations) and we’ve really only done a few minor things to hedge against problems – they have already done multiple “practice” drive replacements to see what the process involves. They have 3x spare drives sitting on a shelf and I believe they have an order out for a spare power supply as well. The only thing they are not protected against right now would be a mainboard failure or some type of issue with the wiring harnesses or PCI SATA expansion cards… -Chris

  4. Nyohati says:

    The point is software-based distributed storage models. You can set these servers up in tandem, or keep some spare parts around, and you’re fine.

    I’ve seen SAN storage controllers break, so if a mainboard cracks here that’s the same to me. But these mainboards are a lot cheaper than the SAN hardware.

    JBOD is the future and the end of SAN

  5. Jan says:

    I think it depends on where you get the 3TB drives (http://www.verkkokauppa.com/fi/product/8704/dbhff/Western-Digital-Caviar-Green-3-TB-SATAIII-64-MB). If I check the price here in Finland, I can buy 3TB drives for 136.90 € each, so 45 of them is 6,160.50 € = 8,062.43 USD. Not sure how to calculate the frame + power price.

    -Jan

  6. john Moore says:

    You guys are stupid having your OS on a second hard drive. Why not put VMware ESXi onto a USB stick plugged into the motherboard? You would save on electricity by not running another drive, plus your read/write speed would be quicker.

  7. Ryan says:

    Great series of well-written posts! At the end of each section I would tend to have one question on my mind, and your next section always addressed it.

  8. TheSysOp says:

    If this is just insurance against having to re-download, and dirt cheap is the primary goal, I’m curious whether you guys considered a tape autoloader. It’s the old-school way of solving this problem. Tapes are content to sit idle in their magazines, and it’s a lot easier than maintaining 45 “always on” spindles.

    Doing the math on HDD failure rates, you may be in for a lot of maintenance. Desktop drives aren’t rated for continuous use. Optimistically you’re looking at a 2-3 year lifespan on average, so plan on replacing a drive every month over the life of the box.

  9. Michael says:

    This shouldn’t really be called “Why you should never build a pod”; it should be called “Make sure you understand what you are and aren’t getting when you build a pod”.

  10. Jacob says:

    There are two problems with roll-your-own storage systems:

    1) Rotational vibration – this is what separates the men from the boys in chassis design. All those drives spinning in one place make for vibration, which wears and tears on the drives. Try touching the stem of your electric toothbrush to your teeth and you will know what I mean.

    2) General error handling – hard drives vary a lot in their firmware and the way they process errors. When new drives come to market, enterprise storage vendors spend months on QA, tweaking their own code and going back to the hard drive makers for firmware tweaks. When you roll your own, you lose out on this vital process.

    I have not played with Backblaze myself so I’m looking forward to the rest of Chris’ testing. My guess is that no matter how bad it is, you could always use an object-based storage system, using redundancy of objects to protect against data loss. Ideally you would also use hashing/checksums to verify the integrity of the data. Three Backblaze boxes are still probably less money than one big shiny something else.

    • Buddy Farr says:

      Jacob – the design of the Backblaze pod includes rubber vibration-damping footings to help handle the vibration of the drives.

  11. Jesse says:

    Great posts. We just built one of these (by “built” I mean bought the chassis from Protocase and filled it with drives) because the cost of enterprise SAN storage is ridiculous. For comparison, the cost to build 4 of these filled with drives is still less than the chassis alone from some other vendors (not named on purpose), before you even get into drives and licensing.

    I have been testing Openfiler (v2.99) vs FreeNAS (v8.0) for iSCSI and NFS sharing and have had some great success. One thing to note about Openfiler, though, is that the Intel e1000e NIC drivers included with the rPath Linux install need to be upgraded or they simply won’t work.

    Other things of note – this is really meant to be filled up with disks when you build it. As mentioned, there are a bunch of screws and a cover to remove before you can get at the drives, so you really don’t want to “add drives later” if you can avoid it. Also, as mentioned, you will probably want to add another NIC to the chassis for link aggregation, or at the very least to separate the management and storage interfaces.

    These are great when used appropriately, but in a business environment it is really difficult to justify some of the shortcomings – especially the drive accessibility – unless you are doing what Backblaze did, which is to build so many of these that no single unit is critical. That is not to say the concept is wrong – I just mean that spending extra money on a Supermicro chassis (like this one http://www.supermicro.com/products/chassis/4U/?chs=847) with accessible drive bays and redundant power may get you more bang for your buck in the long run.

    Overall this is a great project if, as others have stated, you have the right use case for it. I would recommend this for home/personal use (think media library) and for non-production business use cases – or at least non-critical applications like archive storage. In business production I would want at least redundant power supplies and accessible drive bays.

  12. Jiriki says:

    Why can’t we get postings like this for all vendor software and hardware? Up-fronting the risks and quirks at this level is awesome, but I wish there were a place that did it for mainstream products and software!

  13. Matt says:

    I am curious if you will be releasing part 6 of this post. I am in the configuring stage and could use some pointers.

  14. DJI says:

    Yup. JBOD is the future (are we back to 1985? Back to tape too!! Yipee!!). I don’t see how anyone would ever consider this for critical research data. Is losing your job worth the cost benefits? I just don’t get it…

    • Buddy Farr says:

      JBOD? Not in my work. I need the redundancy for 24/7 operation. I can’t tell my users that their data will be offline until I can get the replacement drives in and do a restore. I would rather it run a little slower until the replacement drive is in and rebuilt.

  15. John says:

    One word: ZFS. This is essentially a copy of a Sun Fire X4500 “Thumper”, meant for ZFS use.

  16. Bryan says:

    Most of these risks are easily overcome:

    The system uses a single disk for hosting the operating system
    – 2 SSD drives in a RAID 1 array on a separate RAID card running the OS. Ditch the rotary OS drive.
    – Run Windows Server 2008 Enterprise or DTC

    The system requires 2 power supplies to operate, both must be active and there is no redundancy, spare or failover unit
    – Install one 1500-watt PSU in place of the 2 x 750-watt PSUs. Less heat generation, and one point of failure instead of two points of half-failure.

    The system has no hardware RAID capability
    – I believe the card the Backblaze comes with will do RAID 5 and/or RAID 50, but that can be overcome by upgrading the RAID card.

    The system only has 2 GigE network interfaces
    – 1 NIC active, 1 NIC for backup. Management isn’t a constant process, so performance may take a hit while you are managing or a drive is rebuilding.

    To access/replace a disk drive you need to remove 12 screws
    – 12 screws seems to be overkill for this access hatch; only put 6 in, possibly 4.

    To access/replace a disk drive you need to remove the top cover
    – kinda the same as the last risk

    If you build this yourself totally DIY you will be required to create custom wiring harnesses
    – From my research I was sure the pre-assembled case from Protocase comes with this already done. But if you are looking for this to be a complete DIY build, then yes.

    Any monitoring or health status reporting tools will have to be built, installed and configured by hand
    – mobilepcmonitor.com

    • Bryan says:

      Just a thought…

      Since you are splitting the 45 drives into 3 arrays, this could make a mean tiered server: 15 x SSDs, 15 x 15K SAS, and 15 x big SATA drives.

      This is scary, but I have had large 3rd-party software/application vendors wanting to sell me a Drobo box for my data center. I would prefer Backblaze rather than a proprietary Drobo box.

  17. roger says:

    Before scaring everybody off, you could mention that it does allow for software RAID.

  18. Scary…

    But. Reading your article to the very end (even if it’s not complete yet), you seem to realize that it works as advertised. We’ve done the same exercise as you guys and not only found that it works but it is a viable solution.

    We offer cloud storage service to our customer using a modified version of the pods and it’s been working very well for us so far. We now have close to a dozen pods out there, some dating back 2 years. Modifications greatly reduce your arguments against using the pods.

    The system uses a single disk for hosting the operating system : True, the original concept was designed with a single HD for the OS. We’ve moved to SSDs for the OS and mirror them. There is enough room in the case to accommodate 2 SSD drives.

    The system requires 2 power supplies to operate, both must be active and there is no redundancy, spare or failover unit : Yes, there is room for two power supplies in the pod. Why not use 1500 W power supplies? One is enough to power 45 green HDs if you tweak the power-up sequence (a little electrical engineering required).

    The system has no hardware RAID capability : Hardware RAID is overrated. mdadm (software RAID) works very well, thank you. Much more flexible, we think. I personally experimented with hardware RAID in the past. How many times have I replaced a RAID controller only to realize the new controller would not recognize the RAID…

    The system only has 2 GigE network interfaces : We put 3 cards in our boxes. One to manage the pods; the other 2 can be bonded together to provide up to 2 Gb/s. We are experimenting with 10 GbE cards but have not yet found a need for that kind of speed. 1 Gb/s provides roughly 100 MB/s of data transfer, which is really enough.

    To access/replace a disk drive you need to remove 12 screws : A little mechanical engineering and this is easily fixed.

    To access/replace a disk drive you need to remove the top cover : See previous point.

    If you build this yourself totally DIY you will be required to create custom wiring harnesses : Yes. But it’s really not that complicated. This is the part that needs serious concentration to build properly.

    Any monitoring or health status reporting tools will have to be built, installed and configured by hand : We developed that within a month. And it’s pretty neat.

    Yes, it’s a DIY build in its original state. If you read the original article it’s clear they haven’t said everything. I am sure you found out yourself that some little pieces of information were missing.

    Sure any HP, DELL, IBM or EMC rep would tell you horror stories. I would propagate those stories if I were them…

    This is not a SAN solution, it’s a NAS. Coupled with open-source software like Linux, mdadm, LVM, Gluster and others, you can put together a “production grade” cloud-storage solution.

    I would never put high-access or critical data on it. Something like databases (Oracle, SQL Server or MySQL) does not belong on a NAS. But backups of those databases, security video from surveillance cameras, call-center recordings of customer service calls, email archives, the infamous “common” drive every company has, and much more should be offloaded from a SAN, which costs $$$$.

    All in all, even at 11K, it’s a really good start toward building a production-grade NAS.

    Marc
    Lead Engineer
    punraz.com

  19. Fred says:

    OK, so if you build two of these boxes and then sign up for the Backblaze service, I have a feeling I would be fairly safe – or? Anything obvious I am missing?

  20. David says:

    I wonder how many of these are being built right now? Any idea?
