Deploying the HUBzero scientific collaboration platform on the cloud
This post will be our central landing page for all HUBzero-related content. As we publish more information and details we'll update this page so that it contains a summary and a list of links.
- Behind the scenes: Step 1 of the HUBzero on Amazon process
What is HUBzero?
If you are not familiar with the HUBzero platform, stop reading this post and surf on over to http://hubzero.org for all the details. Check out the About page as well as the Getting Started section for a good background view.
Essentially HUBzero is an open-source platform chock-full of services and features that facilitate online collaboration, sharing and communication among members of a community. In a simplistic way you can think of it as something like "Facebook for Scientists". What separates HUBzero from all the other online publishing platforms, blog engines and content management systems is that this one has been built from the ground up with the needs of scientific research users in mind. The code itself sits on top of the Joomla CMS for those interested in the foundation.
Many online publishing platforms offer convenient "social" features like member forums, blogs, wikis and file hosting. HUBzero offers all of these and many more features of interest to scientific and technical communities. Of particular note are the methods for hosting and delivering browser-based workspaces & scientific tools along with a remote execution and workflow framework that supports integration with common high performance computing resource managers and grid systems.
Why HUBzero on the cloud?
In the usual tradition of BioTeam openness and honesty we'll be blunt about our motives here. We became familiar with HUBzero because one of our consulting clients is currently evaluating it for a potential internal deployment. It's a fairly complex system that is not particularly easy to deploy quickly if one is simply interested in seeing what it's like and "kicking the tires". While this was going on, the author of this post also felt the need to refresh his Ruby coding skills and get back into developing deployment cookbooks for the awesome Opscode Chef infrastructure automation platform. And finally, after seeing some interesting talks at the 2012 Amazon re:Invent Conference, we became interested in learning more about the newly launched AWS Marketplace, which is Amazon's latest take on establishing a framework for people to publish and/or sell interesting cloud-based software, systems and services.
A quick Google search indicates that nobody has yet published a public HUBzero AMI or written extensive HowTo guides or blog posts on how to wedge HUBzero into any of the common IaaS cloud platforms. Also, the AWS Marketplace allows people to publish and "sell" software for $0.00, which means that we could use this method to make cloud-resident Hubs free to acquire and easy to launch.
Our motivation summarized: A confluence of interests -- BioTeam is interested in checking out the AWS Marketplace "from the inside" while internal employees are interested in improving Ruby/Chef skills. By leveraging these interests in an attempt to publish a free version of HUBzero within the AWS Marketplace we will (ideally) be satisfying our own needs while also helping the community by making it easier for people without access to VMware or local Debian servers to launch and experiment with the HUBzero platform.
Success is not guaranteed; Amazon scans and monitors Marketplace products for malware, scams and backdoors. There is a chance that some HUBzero components may trigger security warnings when our AMI is reviewed prior to publication. One example may be the HUBzero Maxwell Server component which uses an internal automatically-generated SSH key to make passwordless localhost connections to OpenVZ containers hosting tools and workspace environments. Amazon's Marketplace docs warn against leaving SSH keys inside AMI server images so the mere presence of the maxwell-service key may cause us to flunk a security audit. As we progress through the AWS Marketplace process we'll post our experiences and will consider workarounds and alternatives as we encounter obstacles. In the worst-case scenario we might have to leave some HUBzero components uninstalled or un-configured until a user launches the system and logs in to run the final "complete me" installation/setup script.
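To make the SSH key concern concrete, here is a minimal sketch of the kind of check one might run against a filesystem before turning it into an AMI. It is not part of HUBzero or of Amazon's review tooling; the function name find_private_keys and the list of markers are our own assumptions, and the scan only looks at the first few kilobytes of each file for a PEM-style private key header.

```python
import os

# PEM-style headers that commonly open a private key file (illustrative list).
PRIVATE_KEY_MARKERS = (
    b"-----BEGIN RSA PRIVATE KEY-----",
    b"-----BEGIN DSA PRIVATE KEY-----",
    b"-----BEGIN EC PRIVATE KEY-----",
    b"-----BEGIN OPENSSH PRIVATE KEY-----",
    b"-----BEGIN PRIVATE KEY-----",
)


def find_private_keys(root):
    """Return a sorted list of regular files under root that look like private keys."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, devices and anything else that is not a plain file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    head = handle.read(4096)
            except OSError:
                continue  # unreadable file: ignore it in this sketch
            if any(marker in head for marker in PRIVATE_KEY_MARKERS):
                hits.append(path)
    return sorted(hits)
```

Running find_private_keys("/") as root on a configured Hub would, under these assumptions, list any key material left on disk, which gives a starting point for deciding what has to be removed or regenerated at first boot.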
There are multiple reasons why a cloud-resident easily-launched HUBzero system would be useful to people:
- Got VMware ESX? Most large organizations with virtualization capability are running the "enterprise" versions of their platform. The VMware image distributed at http://hubzero.org/download was created for VMware Player and not for VMware ESX. Integrating the downloaded VMware Player image into a local enterprise-class VM environment may be non-trivial and may require a conversion or reconfiguration process. It may not be as quick, easy and convenient as first thought.
- Got Debian 6? The "other" way to install and launch a local Hub is to follow the 20-step instructions published online at http://hubzero.org/documentation/1.1.0/installation/Setup. This process requires a Linux system running Debian 6 ("Squeeze"). The binary versions of the software components are distributed as .deb packages and integrating them into a non-Debian-6 platform (we have tried both Ubuntu & CentOS Linux) would be quite a complex task. The "self-install" process can be somewhat complicated and it is easy to end up with a non-functioning hub, even if the online instructions are followed to the letter.
- Cloud, duh! And finally, "the cloud" is a natural meeting place for scientific collaborators and partners of all kinds! Amazon is already becoming a center-of-gravity for genomic and life science public datasets and it's a natural location to host a "hub" where multiple groups of people intend to share tools, data, resources and information. If the intent is to really run a scientific collaboration platform there are valid reasons for not hiding it behind a corporate firewall or VPN.
Current Status - December 2012
We have a four-stage Chef Cookbook that can install, configure and deploy HUBzero onto a 'naked' Debian 6 AMI in about 45 minutes. The majority of that time is consumed by the automated chef-client software downloading several gigabytes of Debian packages, updates and dependencies -- the published Debian 6 public AMI is quite "light" and only comes with a minimal software install at launch.
There are technical reasons why HUBzero deployment takes four separate stages. We'll discuss those in future posts.
The Cookbook successfully turns a bare-bones Debian 6 server into a Hub. We have not yet gotten the server image to the point where it can be turned into a generic AMI. In particular, HUBzero has some dependencies on the server's public hostname, which will be different each time the image is launched. We still need to develop some additional scripts/methods to ensure that the Hub will "self-reconfigure" the first time it boots up with a new/different public hostname.
We have also not yet been accepted into the AWS Marketplace. Because money is involved we need to submit W-9 forms and link a bank account to the Amazon Payment system. These steps are required even though we intend to charge $0.00 should we be successful with this project. These procedural steps are currently under way.
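As a rough illustration of the "self-reconfigure" idea, the sketch below records the hostname the Hub was last configured for and only triggers reconfiguration when the current public hostname differs. This is not our actual script: the function names, the JSON state file and the reconfigure callback are hypothetical, and the real HUBzero steps (which config files to rewrite, which services to restart) are deliberately left to the caller. The metadata URL is the standard EC2 instance metadata path for the public hostname; the request shown assumes the plain unauthenticated form of that service.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# EC2 instance metadata path that returns the instance's public DNS name.
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-hostname"


def fetch_public_hostname(url=METADATA_URL, timeout=2.0):
    """Ask the EC2 metadata service for the current public hostname."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def reconfigure_if_changed(current_hostname, state_file, reconfigure):
    """Run reconfigure(old, new) only when the hostname differs from the recorded one.

    state_file is a small JSON file (hypothetical location chosen by the caller)
    remembering the hostname the Hub was last configured for.
    Returns True if reconfiguration ran, False otherwise.
    """
    state_path = Path(state_file)
    previous = None
    if state_path.exists():
        previous = json.loads(state_path.read_text())["hostname"]
    if previous == current_hostname:
        return False
    # Placeholder hook: the HUBzero-specific work would happen in this callback.
    reconfigure(previous, current_hostname)
    # Record the new hostname only after the callback returned without raising.
    state_path.write_text(json.dumps({"hostname": current_hostname}))
    return True
```

A boot-time script could then call reconfigure_if_changed(fetch_public_hostname(), state_file, callback) on every start. On a freshly launched image with no state file the callback runs once with None as the old hostname; on an ordinary reboot with an unchanged hostname nothing happens.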
The first "behind the scenes" article has been posted at http://bioteam.net/2012/12/hubzero-on-aws-stage-1/