With re:Invent just two weeks away, the HPC team at AWS has released a new product they are calling ParallelCluster to replace their legacy, open-source project CfnCluster.
BioTeam has been assembling and orchestrating SGE-based HPC clusters in EC2 for over 10 years, first with StarCluster and then with CfnCluster. We are excited to learn about ParallelCluster, which looks like an upgraded, productized, and supported version of the CfnCluster package. Our clients who use AWS typically have HPC as a central component of their cloud infrastructure, and a new package like this will affect their compute strategies.
ParallelCluster is a significant release: it is one of the few cloud cluster toolkits to combine a “legacy” job scheduler like SGE with a container-scheduling service like AWS Batch. If you’ve found yourself asking for the equivalent of qsub or bsub for AWS Batch, ParallelCluster has finally added the long-awaited awsbsub.
What you need to know
If you already use CfnCluster and want to know what will change for you, refer to the documentation about moving from CfnCluster. The command-line client has been renamed from cfncluster to pcluster, and there’s an interesting set of new features and improvements as well:
- AWS Batch support has been added, but SGE is still the default scheduler, with SLURM and Torque/PBS also available. We are not surprised that MPI and traditional HPC schedulers are still alive: in the life sciences, SGE and Slurm continue to dominate the landscape, and most folks do not take advantage of what alternative schedulers have to offer.
- Multiple EBS volume support. This was often needed and previously required Custom Bootstrap Actions or other workarounds. It’s now easier to attach up to 5 EBS volumes to your cluster directly from the ParallelCluster config file.
- Most customizations should be compatible. Many of the deep customizations we have made for our clients in CfnCluster will be forward-compatible in ParallelCluster. Custom Chef cookbooks are still recommended and supported for configuration management.
- HTTP proxy support was added to the VPC config so running in secure private subnets should be simpler.
- Bringing your own images is still supported with custom AMIs.
- IAM policies have been updated in the docs and include some new services for the AWS Batch features. If you use custom IAM roles, be sure to update those policies, as most scenarios will require it.
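To make the scheduler and multi-volume items above concrete, here is a minimal Python sketch that writes an INI-style cluster config using only the standard library. The 5-volume limit comes from the list above. Everything else is an assumption based on our reading of the ParallelCluster config format and should be checked against the official docs before use: the section names (`[global]`, `[cluster default]`, `[ebs NAME]`), the keys (`cluster_template`, `scheduler`, `ebs_settings`, `shared_dir`, `volume_size`), and the scheduler values. The helper `build_cluster_config` is hypothetical and is not part of the pcluster tool.

```python
import configparser
import io

MAX_EBS_VOLUMES = 5  # limit mentioned in the feature list above
# Assumed config values for the schedulers named in the post.
KNOWN_SCHEDULERS = {"sge", "slurm", "torque", "awsbatch"}


def build_cluster_config(scheduler, ebs_volumes):
    """Return INI text for a cluster with the given scheduler and EBS volumes.

    ebs_volumes maps a volume label to a (shared_dir, size_in_gb) tuple.
    Section and key names are assumptions, not a verified schema.
    """
    if scheduler not in KNOWN_SCHEDULERS:
        raise ValueError(f"unknown scheduler: {scheduler}")
    if not 1 <= len(ebs_volumes) <= MAX_EBS_VOLUMES:
        raise ValueError(f"expected 1 to {MAX_EBS_VOLUMES} EBS volumes")

    config = configparser.ConfigParser()
    config["global"] = {"cluster_template": "default"}
    config["cluster default"] = {
        "scheduler": scheduler,
        # Comma-separated labels, each pointing at an [ebs LABEL] section.
        "ebs_settings": ", ".join(ebs_volumes),
    }
    for label, (shared_dir, size_gb) in ebs_volumes.items():
        config[f"ebs {label}"] = {
            "shared_dir": shared_dir,
            "volume_size": str(size_gb),
        }

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical volume labels and mount points, for illustration only.
    print(build_cluster_config("sge", {"scratch": ("/scratch", 100),
                                       "data": ("/data", 500)}))
```

Generating the file from code like this is just one option; the same sections can be written by hand. The point of the sketch is that each extra volume becomes one more labeled section rather than a bootstrap script.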
Our initial thoughts
- Adopt ParallelCluster, since it’s mostly a relaunch of CfnCluster and migrating should be simple.
- Trial the AWS Batch scheduler option as a first step toward running HPC workloads in Docker containers.
- Assess your security and IAM policies for your ParallelCluster accounts to ensure HIPAA compliance.
If you have any questions about ParallelCluster or migrating from CfnCluster, feel free to post them in the comments. We’ll be walking around at re:Invent, so if you’d like to meet up in person, contact us!