Summary
A top-five US Cancer Research Medical Center sought BioTeam’s expertise to assess its scientific computing challenges and create a five-year roadmap to address them. Legacy high-performance computing (HPC) systems were unable to meet the needs of the center’s researchers, leading to long wait times for computing resources, inefficient workflows, and researchers losing access to valuable data resources. The hospital’s enterprise IT solutions were unable to identify and resolve underlying issues, and the research IT group was overwhelmed supporting legacy systems while also attempting to deploy a new HPC system.
BioTeam addressed the problem using a dual approach: 1) stabilize the existing HPC environment by deploying missing infrastructure components; 2) develop a five-year strategic roadmap and implementation plan to expand the HPC environment and create an effective research IT organization. This approach addressed the client’s critical short-term challenges with their legacy HPC systems while outlining a path forward to a more efficient and powerful HPC system capable of driving their cutting-edge research.
Challenge
The research hospital struggled with legacy HPC systems that were increasingly complex to scale, manage, and maintain. Interviews revealed the researcher community needed not only a reliable, larger HPC system, but also:
- Clear strategies to manage and transfer research data from creation to long-term archiving,
- The ability to collaborate with external colleagues, move data securely, and use HPC resources to meet grant requirements,
- Additional Research IT support and HPC onboarding.
The Research IT group worked holidays and weekends to keep the HPC systems and support services operational. The Enterprise and Research IT teams also felt unequipped to diagnose and resolve underlying infrastructure problems. It became clear that the Enterprise and Research IT teams did not fully understand each other’s needs, and therefore needed a third party to provide detailed industry best practices on how other institutions addressed similar challenges, such as engaging external research partners without exposing internal IT infrastructure.
Approach
BioTeam conducted a deep-dive assessment of the client’s HPC system and identified both short-term challenges that could be addressed to stabilize it and long-term challenges that would require a more comprehensive strategy. Short-term solutions included providing modular, cloud-native HPC environments tailored to each research team’s workload, which reduced reliance on the legacy HPC systems. Ultimately, workloads were migrated to the new HPC system or remained on these cloud environments, depending on IT and research group preference.
For the strategic roadmap and implementation plan, BioTeam created and led four working groups with members from both the client’s Research IT and Enterprise IT groups. Each working group addressed a specific challenge. Together, the groups:
- Merged multiple sources of HPC and data storage identities and created a clear vision of what the final state of identity management would be for the medical center.
- Outlined and tested secure methods for researchers to move data to and from external collaborators.
- Planned five years of HPC and data storage capacity growth based on previous usage and predicted growth patterns in AI and precision medicine research.
- Created a detailed list of the new and existing Research IT services the research community needed, and outlined the staffing requirements and specific skills needed to deliver those services.
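The capacity-planning step above can be illustrated with a simple compound-growth projection. This is a minimal sketch only, not the model BioTeam used: the function names, the 1,000 TB baseline, the 50% annual growth rate, and the 3,000 TB installed capacity are all hypothetical values chosen for illustration.

```python
def project_demand(baseline_tb, annual_growth_rate, years=5):
    """Project storage demand (TB) for each of the next `years` years.

    Assumes demand compounds once per year at a constant rate, which is a
    simplification; real plans usually blend several growth scenarios.
    """
    if baseline_tb < 0 or annual_growth_rate <= -1:
        raise ValueError("baseline must be >= 0 and growth rate must be > -1")
    return [baseline_tb * (1 + annual_growth_rate) ** year
            for year in range(1, years + 1)]


def first_shortfall_year(projected_tb, installed_tb):
    """Return the first year (1-based) in which demand exceeds capacity, or None."""
    for year, demand in enumerate(projected_tb, start=1):
        if demand > installed_tb:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 TB in use today, growing 50% per year.
    demand = project_demand(baseline_tb=1000.0, annual_growth_rate=0.5)
    for year, tb in enumerate(demand, start=1):
        print(f"Year {year}: {tb:,.0f} TB")
    # Hypothetical installed capacity of 3,000 TB.
    print("First shortfall year:", first_shortfall_year(demand, installed_tb=3000.0))
```

A projection like this gives each budget request a traceable basis: the year in which projected demand first exceeds installed capacity marks when an expansion needs to be funded.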
The long-term roadmap leveraged BioTeam’s decade-long expertise in building effective research IT organizations and systems, as well as insights from the targeted working groups, which drew on stakeholders across the organization. The roadmap included a step-by-step playbook for meeting the future needs of the research community and integrating new technologies, such as AI/ML.
Outcomes
BioTeam modernized existing HPC systems and provided a detailed five-year roadmap to ensure long-term success. Key outcomes included:
- HPC Infrastructure Improvements
  - A more modern, stable, secure, and scalable central HPC system.
  - Migrated legacy on-premises and cloud-based HPC infrastructure systems and research IT support services.
  - Enhanced external research collaboration without compromising internal network security.
- HPC, Cloud, and Data Management Strategy
  - A five-year HPC capacity plan based on research needs with detailed budget justifications.
  - A holistic, diagramed data management strategy enabling researchers and IT to meet growing data and compute demands with enhanced performance and security.
  - A five-year roadmap to transform the existing HPC support team into an effective and science-driven research IT organization.
  - A detailed list of all Research IT services and a clear staffing and resource plan to support those services.
