
Summary
A globally recognized academic medical center (AMC) for integrated healthcare and biomedical research faced a critical need to scale its data infrastructure to support accelerating demands in clinical operations and precision medicine. With well over 3,000 active clinical trials and more than 21,000 annual scientific publications, the scale of data production necessitated a platform capable of processing multi-modal datasets, including single-cell omics and large-scale medical imaging. Their aging high-performance computing (HPC) and storage systems were failing to keep pace with demand, and it was unclear when and how cloud resources would fit into an overall research/clinical computing strategy. To maintain its competitive edge and support advanced technologies such as Artificial Intelligence (AI), the AMC initiated a strategic modernization of its scientific computing strategy and HPC ecosystem, while minimizing disruption to ongoing research.
Following a preliminary strategic HPC market and vendor assessment with BioTeam, the AMC engaged BioTeam to create a high-level architectural design for a hybrid HPC/cloud system supporting both research and clinical operations at the hospital. This initiative focused on defining scientific and clinical requirements and translating them into technical requirements for a scalable HPC infrastructure connected to existing cloud services. The hybrid strategy integrates on-premises resources with managed services from major cloud providers to support needs such as bursty GPU training and collaborative bioinformatics. The project also delivered a robust, user-friendly computing infrastructure that empowers researchers at all levels of expertise, facilitates groundbreaking discoveries, and sustains the AMC’s mission of advancing innovative patient care.
Challenge
The AMC faced an unprecedented surge in data generation and the need for stable infrastructure after data storage platforms caused widespread outages. Bandwidth constraints between research locations and data centers forced workflows to remain on-premises and delayed large-scale data migration, while a lack of graphical interfaces created a high barrier for clinical staff. It became clear that the existing infrastructure needed improvement, but also that the support teams needed to mature their services and Service Level Agreements (SLAs) to meet strict research and patient-outcomes standards. Another primary requirement was that this transition occur with minimal disruption to ongoing, high-stakes clinical and research programs. The legacy cluster also lacked the high-VRAM GPU density and NVLink interconnects required for modern large language model (LLM) training and 3D protein structure prediction, and the organization faced a significant data storage deficit.
Approach
The AMC partnered with BioTeam to gather research and clinical requirements from interviews and initial market research, and to develop a high-level HPC and data storage architectural design. The high-level design then drove additional, detailed system design:
- BioTeam conducted targeted interviews with key research and clinical groups to ensure the infrastructure design reflected researcher demand. This included engagement spanning molecular pathology to pharmacoepidemiology, supplemented by survey data from dozens of researchers to identify distinct workload patterns.
- Researcher use cases and scientific requirements were synthesized and vetted by the community to ensure the technical design aligned with the diverse needs of the scientific community.
- A comprehensive high-level design was created to outline requirements for compute, storage, networking, scientific/clinical software, HPC management tools, and a services and team skills matrix to support the system. Technical components were further mapped to NIST control families to ensure the environment meets federal standards for sensitive patient data. Storage architecture followed a three-tier model (Hot/Warm/Cold) with S3-compatible object storage.
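As a purely illustrative sketch of the three-tier (Hot/Warm/Cold) model described above, tiering on S3-compatible object storage is commonly expressed as a lifecycle policy. The bucket prefix, day thresholds, and storage-class names below are hypothetical examples, not details of the AMC's actual design:

```python
# Illustrative sketch only: Hot -> Warm -> Cold tiering expressed as an
# S3 lifecycle policy. The prefix, thresholds, and storage classes are
# hypothetical and would be set by the site's data-retention requirements.
lifecycle_policy = {
    "Rules": [
        {
            "ID": "tier-research-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "research/"},
            "Transitions": [
                # Hot -> Warm: move to infrequent-access storage after 30 days
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # Warm -> Cold: move to archival storage after 180 days
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# With AWS's boto3 SDK, such a policy could be applied as:
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-research-bucket",
#       LifecycleConfiguration=lifecycle_policy,
#   )
```

Most S3-compatible object stores accept policies of this shape, which lets hot data stay on fast storage while aging datasets migrate automatically to cheaper tiers.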
Outcomes
The engagement was structured to define comprehensive HPC technical and operational requirements in phases. It also produced a service description list identifying necessary internal and vendor-managed services. This systematic approach provided AMC leadership with a clear roadmap for capital investment, ensuring that the resulting HPC ecosystem was built to be scalable, minimally disruptive, and accessible to researchers across all levels of technical expertise. The initial phase deliverables included:
- Research and clinical use cases to drive high-level HPC design.
- A comprehensive list of research and clinical workloads, classifying which are best suited for the new HPC infrastructure and which are better suited for cloud resources.
- HPC and data storage architectural design that includes computing requirements, administrative requirements, and a recommended approach to account for future scalability.
- A Skills Matrix and Service Description List that defines all internal and external-facing services needed to support the new HPC, identifies gaps in internal resourcing, and recommends approaches for addressing those gaps, including outlining responsibilities for vendor-managed services.

