Implementing HealthOmics for Cancer Diagnostics

Summary

A major pharmaceutical company sought help porting its aging bioinformatics pipeline from an on-premises high-performance computing (HPC) system to AWS. The pipeline was crucial to the company's business, yet it could easily be disrupted by hardware failures, capacity limitations, or other issues. To address this need, BioTeam migrated the company's pipeline to Workflow Description Language (WDL) on HealthOmics, allowing for greater scalability, resiliency, portability, and reproducibility.

Services provided:

  • Developed cloud architecture
  • Converted bioinformatics pipelines to WDL
  • Configured WDL pipeline to run on HealthOmics
  • Containerized bioinformatics tools and environments
  • Trained the bioinformatics team on Terraform, DevOps, and CI/CD pipelines
  • Created scripts for testing and workflow deployment
  • Refactored ETL trigger scripts to utilize AWS services
  • Developed workflow run orchestration and automation using S3, EventBridge, and Step Functions
  • Developed an AWS Lambda for data migration
  • Right-sized and optimized tasks for efficiency
  • Validated that migrated workflows provided expected results
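
As an illustration of the S3, EventBridge, and Step Functions orchestration listed above, the following is a minimal sketch of the decision such an automation has to make: given an S3 "Object Created" event delivered through EventBridge, should a workflow run start, and with what input? It is not the project's actual code. The completion marker name, the key layout under flowcells/, and the function and field names are all assumptions made for the example.

```python
import json
from typing import Optional

# Hypothetical convention: an upload ends by writing this marker object.
COMPLETION_MARKER = "upload_complete.txt"


def build_run_request(event: dict) -> Optional[dict]:
    """Map an S3 'Object Created' EventBridge event to a run request.

    Returns None for any event that should not start a workflow run.
    """
    if event.get("source") != "aws.s3":
        return None
    if event.get("detail-type") != "Object Created":
        return None

    detail = event.get("detail", {})
    bucket = detail.get("bucket", {}).get("name")
    key = detail.get("object", {}).get("key", "")

    # Assumed key layout: flowcells/<flowcell_id>/upload_complete.txt
    parts = key.split("/")
    if not bucket or len(parts) != 3:
        return None
    if parts[0] != "flowcells" or not parts[1] or parts[2] != COMPLETION_MARKER:
        return None

    flowcell_id = parts[1]
    return {
        "flowcell_id": flowcell_id,
        "input_uri": f"s3://{bucket}/flowcells/{flowcell_id}/",
    }


def to_execution_input(request: dict) -> str:
    """Serialize the request as the JSON input for a state machine execution."""
    return json.dumps(request, sort_keys=True)
```

In a deployment along these lines, an EventBridge rule would pass the event to a function like this, and a non-None result would become the input of a Step Functions execution that starts the HealthOmics run. Keeping the decision in a pure function makes it straightforward to unit test without any AWS access.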

Challenges

The pharmaceutical company was facing a common problem: modernizing and porting a crucial but aging bioinformatics pipeline. The pipeline was designed to run only on the company's on-premises HPC system, so it lacked the scalability and resiliency that cloud infrastructure provides. For instance, if a large number of samples suddenly needed to be processed, the company couldn't quickly scale up its infrastructure to meet the demand. If the system was down for maintenance, there was no backup system to pick up the slack. Additionally, the pipeline did not use a workflow management system, which limited its portability and the reproducibility of its results.

Approach

BioTeam collaborated with the pharmaceutical company to migrate its bioinformatics pipelines and associated test suites to HealthOmics:

  • Pipeline Refactoring: Converted Python and shell scripts to WDL
  • Containerization: Created containers for every bioinformatics tool and environment used by the pipeline
  • Cloud Infrastructure: Developed flowcell run orchestration and automation with S3, EventBridge, and Step Functions
  • CI/CD System Development: Developed a CI/CD system integrating Terraform for deployments and Docker for builds
  • Fixture Creation: Created pytest fixtures for HealthOmics, Step Functions, and CloudWatch
  • AWS Infrastructure Setup: Established new AWS infrastructure for testing, including DynamoDB, SNS, and IAM, using Terraform configurations
  • Test Refactoring: Refactored over 100 unit, system, and integration tests, along with their data dependency requests
  • Fail-Safe Implementation: Implemented fail-safes to address test interruptions and common quota errors
  • GitLab Runner Configuration: Configured a GitLab runner to manage test runs and worker scaling
  • Pipeline Adjustments: Modified and corrected existing pipelines
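
One common way to build a fail-safe for quota errors of the kind mentioned above is to retry the failing call with capped exponential backoff and jitter. The sketch below shows that general technique only; the case study does not describe how the fail-safes were actually implemented. QuotaError is a hypothetical stand-in for whatever throttling or quota exception the real client raises, and the function name and default values are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaError(Exception):
    """Hypothetical stand-in for a throttling or quota error from a cloud API."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on QuotaError with capped, jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except QuotaError:
            if attempt == max_attempts:
                raise  # out of retries: surface the error to the caller
            # The delay ceiling doubles each attempt and is capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random wait spreads out retries from parallel tests.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

A test helper could wrap each call that starts a workflow run in call_with_backoff, so that a transient quota error delays the test rather than failing it. The sleep parameter is injectable so the retry logic itself can be tested without waiting.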

Outcomes

The pipeline migration yielded significant improvements for the pharmaceutical company:

  • Portability and reproducibility were improved by containerizing the workflows and porting them to WDL
  • Scalability and resiliency were enhanced by moving the workflows to HealthOmics
  • Automation was increased for moving data and starting workflows
  • Turnaround time was reduced
  • The new pipeline handles significantly larger flowcell volumes with ease
