Autism Speaks needed to move their genomic variant annotation pipeline onto the Google Cloud Platform to support the MSSNG project’s goal of sequencing over 10,000 autism-affected families. They turned to BioTeam to design and build a solution that would allow their bioinformaticians to efficiently annotate the billions of variants being identified by this ground-breaking study.
BioTeam worked with MSSNG’s bioinformaticians at The Centre for Applied Genomics (TCAG) in Toronto to deploy their existing Perl-based pipeline onto Google Compute Engine instances. A custom web application was developed so that TCAG team members could select variant data, submit it to the annotation pipeline, and monitor the ongoing annotation process.
The web application published new annotation jobs onto dedicated messaging queues in Google Cloud Pub/Sub, where they could be picked up and processed by any available annotation node. We used the Ansible DevOps framework to provide a repeatable means of instantiating and provisioning additional annotation nodes. This made it easy to scale the annotation pipeline’s capacity and keep the duration of the annotation-processing step in line with the project’s goals, even as the project’s data increased dramatically.
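The queue-and-worker pattern described above can be illustrated with a minimal in-process sketch. It is not the project's code: Python's standard `queue` and `threading` modules stand in for Google Cloud Pub/Sub and the Compute Engine nodes, and `annotate`, `worker`, `run_pipeline`, and the job fields are hypothetical names chosen for illustration. The real annotation step was the existing Perl pipeline.

```python
import queue
import threading


def annotate(job):
    # Hypothetical placeholder for the real annotation step
    # (the Perl-based pipeline in the text).
    return {"job_id": job["job_id"], "status": "annotated"}


def worker(node_name, jobs, results):
    # Each "annotation node" pulls jobs until it sees the stop sentinel.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this node
            break
        result = annotate(job)
        result["node"] = node_name
        results.put(result)


def run_pipeline(job_ids, node_count):
    # In-process stand-ins for the messaging queue and the result store.
    jobs = queue.Queue()
    results = queue.Queue()

    # Adding capacity means starting more workers against the same queue.
    nodes = [
        threading.Thread(target=worker, args=(f"node-{i}", jobs, results))
        for i in range(node_count)
    ]
    for node in nodes:
        node.start()

    # The "web application" side: publish one message per annotation job.
    for job_id in job_ids:
        jobs.put({"job_id": job_id})

    # One sentinel per node so every worker shuts down cleanly.
    for _ in nodes:
        jobs.put(None)
    for node in nodes:
        node.join()

    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected, key=lambda r: r["job_id"])


if __name__ == "__main__":
    for record in run_pipeline(["batch-001", "batch-002", "batch-003"], node_count=2):
        print(record["job_id"], record["status"], record["node"])
```

Because the producers and consumers share only the queue, raising `node_count` increases throughput without changing how jobs are submitted, which is the property that made it practical to provision extra nodes on demand.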
In combination with the Google Cloud Platform, the annotation pipeline is enabling the MSSNG team to process over 15 billion variants and to identify and annotate the subset of unique variants found in their population. These annotations are then integrated into the MSSNG Research Portal, a custom web portal also built by BioTeam. The Research Portal allows MSSNG to provide autism researchers with open access to this data, so that they can explore the variants and associated clinical phenotypes with the goal of identifying genetic variants that may unlock aspects of this complex disorder.