Broad Institute offering cloud-based access to trove of genomic data
- Working with Google, Amazon Web Services and other tech giants, the Broad Institute of MIT and Harvard plans to offer cloud-based access to its genome analysis toolkit (GATK), the research center said this week.
- The toolkit is a software package for analyzing high-throughput genomic sequencing data and is used by more than 30,000 researchers worldwide. Previously accessed through desktop software, the toolkit will now be available as a software-as-a-service (SaaS) package.
- Genomic sequencers at the Institute generate 14 gigabytes of data every minute, adding up to 20 terabytes of new data a day and straining the capacity of computing systems. With this new offering, the Broad is aiming to lower the barriers for researchers wanting to use the gene data.
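As a rough check on the throughput figures above, the daily total follows from the per-minute rate. The short sketch below assumes the sequencers run continuously and that decimal units apply (1 terabyte = 1,000 gigabytes); neither assumption is stated in the announcement.

```python
# Back-of-the-envelope check of the quoted sequencing throughput.
GB_PER_MINUTE = 14          # figure quoted in the article
MINUTES_PER_DAY = 24 * 60   # assumes round-the-clock operation

gb_per_day = GB_PER_MINUTE * MINUTES_PER_DAY   # 20,160 GB
tb_per_day = gb_per_day / 1000                 # assumes decimal terabytes

print(f"{gb_per_day} GB/day is about {tb_per_day:.1f} TB/day")
```

Under those assumptions the result rounds to the 20 terabytes a day cited above.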
GATK has proven to be a powerful tool for researchers worldwide who are working with genomic data to generate insights that inform drug development and other research efforts. However, given the tremendous data and storage requirements, there are intrinsic challenges with trying to use GATK as a desktop-based solution. To make GATK even more useful and powerful, the Broad Institute made the decision to move to cloud-based computing.
The Institute debuted this technology first with its internal research teams and has now added its Whole Genome Sequencing Pipeline to the Google Cloud Platform.
For its SaaS offering, the Broad is working with a number of large tech companies including Amazon Web Services, Cloudera, Google, IBM, Intel, and Microsoft.
In addition to the move toward cloud-based computing, the Broad has also been developing a next-generation version of its genomic analysis software, called GATK4.
- Google Research Blog: Genomic Data Processing on Google Cloud Platform