Intel and the Broad Institute have teamed up to create and share tools and infrastructure that support the integration and processing of genomic data. The five-year, $25 million collaboration aims to make it easier to pull together genomic research data housed in private, public and hybrid clouds.
Broad and Intel have three goals going into the project. The first is to recommend hardware best practices for running Broad’s Genome Analysis Toolkit (GATK) in on-premises, public cloud and hybrid computing environments. In doing so, the partners think they can address some of the difficulties that arise when trying to analyze genomic data from a range of sources.
In parallel, Intel and Broad will seek to tweak software tools such as GATK, GenomicsDB and Broad’s workflow execution engine Cromwell so they run better on Intel-based computing platforms. The software optimization is intended to make it faster and easier to analyze genome data sets.
Finally, Broad and Intel want to help biopharma companies, academics and healthcare providers collaborate by facilitating the secure processing of genomic data across different organizations. To achieve this goal, the partners plan to create workflow execution models that support complex, dispersed data sets. If Intel and Broad can fulfill their ambition, they think the subsequent removal of a barrier to collaboration could benefit everything from drug discovery to clinical decisions.
The goals reflect the pressures organizations face in genomics today. Earlier collaborations between Broad and Intel have tackled problems such as how to accelerate variant discovery. Now, with organizations juggling their own ever-growing data sets and ambitions to pool resources with others, Broad and Intel have stepped up their interest in integrating data.
“Working with Intel, we plan to build out solutions that can work across different infrastructures to facilitate efficient processing of these growing data sets, and then make these tools openly available for researchers worldwide,” Eric Banks, director of the data sciences and data engineering group at Broad, said in a statement. “Our work is a step toward building something analogous to a superhighway to connect disparate databases of genomic information for the advancement of research and precision medicine.”