More than 30 years after the Human Genome Project began charting our molecular makeup, researchers have finished the job by publishing the first complete, gap-free DNA sequence.
Spanning 23 chromosomes and totaling more than 3 billion individual nucleotides (the biological compounds labeled A, C, G and T, which together form the basis of all life on Earth), the new data maps out the last untapped 8% of the genome, home to difficult-to-parse regions made up of numerous genes and long stretches of repetitive DNA.
According to researchers at the Telomere to Telomere Consortium and the National Human Genome Research Institute, revealing the final unknown corners of the genome will open up new studies into how chromosomes properly divide and research on more than 2 million additional genetic variants. The details provide new information on 622 genes previously linked to both health and disease.
“Ever since we had the first draft human genome sequence, determining the exact sequence of complex genomic regions has been challenging,” said Evan Eichler, professor of genome sciences at the University of Washington School of Medicine and a co-chair of the T2T Consortium, in a statement.
“I am thrilled that we got the job done,” Eichler noted. “The complete blueprint is going to revolutionize the way we think about human genomic variation, disease and evolution.” The consortium’s work, spanning multiple papers, was published this week in the journal Science.
Since the Human Genome Project kicked off in 1990 (with the first drafts of the human reference genome, 92% complete, published at the turn of the millennium), the cost of sequencing a single genome has plummeted from tens of millions of dollars to less than a thousand, thanks to the advent of next-generation, short-read machines. These sequencers allow for the quick analysis of targeted regions of DNA, several hundred bases at a time.
However, these methods can leave gaps when the short pieces are stitched together into a longer string. Over the past decade, new types of long-read machines have become able to generate longer sequences in a single run with high accuracy.
Researchers participating in the T2T Consortium relied on long-read machines developed by Pacific Biosciences and Oxford Nanopore to gather and sequence tens of thousands to potentially millions of bases at a time.
“The T2T Consortium’s work could help scientists better understand human biology and evolution,” PacBio President and CEO Christian Henry said in a statement. “Most importantly, it could one day ultimately change the lives of millions of people by helping researchers better understand the genomic basis associated with potentially dozens of genetic diseases and eventually identify cures or therapeutic options for them.”
Going forward, the baton may be picked up by the Human Pangenome Reference Consortium, which plans to generate reference-quality genomes from more than 300 people with diverse backgrounds over the next three years, according to PacBio. Using long-read machines, these individual genomes will be combined into a pangenome to provide a bedrock for DNA research that better represents human diversity.
“Truly finishing the human genome sequence was like putting on a new pair of glasses,” said consortium co-chair Adam Phillippy, head of the genome informatics section at NHGRI. “Now that we can clearly see everything, we are one step closer to understanding what it all means.”
The now-complete sequence will be available for studies into how DNA differs from person to person to help understand how the genome contributes to certain diseases or responses to treatment.
“In the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their healthcare,” Phillippy said.