Beyond short reads: Leveraging long-read sequencing for population-scale genomic insights – PacBio
Large-scale population sequencing programs have a number of important research goals and outcomes. These include:
- Generating a complete catalog of genetic variation to reflect genetic diversity in a target population.
- Maximizing discovery power through accurate variant calling across all variant classes and across the entire human genome, including challenging genomic regions (highly repetitive or homologous regions), while taking advantage of additional genetic context such as phasing and methylation status.
- Creating broad and long-term utility of the generated data, so that it serves as a resource for population genetics, translational research, and pharmaceutical R&D.
- Returning value to a population through the return of direct results to study participants in order to enable precision health and preventive programs.
In recent years, population genetics and precision health research have relied on large genomic data sets, often combining microarray, exome, and genome data to obtain the results researchers need. Gaps and challenges within the generated data have persisted, primarily because of technological limitations of leading next-generation sequencing technologies, which require additional methods to integrate the data and fill the gaps.
Current short-read next-generation sequencing (NGS) methods do not detect variants in dark regions and routinely struggle with large or complex variants. This can have a disparate impact on obtaining genetic insights across ancestral populations and can lead to a partial or incomplete understanding of the genetic causes and mechanisms of disease.
Here are some examples of disease-related genes that illustrate this problem:
- SMN1: Silent carriers of spinal muscular atrophy (SMA) of African descent remain poorly detected with current short-read NGS and its variant callers.
- LPA: High levels of lipoprotein(a), or Lp(a), increase the risk of heart attack, stroke, and aortic stenosis. Long variable repeats in the LPA gene remain difficult for short-read methods, and higher Lp(a) levels are found in African and South Asian populations.
- HBA, HBB, HBM: Genes involved in thalassemia and sickle cell disease, where homologous sequences, copy number variants (CNVs), and gene fusions complicate analysis. Hemoglobinopathies are more common in Mediterranean, Middle Eastern, Southeast Asian, African, and African American populations.
The long-read sequencing difference
HiFi sequencing significantly changes the game for population genomics programs. Compared to traditional short-read methods, a HiFi genome can detect 2.5 times more structural variants. While single nucleotide variants (SNVs) and indels make up the majority of variants by count, structural variants (SVs) have the greater impact on the number of base pairs affected throughout the genome. In fact, SVs alter more bases than SNVs and indels combined. HiFi sequencing also proves invaluable in analyzing previously inaccessible “dark regions” of the genome that remain beyond the reach of short-read whole genome sequencing (WGS). These regions contain numerous medically relevant genes such as SMN1, HBM, and LPA. In addition, HiFi genomes offer a genome-wide methylation signal along with base calling, as well as the ability to phase variants into distinct haplotypes, allowing for the generation of more disease-relevant insights.
To ensure the long-term and broad utility of population sequencing data, it is critical that programs obtain comprehensive, whole-genome insights across all ancestries while working within their budgets. PacBio's HiFi data has been recognized as indispensable in this research. The authors of a recent paper, who are involved in one of the largest and most advanced population sequencing projects in the world, noted that we should continue to develop population-scale cohorts sequenced only with long reads, and raised the question of whether we have entered the era of exclusive use of long reads.
Interested in how HiFi sequencing can make a difference in your next population or cohort sequencing project? Visit our Population Genetics + Carrier Screening page or contact us to explore your options.