“New study highlights improvements to healthcare by increasing diversity in genomic research”

In a new study published in Nature (Wojcik et al. 2019), investigators from the US and Mexico identify the lack of racial diversity in large genetic datasets as a major reason why genetic discoveries fail to translate to non-white populations.


Since the Human Genome Project finished in 2003, genomic research has promised a future of precision medicine—where prognosis and treatment can be personalized using genetic data. Foundational to fulfilling this promise has been the collection of large genetic datasets linked to medical data. Genome-wide association studies (GWAS) leverage these datasets to identify genetic contributors to traits and diseases. While these studies have provided numerous discoveries for some portions of the population, specifically white Europeans, their ability to help other populations has been limited.

In a new study published in Nature, investigators from the US and Mexico identified the lack of racial diversity in large genetic datasets as a major reason why genetic discoveries fail to translate to non-white populations. Specifically, the large genetic datasets used for studies of complex biological traits, drug development, or clinical guidelines are composed almost entirely of white individuals with European ancestry. A number of factors have contributed to this bias, including the limitations of earlier analytical methods, which required racially homogeneous datasets to maximize the discovery of genetic associations. In the Nature report, however, the researchers showed that discoveries made using European genetic datasets did not translate to other racial groups, especially African-Americans and Hispanics/Latinos. Importantly, applying findings from European-heavy datasets to underrepresented populations could produce misleading results and exacerbate health disparities. The authors of the study warn, “Current genomic databases are under representative of populations with the greatest health burden and possibility of meaningful benefit.”

The researchers went on to show the power of analyzing diverse, multi-racial genomic datasets in combination with new, more sophisticated computational tools and methods. They collected genetic and clinical data from approximately 50,000 non-European individuals within the United States for a study called Population Architecture using Genomics and Epidemiology (PAGE), funded by the National Human Genome Research Institute and the National Institute on Minority Health and Health Disparities. The research team performed GWAS using these data and searched for genetic associations with 26 clinical and behavioral traits, including body-mass index, cholesterol levels, height, coffee and cigarette consumption, and several blood-protein levels. The PAGE study identified 27 previously unreported genetic associations with the traits examined and replicated 1,444 previously reported associations for the same traits.

Notably, because the study considered genetic ancestry on a continuum, rather than separating individuals into discrete populations, the researchers were able to translate findings across racial groups that share components of genetic ancestry. For example, the researchers discovered that an association recently reported in African-Americans between a genetic variant and levels of a specific blood protein also exists in Hispanics/Latinos, presumably due to their shared African ancestry.

In addition to identifying new genetic associations, the Nature paper also demonstrated how data from the PAGE study could be used to identify credible, functionally important genetic variants associated with a trait. Specifically, patterns of genetic variant inheritance—which variants tend to be inherited together—differ between populations. This feature can be used to reduce the statistical noise that occurs in a GWAS when unimportant variants tend to be inherited along with an important variant, making them also appear important. Leveraging their racially diverse dataset, the researchers were able to significantly reduce the number of credible variants associated with height and body-mass index reported in a previous study. Reducing the number of credible variants helps future research that uses more thorough and time-consuming methods to investigate the biological function of variants for basic biology or drug development purposes.
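To make the idea concrete, here is a minimal sketch of how combining evidence from populations with different inheritance patterns can shrink a credible set. It is an illustration only, not the method used in the Nature paper. It assumes a single causal variant in the region, equal prior probability for every variant, and independent studies whose per-variant evidence (log Bayes factors) can simply be added. The function name `credible_set` and all of the numbers are hypothetical.

```python
import math


def credible_set(log_bfs, coverage=0.95):
    """Return indices of the smallest set of variants whose posterior
    probabilities sum to at least `coverage`.

    Assumes exactly one causal variant and equal priors, so each
    variant's posterior is proportional to its Bayes factor.
    """
    top = max(log_bfs)
    weights = [math.exp(x - top) for x in log_bfs]  # subtract max for stability
    total = sum(weights)
    posteriors = [w / total for w in weights]

    # Add variants from most to least probable until coverage is reached.
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += posteriors[i]
        if cumulative >= coverage:
            break
    return sorted(chosen)


# Hypothetical log Bayes factors for five variants; variant 2 is the
# truly important one in this made-up example.
# Population A: all five variants tend to be inherited together,
# so they all look similarly important.
pop_a = [9.0, 9.5, 10.0, 9.6, 9.2]
# Population B: only variants 2 and 3 tend to be inherited together.
pop_b = [1.0, 0.5, 8.0, 6.5, 0.8]

# Independent studies: evidence multiplies, so log Bayes factors add.
combined = [a + b for a, b in zip(pop_a, pop_b)]

print(len(credible_set(pop_a)), "credible variants using population A alone")
print(len(credible_set(combined)), "credible variants using both populations")
```

With these made-up inputs, population A alone cannot separate the five variants, while the combined evidence narrows the set to variants 2 and 3, leaving fewer candidates for laboratory follow-up.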


Overall, the researchers presented detailed evidence of how racial homogeneity in large genetic datasets has hindered the application of genomic medicine in minority populations. This has manifested both in the limited translation of findings across populations and in missing critical variants not present in European datasets. The authors note, “The lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease.” Fortunately, the researchers also demonstrated the capability of racially diverse genetic datasets to provide novel insights into health and disease. They hope such datasets will help reduce healthcare disparities between populations. In order for the promise of precision medicine to be realized, the investigators argue, “researchers and funders can no longer afford to ignore non-European populations.”
