Ancestry Inference From Targeted NGS Tests to Enable Precision Medicine and Improve Racial/Ethnic Representation in Clinical Trials

ACMG Annual Clinical Genetics Meeting 2022 Presentation
Authors: Francisco De La Vega, Brooke Rhead, and Sean Irvine


There are well-established racial and ethnic disparities in cancer incidence and outcomes, in part due to structural, socioeconomic, environmental, and behavioral factors. However, some of these differences can be attributed to biological factors, such as the frequencies of somatic cancer variants, which vary by ancestry. Diversity in clinical trials is low, with Black and Hispanic patients consistently underrepresented relative to their cancer incidence. Further, race and ethnicity are infrequently reported in clinical trials and are missing from up to 50% of patient medical records and genomic profiling tests. Moreover, self-reported race/ethnicity does not accurately reflect genetic ancestry, a limitation that disproportionately affects admixed patients. Representative participation in clinical trials would help minimize disparities in outcomes and enable the assessment of biological differences that may determine differential efficacy of drugs for oncology and other indications. Thus, the diversity of real-world genetic testing and clinical trial cohorts needs to be ascertained.


Rather than relying on self-reported race/ethnicity labels, ancestry can be inferred directly from sequencing data collected during tumor profiling and other tests. Ancestry is usually inferred from genome-wide data, either array or genome sequencing (GS), using unlinked random markers and clustering methods, but this approach is inappropriate for targeted next-generation sequencing (NGS) gene panels or even exome sequencing (ES) data. Instead, we selected 654 ancestry-informative markers (AIMs) overlapping the coding regions of 648 cancer genes targeted by the Tempus xT NGS assay. We implemented a supervised version of the ADMIXTURE algorithm using our AIMs to infer global ancestry proportions at the continental level (i.e., Africa, America, Europe, East Asia, and South Asia) from GS, ES, and NGS xT variant calls. We validated our methods by comparing our results with the labels of unadmixed donors from the 1000 Genomes Project (1KGP), with admixture simulated from 1KGP data, and with local ancestry inference previously derived with the RFMix software for admixed individuals from the 1KGP and the ICGC PCAWG project.
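To make the idea of supervised global ancestry estimation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-population alternate-allele frequencies at each marker have already been estimated from reference panels and are held fixed, and it estimates one sample's ancestry proportions with a plain expectation-maximization (EM) update under the standard admixture likelihood. ADMIXTURE itself uses a faster block-relaxation optimizer, and the function name, arguments, and toy data below are all hypothetical.

```python
def estimate_ancestry(genotypes, ref_freqs, n_iter=200, eps=1e-6):
    """Estimate ancestry proportions for one sample by EM.

    genotypes: alternate-allele counts per marker (0, 1, 2), or None if missing.
    ref_freqs: one list per reference population, giving the alternate-allele
               frequency at each marker (assumed known and held fixed).
    Returns a list of proportions, one per population, summing to 1.
    """
    k = len(ref_freqs)
    # Clamp frequencies away from 0 and 1 so no denominator is ever zero.
    freqs = [[min(max(p, eps), 1.0 - eps) for p in pop] for pop in ref_freqs]
    observed = [j for j, g in enumerate(genotypes) if g is not None]
    if not observed:
        raise ValueError("no genotyped markers")

    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * k
        for j in observed:
            g = genotypes[j]
            # Probability of drawing an alt / ref allele under the current q.
            p_alt = sum(q[i] * freqs[i][j] for i in range(k))
            p_ref = sum(q[i] * (1.0 - freqs[i][j]) for i in range(k))
            for i in range(k):
                # Expected number of this sample's alleles at marker j
                # that originate from population i.
                new_q[i] += g * q[i] * freqs[i][j] / p_alt
                new_q[i] += (2 - g) * q[i] * (1.0 - freqs[i][j]) / p_ref
        # Each observed marker contributes exactly two alleles.
        q = [v / (2.0 * len(observed)) for v in new_q]
    return q


# Toy example (made-up frequencies): two populations, four markers.
toy_freqs = [
    [0.9, 0.8, 0.1, 0.2],  # hypothetical population A
    [0.1, 0.2, 0.9, 0.8],  # hypothetical population B
]
toy_sample = [2, 2, 0, None]  # last marker not covered by the assay
print(estimate_ancestry(toy_sample, toy_freqs))
```

Because missing genotypes are skipped, the same routine can in principle be applied to GS, ES, or panel calls that cover different subsets of the markers, although fewer observed markers means noisier estimates.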


We show that this method can infer ancestry and admixture proportions from targeted NGS testing data collected from research-consented patients for whom race/ethnicity was missing, and we report concordance with self-described labels where available. As a use case, we estimated ancestry proportions in a cohort of 1,775 de-identified patients who were diagnosed with early-onset colorectal cancer (EOCRC), a disease whose incidence is rising disproportionately among Black and Hispanic/Latino individuals, and tested with Tempus xT. Although more than 50% of test orders lack race/ethnicity metadata, the ancestry inferred from the xT data with our method suggests an approximately 80% increase in the total number of Black and Hispanic/Latino patients who could potentially be included in studies of EOCRC disparities.


Our results show that inferred ancestry can facilitate research on how ancestry correlates with somatic cancer variants and outcomes. Furthermore, ancestry inference from targeted NGS tests can be used to monitor enrollment diversity in oncology and other clinical trials, with the aim of improving representation and, ultimately, patient outcomes.