Francisco M. De La Vega, Len Trigg, Kurt Gaastra, Sean A. Irvine, Gene Selkov, Yan Yang, Kyung Choi, Robert Huether
Introduction: Colorectal cancer (CRC) is a leading cause of cancer-related death worldwide. Irinotecan (IRI) is commonly used to treat metastatic CRC. The gene UGT1A1 encodes the enzyme responsible for the glucuronidation of SN-38, the active metabolite of IRI. Wild-type UGT1A1 (the *1 allele) contains six TA repeats [A(TA)6TAA] in its promoter region. Polymorphic UGT1A1 alleles with a higher number of TA repeats, such as *28/(TA)7 and *37/(TA)8, cause decreased enzyme activity and are associated with severe toxicity in patients receiving IRI-based chemotherapy, for whom dose reductions are recommended. Matched tumor/normal genomic profiling by NGS for cancer therapy decision support is increasingly adopted, and it offers an ideal opportunity to assess the risk of therapy-induced adverse events due to germline variants such as those in UGT1A1. However, UGT1A1 polymorphisms are commonly genotyped with RT-PCR or fragment analysis by capillary electrophoresis rather than from NGS data. Two challenges explain this: short reads are difficult to align to repeats, and DNA polymerase slippage introduces “stutter” artifacts that add or delete copies of the repeat unit in the observed sequencing reads. Here, we benchmark a novel method, BayeSTR, to call accurate UGT1A1 repeat genotypes from target capture NGS data and demonstrate the feasibility of this method for genomic profiling of cancer patients.
Methods: BayeSTR analyzes read alignments to a graph-based model representing the possible repeat alleles, applies an empirically derived “stutter” denoising model, and then calls genotypes with a Bayesian model. The Bayesian model provides genotype posterior probabilities as confidence values, which can be used to filter out likely genotyping errors in poor-quality data or samples.
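To illustrate the general idea of Bayesian genotype calling under a stutter model, the following is a minimal sketch and not the BayeSTR implementation. It skips the graph alignment step and assumes each read has already been reduced to an observed TA repeat count. The stutter rate of 0.1, the geometric decay of stutter with shift size, the flat genotype prior, the 0.99 posterior threshold, and all function names are hypothetical choices made for illustration. The mapping of (TA)5 to *36 follows standard UGT1A1 nomenclature rather than the text above.

```python
import math
from itertools import combinations_with_replacement

# Star-allele names for UGT1A1 promoter TA repeat counts.
STAR_ALLELES = {5: "*36", 6: "*1", 7: "*28", 8: "*37"}


def stutter_prob(observed, true, stutter_rate=0.1):
    """P(read shows `observed` repeats | allele carries `true` repeats).

    Toy model (assumption): a read is stutter-free with probability
    1 - stutter_rate; otherwise the stutter mass is split evenly between
    expansion and contraction and halves with each extra repeat unit.
    """
    shift = abs(observed - true)
    if shift == 0:
        return 1.0 - stutter_rate
    return stutter_rate * 0.5 * 0.5 ** shift


def genotype_posteriors(read_counts, alleles=range(4, 12), stutter_rate=0.1):
    """Posterior over diploid genotypes given {observed repeat count: reads}.

    Each read is assumed to come from either allele with probability 0.5,
    and the prior over genotypes is flat (both are assumptions).
    """
    log_liks = {}
    for a, b in combinations_with_replacement(alleles, 2):
        ll = 0.0
        for obs, n in read_counts.items():
            p = 0.5 * stutter_prob(obs, a, stutter_rate) \
                + 0.5 * stutter_prob(obs, b, stutter_rate)
            ll += n * math.log(p)
        log_liks[(a, b)] = ll
    # Normalize in log space to avoid underflow at high depth.
    peak = max(log_liks.values())
    total = sum(math.exp(v - peak) for v in log_liks.values())
    return {g: math.exp(v - peak) / total for g, v in log_liks.items()}


def call_genotype(read_counts, min_posterior=0.99):
    """Return (genotype, posterior); genotype is None if confidence is low."""
    post = genotype_posteriors(read_counts)
    best = max(post, key=post.get)
    if post[best] < min_posterior:
        return None, post[best]
    return best, post[best]


def star_name(genotype):
    """Format a genotype such as (6, 7) as '*1/*28'."""
    return "/".join(STAR_ALLELES.get(n, f"(TA){n}") for n in genotype)


if __name__ == "__main__":
    # 70 reads from a hypothetical heterozygous sample with some stutter.
    reads = {5: 3, 6: 34, 7: 31, 8: 2}
    genotype, confidence = call_genotype(reads)
    print(star_name(genotype), round(confidence, 4))
```

In this sketch the posterior threshold plays the role of the confidence filter: a sample with only a handful of reads yields no call instead of a possibly wrong genotype.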
Results: To benchmark our method, we simulated alignment data for TA repeat lengths from 4 to 11 copies in multiple genotype combinations and at multiple coverage depths. At a minimum depth of 70X, we obtained 100% accuracy and robustness to rare or novel repeat alleles. We validated our method with germline data from the Tempus xT tumor-normal matched NGS test, which targets 648 cancer-related genes and includes the UGT1A1 promoter. We observed 100% accuracy in sequencing data from a collection of 54 Coriell cell-line DNA samples whose UGT1A1 genotypes were established orthogonally as part of the CDC GeT-RM project, including different combinations of *1, *28, *36, and *37 alleles. Additionally, we confirmed similar performance with 66 patient samples whose true genotypes were confirmed orthogonally by fragment analysis.
Conclusion: BayeSTR allows for automated, accurate UGT1A1 promoter genotyping from targeted NGS data and can be applied to other genomic repeat regions of clinical relevance. This method identifies UGT1A1 repeat polymorphisms associated with IRI-induced adverse events and can be used during clinical NGS testing to further support clinicians' treatment decisions for cancer patients.