03/22/2024

Leveraging a comprehensive genomic data library for detecting clonal hematopoiesis in liquid biopsy

AACR 2024 PRESENTATION
Authors: Anne Sonnenschein, Maysun Hasan, Sandra Hui, Bob Tell, Halla Nimeiri, Jonathan Freaney, Kate Sasser, Wei Zhu, Christine Lo

Background: Clonal hematopoiesis (CH) is a well-established confounder in next-generation sequencing (NGS)-based liquid biopsy cancer diagnostics. Misclassification of CH variants as tumor variants can lead to false-positive detection of actionable variants, potentially resulting in incorrect interpretation of results and incorrect therapy selection. CH variants may also interfere with quantitative variant monitoring, leading to inaccurate assessment of treatment response. While CH variants can be filtered via matched sequencing of white blood cell and plasma DNA, emerging algorithmic approaches may enable a faster, more resource-efficient approach with high precision.

Methods: A random forest classifier was trained and validated on 1321 advanced, pan-solid tumor cancer samples (training n=660, validation n=661) sequenced using both the Tempus xF+ (liquid biopsy) and Tempus xT (solid tumor with matched buffy coat) NGS assays. Variants were labeled as CH or tumor-derived based on solid-tissue results in 39 genes that are known to be associated with CH (e.g., DNMT3A, TET2, TP53). The classifier was trained to classify SNV and indel variants detected via liquid biopsy as circulating-tumor or non-tumor (CH + germline) in origin. Features used by the classifier include the fragment size of reads overlapping each variant, prevalence in solid-tumor samples from the Tempus multimodal database, and variant allele fraction relative to estimated tumor fraction. Model classifications were validated against Tempus xT.

Results: Our training set (n=660 liquid biopsy samples) included 680 pathogenic variants. Of these, 50.3% (n=342/680) were determined to be tumor-derived, while 49.7% (n=338/680) were likely due to CH. Our independent validation set (n=661 samples) included 600 pathogenic variants. Model prediction accuracy on these validation set variants was 91.7%, with an ROC-AUC of 0.97. Sensitivity was 88.3% (n=257/291 non-tumor variants correctly labeled as non-tumor), specificity was 94.8% (n=293/309 tumor variants correctly labeled as tumor), and precision was 94.1%. The model ranked “gene” as the most informative feature for variant classification, followed by measurements of fragment distribution and historical prevalence of individual variants within the Tempus database.
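The reported validation metrics can be cross-checked from the counts given in the Results. The sketch below is a minimal illustration, not the authors' code: it assumes the positive class is "non-tumor" (CH + germline), as the sensitivity figure implies, and derives the false negatives and false positives by subtraction. The function name `classification_metrics` and the variable names are hypothetical.

```python
# Counts taken from the Results; positive class = non-tumor (CH + germline).
# fn and fp are derived by subtraction, which is an assumption of this sketch.
tp = 257        # non-tumor variants correctly labeled as non-tumor
fn = 291 - tp   # non-tumor variants labeled as tumor
tn = 293        # tumor variants correctly labeled as tumor
fp = 309 - tn   # tumor variants labeled as non-tumor


def classification_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


metrics = classification_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumptions the four values agree with the reported accuracy, sensitivity, specificity, and precision to one decimal place. The ROC-AUC cannot be recomputed this way because it requires per-variant classifier scores.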

Conclusion: A novel classifier trained on multiple orthogonal bioinformatics features can distinguish CH from tumor-derived variants with high accuracy, sensitivity, and specificity using only liquid biopsy data.
