Authors
Anne Sonnenschein, Timothy Baker, Singer Ma, Christine Lo, Robert Tell, Brett Mahon, Nirali M. Patel, Jerod Parsons
Introduction – The xT tumor-normal matched assay, which sequences a solid tumor biopsy paired with a matched buffy coat sample, has enabled the accumulation of large amounts of clonal hematopoiesis (CH) data. Although buffy-coat matched sequencing is the gold standard for distinguishing tumor from non-tumor variants, accurately identifying CH variants and separating them from germline or artifactual variants presents unique algorithmic challenges. The buffy coat is sequenced at lower depth than the tumor, which can reduce the accuracy of variant calling at low variant allele fractions (VAFs). For CH variants with high VAFs, which are the most clinically relevant, the difficulty is distinguishing them from germline variants. Because of immune infiltration, CH variants may be found in both the normal and tumor samples, and copy number variants and loss of heterozygosity in tumor samples can substantially bias VAFs. Here, we demonstrate methods that account for these challenges and identify CH with high accuracy.
Methods – We developed an algorithm that calculates the expected VAF for a heterozygous germline variant in a tumor sample by integrating copy number variations, the balance between major and minor alleles, and tumor purity. This algorithm was incorporated into a machine learning model trained on common germline variants and canonical CH mutations sourced from xT data. We validated the accuracy of the xT CH pipeline against higher depth sequencing in xF+ liquid biopsy using 3,000 unique samples.
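The abstract does not give the formula behind the expected-VAF calculation. As a minimal sketch, one common formulation treats the tumor sample as a mixture of tumor cells (fraction equal to purity, carrying the major and minor allele copy numbers) and normal diploid cells (carrying one copy of each allele). The function name, its parameters, and the assumption that normal cells are diploid are illustrative and are not taken from the xT pipeline.

```python
def expected_germline_vaf(purity, major_cn, minor_cn, on_major=True, normal_cn=2):
    """Expected tumor-sample VAF of a heterozygous germline variant.

    Hypothetical helper (not the authors' implementation). Assumes the sample
    is a mix of tumor cells (fraction `purity`) and normal cells that carry
    exactly one variant copy out of `normal_cn` total copies.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if minor_cn < 0 or major_cn < minor_cn:
        raise ValueError("require 0 <= minor_cn <= major_cn")

    # Copies of the variant allele in tumor cells depend on which
    # haplotype (major or minor) the germline variant sits on.
    tumor_alt = major_cn if on_major else minor_cn
    tumor_total = major_cn + minor_cn

    # Purity-weighted average of variant copies and of all copies at the locus.
    alt_copies = purity * tumor_alt + (1.0 - purity) * 1.0
    total_copies = purity * tumor_total + (1.0 - purity) * normal_cn
    if total_copies == 0:
        raise ValueError("no copies of the locus are present in the sample")
    return alt_copies / total_copies
```

Under these assumptions, a balanced diploid region (major 1, minor 1) gives 0.5 at any purity. Copy-neutral loss of heterozygosity (major 2, minor 0) at 60% purity gives about 0.80 when the variant is on the retained allele and about 0.20 when it is on the lost allele. An observed VAF far from both expectations is therefore harder to explain as germline.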
Results – Our algorithm substantially improves the separation between CH and germline variants. In a representative distribution of variants with VAFs over 20% held out for testing, sensitivity for CH variants improved from 77% to 91%, precision from 80% to 97%, and specificity from 99.5% to 99.9% (ROC-AUC of 0.995), compared to a heuristic that does not incorporate copy number. Of the canonical CHIP (clonal hematopoiesis of indeterminate potential) variants detected in xF+ with a VAF above 5%, 92% are detected as CH in xT. Variants identified as CH in xT with a VAF above 2% are validated as true CH variants by xF+ with 90% precision. CH variants detected in both plasma and buffy coat have highly correlated VAFs (R^2 = 0.79, slope m = 0.94), and 3.6% of the samples assessed contain a high-VAF (>=20%) CH variant.
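For readers less familiar with the three metrics reported above, the sketch below shows how each is derived from confusion-matrix counts, with CH treated as the positive class. The function name and the counts in the usage note are purely illustrative and are not data from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, precision and specificity from confusion-matrix counts.

    Hypothetical helper: CH is the positive class, germline the negative class.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tp + fp == 0 or tn + fp == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        # Fraction of true CH variants that were called CH.
        "sensitivity": tp / (tp + fn),
        # Fraction of CH calls that are truly CH.
        "precision": tp / (tp + fp),
        # Fraction of germline variants correctly not called CH.
        "specificity": tn / (tn + fp),
    }
```

With made-up counts such as `classification_metrics(tp=90, fp=5, tn=895, fn=10)`, specificity is higher than precision even though both use the same five false positives. This is typical when germline variants greatly outnumber CH variants, and it is why precision is reported alongside specificity.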
Conclusions – We demonstrate that the xT platform can distinguish between CH, germline, and artifactual variants with high sensitivity and a low false positive rate. Although CH at high VAF is present in only a minority of samples (3.6%), it is regularly observed in high-throughput clinical sequencing. These variants are likely to be clinically significant and may represent secondary hematologic malignancies, which are more common in cancer patients than in healthy populations. Therefore, accuracy-driven approaches for identifying CH in buffy-coat matched data are important for improving clinical outcomes and understanding the underlying biology of these variants.