05/23/2024

Use of a deep learning model on H&E slides to predict nucleic acid yield for NGS testing

ASCO 2024 Abstract
Authors: Josh Och, Yoni Muller, Kunal Nagpal, Boleslaw Osinski, Sun Hae Hong, Martin C. Stumpe, Nike Beaubier, Ryan D Jones

Background: Samples sent for NGS may fail testing, leading to treatment delays or missed opportunities to offer patients targeted therapies. A common reason for assay failure is insufficient total nucleic acid (TNA). We developed and validated a model based on a digitized H&E slide and readily available clinical data that predicts whether samples will yield sufficient TNA for NGS testing. This model could lead to faster turnaround times and lower failure rates by helping choose patient blocks most likely to complete sequencing, or routing likely-to-fail samples to specialized extraction methods or smaller panel assays with lower input TNA requirements. We also evaluated the model’s ability to identify samples that yield abundant TNA, which could help pathologists spare tissue for other testing.

Methods: We developed our model on samples received for NGS testing from January 1 to June 30, 2023, and tested it on a large, real-world validation cohort of 9,707 samples received from July 1 to September 30, 2023. The model inputs were cell count, assessed by a previously developed deep learning model on digitized H&E slides, along with the number of slides scraped, sample age, procedure type, and tissue site. We measured the model’s ability to predict whether a sample yielded less than 100 ng of TNA (the target mass for the NGS assay used) and whether it failed sequencing. We also evaluated the model’s ability to predict whether a sample would have abundant TNA (defined here as yielding more than 1000 ng of TNA, 10x the target mass for the assay).
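
The two yield thresholds defined above can be expressed as a small labeling function. This is only an illustrative sketch: the function name, constant names, and category labels are hypothetical, and the abstract does not state how samples at exactly 100 ng or 1000 ng were handled, so the boundary behavior here simply follows the wording "less than 100 ng" and "more than 1000 ng".

```python
# Thresholds taken from the Methods text; names are illustrative assumptions.
TARGET_NG = 100.0              # target TNA mass for the NGS assay used
ABUNDANT_NG = 10 * TARGET_NG   # "abundant" is defined as more than 10x the target


def yield_category(tna_ng: float) -> str:
    """Label a measured TNA yield (in ng) using the abstract's two thresholds."""
    if tna_ng < TARGET_NG:
        return "insufficient"  # below the assay's target mass
    if tna_ng > ABUNDANT_NG:
        return "abundant"      # more than 1000 ng
    return "sufficient"        # everything in between, boundaries included


if __name__ == "__main__":
    for ng in (50, 100, 1000, 1500):
        print(ng, yield_category(ng))
```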

Results: On the validation set, the model predicted samples that yielded TNA < 100 ng with a specificity of 98%, sensitivity of 26%, and PPV of 82% (prevalence = 24%). It predicted samples that failed DNA sequencing with a specificity of 96%, sensitivity of 30%, and PPV of 58% (prevalence = 15%), and samples that failed either DNA or RNA sequencing with a specificity of 98%, sensitivity of 23%, and PPV of 80% (prevalence = 27%). Of the 742 samples the model flagged, 430 failed DNA sequencing and 590 failed either DNA or RNA sequencing. When predicting abundant TNA, the model identified samples yielding TNA > 1000 ng with a specificity of 97%, sensitivity of 54%, and PPV of 88% (prevalence = 26%). The model predicted 1,544 samples as abundant, only 185 of which yielded TNA < 1000 ng and only 16 of which yielded TNA < 100 ng.
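
The reported PPVs can be cross-checked against the raw counts given above, since PPV is the number of correct flags divided by the total number of flags. The sketch below is a reader's arithmetic check, not the authors' analysis code; it assumes the same 742 flagged samples underlie both sequencing-failure figures and that the 185 samples under 1000 ng are the only incorrect "abundant" predictions.

```python
def ppv(true_positives: int, flagged: int) -> float:
    """Positive predictive value: fraction of flagged samples that were correct."""
    return true_positives / flagged


# Counts quoted in the Results paragraph.
flagged_fail = 742        # samples flagged as likely to fail
failed_dna = 430          # flagged samples that failed DNA sequencing
failed_either = 590       # flagged samples that failed DNA or RNA sequencing

flagged_abundant = 1544   # samples predicted as abundant
not_abundant = 185        # predicted abundant but yielded TNA < 1000 ng

ppv_dna = ppv(failed_dna, flagged_fail)
ppv_either = ppv(failed_either, flagged_fail)
ppv_abundant = ppv(flagged_abundant - not_abundant, flagged_abundant)

if __name__ == "__main__":
    print(f"{ppv_dna:.0%}, {ppv_either:.0%}, {ppv_abundant:.0%}")
```

Rounded to whole percentages, these ratios agree with the 58%, 80%, and 88% PPVs stated in the abstract.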

Conclusions: The developed model holds promise as a tool for use at the beginning of NGS assays. It can be used to optimize testing decisions, including selecting which samples to sequence, routing likely-to-fail samples to alternate testing methods, and sparing tissue for future assays from samples predicted to yield abundant TNA.
