Reducing Manual Chart Review in Oncology: Validation of an AI-Based Monitoring Tool for Lung Cancer Staging and Biomarker Tracking
ASCO 2026
Katie Mo, Tian Kang, Journey Penney, Jessica Dow, Chithra Sangli, Noah Zimmerman, Riccardo Miotto
Background
Clinical decision support depends on continuous monitoring of evolving staging and molecular data to capture shifts in an oncology patient’s care journey. However, current systems struggle to synchronize fragmented electronic health records and extensive clinical documentation. This leaves clinicians with an incomplete picture and hinders their ability to follow complex, dynamic NCCN guidelines without manual chart review.
Methods
We implemented an AI-based continuous monitoring system (AI-CM) that uses embeddings from large language models, fine-tuned on clinical notes, to extract clinical events and flag patients whose care deviates from NCCN guidelines. Monitoring begins at the initial oncology ICD code, from which the system tracks follow-up documents that determine stage, histology, and site (diagnostic features). It then continuously monitors documents for diagnosis-specific, NCCN-recommended biomarker testing (ordered or performed). We validated AI-CM in a manually annotated cohort of 7,259 patients across eight institutions (2024–2025). The evaluation focused on biomarker testing for early-stage (n = 1,155) and advanced (n = 4,448) non-small cell lung cancer (eNSCLC and aNSCLC, respectively), specifically targeting EGFR, ALK, PD-L1, and next-generation sequencing (NGS).
Results
AI-CM identified eNSCLC and aNSCLC diagnostic features with precision of 0.77 and 0.85 and recall of 0.91 and 0.94, respectively, and biomarker testing with precision of 0.75 and recall of 0.81. Incorporating monitoring for diagnostic features reduced false positive diagnoses by 25% relative to relying solely on ICD codes. The median time to diagnosis was 41 days for eNSCLC and 16 days for aNSCLC, with a median time from diagnosis to test of 9 days. Within 6 weeks of the initial ICD code and of diagnosis, AI-CM identified diagnosis and biomarker testing, on average, for 82% and 88% of patients, respectively (Table 1). Full review can therefore be limited to the much smaller proportions of patients with a missed diagnosis (18%) or who are non-adherent to NCCN guidelines for testing (12%). By reviewing only the AI-recommended document to confirm diagnosis for the 89% of patients correctly identified as aNSCLC, and all documents for the remaining patients, the average number of documents requiring review within the first 6 weeks to determine these diagnoses was reduced from 5.5 to 1.3 per patient (a 76% reduction).
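The 6-week flagging logic described above can be illustrated with a minimal sketch. It is not the AI-CM implementation: the `PatientTimeline` record, its field names, and the flag labels are hypothetical, and the only value taken from the abstract is the 6-week window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Review window taken from the abstract (6 weeks from the previous clinical event).
REVIEW_WINDOW = timedelta(weeks=6)


@dataclass
class PatientTimeline:
    # Hypothetical record; field names are illustrative, not AI-CM's schema.
    patient_id: str
    first_icd_date: date                          # initial oncology ICD code
    diagnosis_date: Optional[date] = None         # stage/histology/site identified
    biomarker_test_date: Optional[date] = None    # test ordered or performed


def review_flags(patient: PatientTimeline, today: date) -> list[str]:
    """Return the reasons (if any) a patient should go to full manual review."""
    flags = []
    if patient.diagnosis_date is None:
        # No diagnostic features found: flag once the window since the ICD code has elapsed.
        if today - patient.first_icd_date > REVIEW_WINDOW:
            flags.append("missed_diagnosis")
        return flags
    # Diagnosis known: flag if no biomarker testing was found within the window.
    if patient.biomarker_test_date is None and today - patient.diagnosis_date > REVIEW_WINDOW:
        flags.append("no_biomarker_testing")
    return flags
```

In this sketch, a patient with neither flag needs only confirmation of the AI-recommended document, while a flagged patient is routed to full chart review.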
Conclusions
AI-based continuous monitoring improves timely identification of patients deviating from guidelines and reduces manual review by prioritizing documents for human verification.
Table 1. Performance of AI-CM at 6 weeks from the previous clinical event.
| Target | Patients Identified (%) | Mean Documents to Review per Patient | AI-CM Mean Documents to Review per Patient | AI-CM Documents to Review Reduction (%) |
| --- | --- | --- | --- | --- |
| eNSCLC | 76 | 7.3 | 2.4 | 67 |
| aNSCLC | 89 | 5.5 | 1.3 | 76 |
| eNSCLC Testing | 89 | 5.2 | 1.5 | 71 |
| aNSCLC Testing | 88 | 7.5 | 1.8 | 76 |
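The reduction column in Table 1 follows from the two mean-document columns as (baseline − AI-CM) / baseline. The short check below recomputes it from the values in the table; the function names and the 1-percentage-point rounding tolerance are assumptions made for illustration.

```python
# Rows copied from Table 1:
# (target, mean documents without AI-CM, mean documents with AI-CM, reported reduction %)
TABLE_1 = [
    ("eNSCLC", 7.3, 2.4, 67),
    ("aNSCLC", 5.5, 1.3, 76),
    ("eNSCLC Testing", 5.2, 1.5, 71),
    ("aNSCLC Testing", 7.5, 1.8, 76),
]


def reduction_pct(baseline: float, with_ai: float) -> float:
    """Percentage reduction in documents to review relative to the baseline."""
    return 100.0 * (baseline - with_ai) / baseline


def check_table(rows, tolerance: float = 1.0) -> dict:
    """Map each target to (computed reduction %, whether it matches the reported value)."""
    results = {}
    for target, baseline, with_ai, reported in rows:
        computed = reduction_pct(baseline, with_ai)
        results[target] = (round(computed, 1), abs(computed - reported) <= tolerance)
    return results


if __name__ == "__main__":
    for target, (computed, matches) in check_table(TABLE_1).items():
        print(f"{target}: {computed}% (matches reported value: {matches})")
```

For the aNSCLC row, for example, (5.5 − 1.3) / 5.5 ≈ 76.4%, consistent with the 76% reported.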