Authors
Katie Mo, Xifeng Wang, Kaitlynn Cunnea, Bridget Bax, Maria A Berezina, Chelsea Kendall Osterman, Riccardo Miotto, Chithra Sangli
Background: Real-world oncology data integrates structured and unstructured EHR-based information relating to clinical characteristics, treatment patterns, and outcomes for patients. At Tempus, unstructured records are abstracted into structured fields through a uniform, rules-based, human curation process. We aim to measure the performance of our abstraction process by evaluating inter-abstractor reliability, accuracy compared to an independent oncologist, and the utility of Tempus (de-identified) abstracted data for estimating real-world outcomes.
Methods: Two randomly selected abstractors (blinded to study participation) independently abstracted the unstructured records of 222 patients with advanced or metastatic non-small cell lung cancer (a/mNSCLC). Clinical variables were assessed in the demographic, diagnosis, third-party lab biomarker results, first-line treatment (1L), and outcome data domains. A subset of 40 patients was reviewed by an oncologist. The primary measure of inter-abstractor reliability was Gwet’s agreement coefficient (AC). Categorical variables were assessed both excluding missing data and including it as a category for agreement. Date agreement was calculated for presence/absence, as well as for exact match, within ±15 days, and within ±30 days. The Kaplan-Meier estimate of real-world progression-free survival (rwPFS) on combination 1L platinum-based chemotherapy (PBC) and immunotherapy (IO) was derived in 2,980 a/mNSCLC patients diagnosed between 2018 and 2023.
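As an illustration of the reliability measure named in the Methods, the following is a minimal sketch of a two-rater agreement coefficient for nominal categories. It assumes the AC1 form of Gwet's coefficient; the abstract says only "Gwet's AC", so this choice is an assumption. The function name `gwet_ac1`, the `missing_as_category` flag, and the use of `None` for missing values are hypothetical and are not taken from the study.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, missing_as_category=False, n_categories=None):
    """Gwet's AC1 for two raters and nominal categories (illustrative sketch).

    None marks a missing value. With missing_as_category=False, items where
    either rater has a missing value are dropped; with True, "missing" is
    treated as one more category. n_categories is the size of the rating
    scale; if omitted, it is inferred from the observed values (an assumption).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same items")

    pairs = list(zip(rater_a, rater_b))
    if missing_as_category:
        pairs = [("<missing>" if a is None else a,
                  "<missing>" if b is None else b) for a, b in pairs]
    else:
        pairs = [(a, b) for a, b in pairs if a is not None and b is not None]

    n = len(pairs)
    if n == 0:
        raise ValueError("no items left to compare")

    counts = Counter(value for pair in pairs for value in pair)
    k = n_categories if n_categories is not None else len(counts)
    if k < 2:
        return 1.0  # a single category leaves no room for disagreement

    # Observed agreement: share of items given the same category by both raters.
    p_a = sum(a == b for a, b in pairs) / n

    # Chance agreement: pi_c is the mean share of ratings in category c.
    p_e = sum((c / (2 * n)) * (1 - c / (2 * n)) for c in counts.values()) / (k - 1)

    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    stage_a = ["IV", "IV", "IIIB", "IIIB"]
    stage_b = ["IV", "IV", "IIIB", "IV"]
    print(round(gwet_ac1(stage_a, stage_b), 3))
```

Calling the function twice on the same variable, once with `missing_as_category=False` and once with `True`, mirrors the two ways categorical agreement is described as being assessed.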
Results: Gwet’s AC was high (≥0.82) between abstractors across demographic, diagnosis, biomarker, and treatment domains. Among the 181 patients for whom abstractors agreed on 1L class and initiation date within ±30 days, the agreement in progression presence and date was 0.83-0.93. Gwet’s AC was 0.96-1 for death presence and date. Percent agreement between at least one abstractor and the oncologist was high, ranging from 85%-100% for categorical variables and 80%-100% within ±30 days for date variables. Median rwPFS on 1L PBC and IO was 7.9 months, in line with KEYNOTE-189 and -407. All patients with progression and a non-missing date of progression had a clinically relevant downstream event: 97% with a 1L treatment end date, 100% with a 2L treatment start date, and 69% with a deceased date.
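The date comparisons reported above (presence/absence, exact, within ±15 days, within ±30 days) can be sketched as simple proportions. This is an illustrative sketch only: the function name `date_agreement`, the use of `None` for a missing date, and the choice to compute tolerance agreement only over items where both dates are present are assumptions, and the code returns plain percent agreement, not Gwet's AC.

```python
from datetime import date


def date_agreement(dates_a, dates_b, tolerances_days=(0, 15, 30)):
    """Proportion agreement between two lists of abstracted dates.

    Returns the presence/absence agreement over all items, and, for items
    where both dates are present, the share within each tolerance (in days).
    A tolerance of 0 means an exact match.
    """
    if len(dates_a) != len(dates_b):
        raise ValueError("both lists must cover the same patients")
    pairs = list(zip(dates_a, dates_b))
    if not pairs:
        raise ValueError("no items to compare")

    # Presence agreement: both abstractors recorded a date, or both did not.
    presence = sum((a is None) == (b is None) for a, b in pairs) / len(pairs)

    # Tolerance agreement is only defined where both dates exist.
    both = [(a, b) for a, b in pairs if a is not None and b is not None]
    within = {}
    for tol in tolerances_days:
        if both:
            within[tol] = sum(abs((a - b).days) <= tol for a, b in both) / len(both)
        else:
            within[tol] = None

    return {"presence": presence, "within_days": within}


if __name__ == "__main__":
    abstractor_1 = [date(2020, 1, 1), date(2020, 3, 1), None, date(2021, 5, 10)]
    abstractor_2 = [date(2020, 1, 1), date(2020, 3, 12), None, None]
    print(date_agreement(abstractor_1, abstractor_2))
```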
Conclusions: These results demonstrate that the rules-based, human abstraction process as designed is reliable and accurate across the data domains commonly used in insight generation. The resulting data product has utility for estimating real-world outcomes.
Domain | AC (Min-Max)
Demographic (birth date, sex, race, ethnicity, smoking status) | 0.96-1
Diagnosis (stage, histology, year of diagnosis) | 0.87-0.99
Biomarker (EGFR, ALK, ROS1, PD-L1, BRAF, RET, NTRK) | 0.87-1
Treatment (agents, class, dates) | 0.82-0.97