Abstract
Introduction: Real-world data (RWD) are a valuable resource for identifying novel prognostic factors in heterogeneous diseases. We leveraged the Tempus multimodal RWD database to analyze outcomes for patients with advanced epithelial ovarian carcinoma (EOC) and primary peritoneal carcinoma (PPC). Our primary objective was to apply machine learning (ML) models to identify and evaluate clinical prognostic biomarkers for real-world progression-free survival (rwPFS). A secondary objective was to build a well-characterized, real-world cohort of patients receiving standard-of-care (SoC) treatment to serve as a foundation for this analysis.
Methods: Using the Tempus database, we constructed a retrospective real-world clinical biomarker cohort of 3016 patients with Stage III/IV EOC or PPC who received first-line (1L) carboplatin and paclitaxel and underwent reductive surgery prior to or within 30 days of 1L treatment start. We included relevant clinical variables, such as CA125, race, ECOG performance status, obesity status, histology, and lymphocyte counts. rwPFS was analyzed using Kaplan-Meier estimates and Cox proportional hazards models. Using this cohort, we trained Random Survival Forest (RSF), Cox Regression, and Regularized Cox Regression models to predict rwPFS risk. Model interpretability and feature importance were assessed using SHAP (SHapley Additive exPlanations) values.
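As an illustration of the Kaplan-Meier step named in the Methods, the following is a minimal sketch of the product-limit estimator and the median survival time derived from it. It is not the authors' code: the function names and the toy data (durations in months, event flags with 1 = progression or death and 0 = censored) are invented for this example, and confidence intervals are omitted.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct event time."""
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """Earliest time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Hypothetical toy data, not drawn from the study cohort.
durations = [6, 9, 12, 12, 15, 18, 20, 24, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
print(median_survival(curve))  # 18
```

Censored patients contribute to the risk set until their censoring time but never reduce the survival estimate themselves, which is why the estimator remains usable when follow-up is incomplete, as is typical in real-world cohorts.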
Results: The median rwPFS (mPFS) for this cohort was 18.5 months (95% CI: 17.5-19.6), establishing a baseline for this advanced SoC-treated population. The RSF model, trained on the clinical biomarker cohort (N=2769; 80% training, 10% validation, 10% test), demonstrated prognostic performance for rwPFS (AUC = 0.7 at 18 months). SHAP analysis of the RSF model confirmed CA125 as the most important prognostic feature, consistent with clinical practice. Obesity (BMI > 30) was also identified as an important prognostic feature, with obese patients having higher SHAP values than non-obese patients. A stratified analysis showed that CA125 levels were prognostic in non-obese patients (mPFS, Low vs. High: 20.2 months [95% CI: 16.1-24.2] vs. 14.7 months [95% CI: 13.0-16.3]), whereas for obese patients mPFS estimates were similar across the two CA125 groups (mPFS, Low vs. High: 14.7 months [95% CI: 12.5-20.9] vs. 14.3 months [95% CI: 11.2-17.5]).
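To make the "AUC at 18 months" figure concrete, the sketch below computes a simplified time-dependent AUC at a fixed horizon. The abstract does not state which estimator was used, so this is an assumption for illustration only: patients with an event by the horizon are treated as cases, patients still event-free after the horizon as controls, and patients censored before the horizon are simply dropped (published estimators usually reweight for censoring instead). The function name and toy data are hypothetical.

```python
def auc_at_horizon(durations, events, risk_scores, horizon):
    """Naive cumulative/dynamic AUC: P(risk of a case > risk of a control)."""
    rows = list(zip(durations, events, risk_scores))
    # Cases: progression or death observed at or before the horizon.
    cases = [r for t, e, r in rows if e and t <= horizon]
    # Controls: known to be event-free beyond the horizon.
    controls = [r for t, e, r in rows if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count concordant pairs; ties in predicted risk count as half.
    concordant = sum(
        1.0 if rc > rk else 0.5 if rc == rk else 0.0
        for rc in cases
        for rk in controls
    )
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: months, event flag, model-predicted risk.
durations = [5, 10, 12, 16, 20, 24, 30, 8]
events = [1, 1, 0, 1, 1, 0, 1, 0]
risk_scores = [0.9, 0.7, 0.5, 0.4, 0.6, 0.2, 0.3, 0.8]

auc = auc_at_horizon(durations, events, risk_scores, horizon=18)
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.7 means the model ranks a patient who progressed by 18 months above a patient who did not in roughly 70% of such pairs.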
Conclusions: Our study demonstrates the utility of applying ML models to large-scale, multimodal RWD to identify and validate prognostic factors in advanced EOC and PPC. Our models confirmed the primary prognostic power of CA125 and identified obesity as an independent prognostic factor. The observed difference in the prognostic impact of CA125 between obese and non-obese patients highlights a potential health disparity and underscores the need for further investigation into population-specific biomarkers to ensure the development of equitable prognostic models.