03/25/2025

Multi-Modal Large Language Models for Metastatic Breast Cancer Prognosis

AACR 2025 PRESENTATION
Authors: Raphael Pelossof, Mark Carty, Talal Ahmed, Stanislas Lauly, Alberto Purpura, Erik Mueller, Justin Guinney

Background – Inputs to cancer prognostic models are primarily structured data, such as demographic and clinicopathological features, and lack the richer temporal context often found in unstructured clinical notes. We hypothesize that creating a temporal clinical patient note from structured data, one that preserves longitudinal and clinical contextual information, and coupling it with a large language model (LLM) trained to prognosticate overall survival (OS) may improve model accuracy while yielding an interpretable embedding space.

Methods – We developed the Patient Chronological Note (PCN), an algorithm that converts structured data elements into textual strings, mirroring physician notes of patient histories. A bidirectional large language model was pre-trained on PCNs from breast cancer patients (N=580,000), allowing the LLM to learn a representation of the patient journey. The resulting embeddings were fed into a two-layer fully connected network that was fine-tuned using a Cox survival loss. Fine-tuning was performed on PCNs derived from metastatic breast cancer (mBC) patients (N=28,500), where the model was trained to predict OS from the time of first metastatic diagnosis. A held-out validation dataset of mBC patients (N=28,800) was used to validate survival prediction accuracy.
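To make the PCN idea concrete, here is a minimal sketch of turning structured, dated records into a chronological text note. It is not the authors' algorithm: the `Event` schema, the `render_pcn` function, the "Day +N" phrasing, and the example records are all hypothetical, chosen only to illustrate ordering events in time and expressing them relative to an anchor date such as the first metastatic diagnosis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One structured data element (hypothetical schema, not from the source)."""
    when: date
    category: str      # e.g. "diagnosis", "medication", "biomarker"
    description: str


def render_pcn(events: list[Event], anchor: date) -> str:
    """Render events as one line of text each, in chronological order.

    Timing is written as a signed day offset from `anchor` (assumed here to be
    the date of first metastatic diagnosis).
    """
    lines = []
    for ev in sorted(events, key=lambda e: e.when):
        offset_days = (ev.when - anchor).days
        lines.append(f"Day {offset_days:+d}: {ev.category} - {ev.description}.")
    return "\n".join(lines)


# Illustrative, made-up records supplied out of order.
events = [
    Event(date(2021, 3, 1), "medication", "started letrozole"),
    Event(date(2021, 1, 10), "diagnosis", "metastatic breast cancer, bone"),
    Event(date(2019, 6, 5), "diagnosis", "stage II breast cancer"),
]
note = render_pcn(events, anchor=date(2021, 1, 10))
print(note)
```

A note rendered this way keeps the longitudinal ordering that a flat feature table loses, which is the property the text model is meant to exploit.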

Results – Our LLM-Cox model achieved a concordance index of 0.66 on the validation cohort, outperforming a standard linear Cox model, which achieved 0.62. A marginal effect analysis showed that features associated with metastasis, medications, and patient demographics contributed most to prediction performance, while hormone receptor status and sequencing information were less important. Clustering the LLM embeddings revealed 10 distinct patient groups enriched for key mBC traits, including a high-risk cluster enriched for triple-negative status, TP53 mutations, and African American race; a low-risk cluster enriched for ESR1 mutations and CDK4/6 inhibitor treatment; and a cluster enriched for low-risk early-onset patients. The clusters differed in prognostic risk.
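For readers unfamiliar with the metric reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. The function name, argument names, and toy data are illustrative assumptions, and the abstract does not state which implementation or tie-handling rule was used; this version simply skips pairs with tied survival times and gives half credit for tied risk scores.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times  -- observed follow-up time per patient
    events -- True/1 if death was observed, False/0 if censored
    risks  -- predicted risk score (higher means shorter expected survival)

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter time than patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 4 of 5 comparable pairs are ordered correctly -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.66 versus 0.62 comparison.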

Conclusions – An LLM-Cox model that learns from PCNs can improve prediction performance over standard models. The internal embedding representation of the LLM was interpretable and yielded distinct clinical-molecular subtypes that also showed distinct levels of prognostic risk. Furthermore, creating LLM-based clinical-molecular groups of patients with similar journeys and similar prognostic risk presents an opportunity to identify novel stratifications within each group that are associated with treatment-specific response rather than prognostic risk.
