03/19/2026

Multimodal AI for Patient Subtype Discovery in LUSC Using Real-World Data

AACR 2026 PRESENTATION
Authors: Swati Kaushik, Mark Carty, Akul Singhania, Justin Guinney, Radia Johnson

Abstract

Introduction: Lung squamous cell carcinoma (LUSC) remains a significant therapeutic challenge due to tumor heterogeneity across patients and a lack of predictive biomarkers. Prior subtype identification efforts, limited to single-modality data (e.g., gene expression), fail to capture the full spectrum of LUSC's molecular complexity. To define clinically actionable vulnerabilities, we employed a multimodal AI approach integrating gene expression, copy number variation (CNV), and mutation data derived from Tempus real-world data to identify LUSC molecular subtypes, providing a biological landscape essential for improved treatments.

Methods: We analyzed de-identified clinico-genomic records from LUSC patients profiled with the Tempus xT (DNA) and xR (RNA) assays. For molecular subtyping, we developed a multimodal autoencoder integrating gene expression, CNV, and mutation profiles from 4,973 tumors of the trachea, bronchus, and lung. Modality-specific encoders were trained, and joint embeddings were obtained by averaging the modality-specific latent representations, with a distance loss aligning the latent spaces to ensure a coherent representation across modalities. K-means clustering was applied to the joint embeddings to define patient subtypes, which were then functionally characterized via molecular enrichment. Real-world overall survival (rwOS) analysis was performed to assess the clinical and prognostic relevance of the identified subtypes.
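The embedding and clustering steps described in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the exact form of the distance loss, or the clustering settings. The function names (`joint_embedding`, `alignment_loss`, `kmeans`), the use of mean squared Euclidean distance to the averaged embedding, and the plain Lloyd's-algorithm k-means are all assumptions made for illustration. The per-modality latent matrices are taken as given inputs, standing in for the outputs of already trained encoders.

```python
import numpy as np

def joint_embedding(latents):
    """Average per-modality latent matrices (each n_samples x dim) element-wise."""
    return np.mean(np.stack(latents, axis=0), axis=0)

def alignment_loss(latents):
    """Assumed distance loss: mean squared Euclidean distance between each
    modality's latent vectors and the averaged joint embedding."""
    joint = joint_embedding(latents)
    per_modality = [np.mean(np.sum((z - joint) ** 2, axis=1)) for z in latents]
    return float(np.mean(per_modality))

def assign(x, centers):
    """Index of the nearest center for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(x, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(x, centers), centers

# Hypothetical usage with synthetic latents for three modalities
# (expression, CNV, mutation); real latents would come from trained encoders.
rng = np.random.default_rng(1)
base = np.vstack([np.zeros((30, 4)), np.full((30, 4), 10.0)])
latents = [base + 0.1 * rng.standard_normal(base.shape) for _ in range(3)]
joint = joint_embedding(latents)
labels, centers = kmeans(joint, k=2)
```

In a trained model, a loss of this kind would be added to the per-modality reconstruction losses so that the encoders are pulled toward a shared latent space before averaging. The study reports seven clusters, which would correspond to `k=7` in this sketch.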

Results: The multimodal autoencoder accurately reconstructed all three modalities with low reconstruction errors. Seven distinct subtypes of LUSC were identified with significant differences in rwOS (p=0.02). Subtype C1 (12.5% of cases) exhibited the lowest median survival (11.7 months; 95% CI 9.4-16.3) and activation of EMT and TGF-β signaling pathways, known to be associated with adverse outcomes, contrasting with subtype C5 (9.5% of cases), which had the highest median survival (22.4 months; 95% CI 13.28-31.5). Subtypes derived from joint embeddings showed enrichment for known driver genes, thereby defining distinct molecular characteristics. NFE2L2 mutations were enriched (p<0.05) in subtypes C3 (28%) and C7 (29%). RB1 mutations were prevalent in C2 (11%) and C5 (14%), while NF1 mutations were observed in subtypes C1 (15%) and C6 (12%). SOX2 and PIK3CA amplifications are known to be enriched in the classical subtypes (C3, C7). In addition, we identified multiple cluster-specific alterations (e.g., FGF19 and CCND1 in C3; ETV5 and BCL6 in C7), highlighting extensive intra-subtype heterogeneity. The resulting multimodal subtypes validated established TCGA classifications while providing deeper molecular resolution by uncovering previously uncaptured intra-subtype variability.
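For readers unfamiliar with how a median survival time is read off censored follow-up data, the sketch below computes it from a Kaplan-Meier curve. The abstract does not state which estimator was used for rwOS, so Kaplan-Meier is an assumption here, and `km_median` is a hypothetical helper, not part of any Tempus pipeline. The input values are toy numbers, not study data.

```python
from fractions import Fraction
import numpy as np

def km_median(times, events):
    """Kaplan-Meier median: the earliest event time at which the estimated
    survival probability drops to 0.5 or below. Returns None if not reached.

    times  -- follow-up time per patient (e.g., months)
    events -- 1 if death was observed, 0 if the patient was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = Fraction(1)  # exact arithmetic avoids rounding issues at 0.5
    for t in np.unique(times[events]):  # distinct event times, ascending
        at_risk = int(np.sum(times >= t))
        deaths = int(np.sum((times == t) & events))
        surv *= Fraction(at_risk - deaths, at_risk)
        if surv <= Fraction(1, 2):
            return float(t)
    return None

# Toy example: two of five patients are censored (event = 0).
print(km_median([1, 2, 3, 4, 5], [1, 0, 1, 0, 1]))  # 5.0
```

Censored patients leave the risk set without lowering the curve, which is why the median in the toy example (5.0) is later than it would be if every patient had an observed event (3.0).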

Conclusions: This study validated the potential of multimodal omic integration for high-resolution patient subtyping, establishing a critical foundation for developing integrative AI frameworks to accelerate precision oncology.
