Authors
Joshua Drews, Lee Langer, Ezgi Karaesman, Matthew MacKay, Andrew Sedgewick, Joshua SK Bell, Justin Guinney, Catherine Igartua

Introduction: Cancer tumorigenesis and progression are driven by genetic and epigenetic alterations, giving rise to transcriptional and pathway dysregulation. An approach considering the level of pathway disruption and underlying genomic drivers may provide a more comprehensive understanding of tumor biology than assessing either factor alone.
Here, we developed a machine learning platform that integrates DNA alterations and RNA expression data to measure the activation states of oncogenic signaling pathways and characterize novel genetic alterations that may cause pathway dysregulation.
Methods: Using real-world de-identified patient records from the Tempus Database, the platform trains models on gene expression, using arbitrarily complex combinations of somatic alterations and clinical phenotypes as positive and negative cohort labels. We applied this framework to model pathway activities (TCGA PanCan pathways: Cell Cycle, HIPPO, MYC, NRF2, NOTCH, PI3K, P53, RTK/RAS, TGFB, WNT) among patients who received Tempus xT, a targeted DNA and whole-transcriptome RNA sequencing assay. For each pathway, the positive cohort comprised samples with at least one pathogenic short variant or CNV in a predefined pathway gene list, while the negative cohort comprised samples with no detected somatic alterations in any of those genes. Model performance was measured by AUC and evaluated separately for each cancer type using a hold-out dataset.
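The cohort definition and per-cancer-type evaluation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the platform's implementation: the function names, the record layout (`cancer_type`, `label`, `score`), and the three-gene PI3K list are assumptions made for the example, and the real predefined gene lists are not given here. Samples that carry a non-pathogenic alteration in a pathway gene fit neither cohort definition, so the sketch leaves them unlabeled.

```python
from collections import defaultdict

# Hypothetical, abbreviated gene list used only for illustration.
EXAMPLE_PI3K_GENES = {"PIK3CA", "PTEN", "AKT1"}


def label_cohort(pathogenic_genes, altered_genes, pathway_genes):
    """Return 1 (positive), 0 (negative), or None (in neither cohort).

    pathogenic_genes: genes with a pathogenic short variant or CNV.
    altered_genes: genes with any detected somatic alteration.
    """
    pathway = set(pathway_genes)
    if pathway & set(pathogenic_genes):
        return 1
    if pathway & set(altered_genes):
        return None  # altered but not pathogenic: excluded from both cohorts
    return 0


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half. Returns None if either class is empty.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_cancer_type(samples):
    """Group held-out samples by cancer type and compute one AUC per group."""
    groups = defaultdict(lambda: ([], []))
    for sample in samples:
        labels, scores = groups[sample["cancer_type"]]
        labels.append(sample["label"])
        scores.append(sample["score"])
    return {ct: auc(labels, scores) for ct, (labels, scores) in groups.items()}
```

The pairwise AUC is quadratic in cohort size, which keeps the sketch dependency-free; a rank-based routine from a statistics library would be the usual choice at the scale of thousands of samples.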
Results: We applied the platform to 15,217 samples across 22 cancer types to develop models of pathway dysregulation. Several pan-cancer models were strongly predictive of dysregulation in individual cancer types, including PI3K in prostatic (AUC=0.86) and pancreatic (AUC=0.85) adenocarcinomas, and TGFB in gastrointestinal stromal tumors (AUC=0.94) and meningioma (AUC=0.90). To assess the platform’s ability to predict the effects of variants on pathway activity, a model was trained to predict RAS variant pathogenicity in colorectal cancer. Without knowledge of rare variant classifications, this model supported the established pathogenicity of several rare KRAS variants (e.g., K117N, A146T) and returned pathogenicity measures for several other variants that were highly consistent with published in vitro data. We applied the same approach to identify potential modifiers of HIPPO pathway disruption in high-grade glioma, with results suggesting RB1 mutations as putative modulators of HIPPO dysregulation.
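The abstract does not state how a per-variant pathogenicity measure is derived from the model. One plausible reading, shown here purely as an assumption, is to average the model's pathway-activity score over the held-out samples that carry each variant. The function name, the `(variant, score)` input layout, and the `min_carriers` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def variant_scores(carrier_scores, min_carriers=1):
    """Average the model score over the carriers of each variant.

    carrier_scores: iterable of (variant_name, model_score) pairs, one per
    sample carrying that variant. Variants with fewer than min_carriers
    samples are omitted, since a mean over very few carriers is unreliable.
    """
    by_variant = defaultdict(list)
    for variant, score in carrier_scores:
        by_variant[variant].append(score)
    return {
        variant: mean(scores)
        for variant, scores in by_variant.items()
        if len(scores) >= min_carriers
    }
```

Under this reading, a rare variant whose carriers score close to known pathogenic variants would be a candidate driver, while one whose carriers score like the negative cohort would look like a passenger.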
Conclusion: We developed a method for modeling oncogenic processes and applied it to a large real-world database. The method differentiated between oncogenic drivers and passenger alterations with high accuracy, and measured the relative levels of pathway activation states between and within cancer subtypes.