Gene fusions can act as key drivers in the development of various cancers and are important therapeutic targets and diagnostic biomarkers. Because RNA-sequencing data yield large numbers of candidate fusions, there is a recognized need for tools that make reasonable, automated predictions of which fusion events in a tumor sample are clinically or biologically relevant. We developed a computational pipeline that scores and prioritizes all fusion transcripts detected in a sample to determine which fusions are likely driver events in the tumor. Specifically, the pipeline bins all scored fusion events into Low, Medium, and High Confidence levels based on read-support thresholds and a DriverScore metric, which is derived from a binary classification algorithm that uses features such as reading frame, breakpoint region, kinase domain, and transcript isoform. We systematically analyzed 3200 fusion candidates from a previously published cohort of 500 paired tumor-normal samples sequenced with the Tempus xT assay.
We found that 1.7% and 20.1% of fusion candidates were categorized as High and Medium Confidence, respectively, while 78.2% of fusion events were deprioritized as Low Confidence calls. Of the 35 clinically relevant fusions, 27 (77.1%) were captured in our prioritized set (High/Medium Confidence), including National Comprehensive Cancer Network actionable gene rearrangements involving RET, STAT6 and FUS, while the remaining 8 were assigned Low Confidence due to an out-of-frame fusion transcript and insufficient read support. The frequency of prioritized fusions varied by cancer type and was highest in prostate and breast cancer. In addition to well-established canonical fusions, we sought to characterize novel fusions and identified a subset of 21 novel prioritized fusions that were also observed in The Cancer Genome Atlas tumor samples. Within this subset, 3% of fusion candidates contained a druggable domain such as a tyrosine kinase or Ras-binding domain, signifying the potential of categorization to enable novel fusion drug target discovery. Overall, our analysis highlights the utility of an automated prioritization tool for detecting known canonical fusion drivers and exploring novel fusion drug targets and biomarkers.
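The categorization scheme described above, in which fusions are binned by read support and a classifier-derived DriverScore, can be illustrated with a minimal Python sketch. This is not the published pipeline: the abstract does not give the actual thresholds, feature encoding, or classifier, so the names FusionCall, driver_score and categorize, the logistic weights, and every cutoff value below are hypothetical placeholders chosen only to show the shape of the logic.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class FusionCall:
    """One detected fusion transcript with the features named in the abstract."""
    name: str
    supporting_reads: int
    in_frame: bool                     # reading frame preserved across the junction
    breakpoint_in_coding_region: bool  # breakpoint region
    retains_kinase_domain: bool        # kinase domain retained in the fusion product
    canonical_isoform: bool            # transcript isoform


# Hypothetical logistic-model weights; a real pipeline would learn these from data.
WEIGHTS = {
    "in_frame": 3.0,
    "breakpoint_in_coding_region": 1.0,
    "retains_kinase_domain": 2.0,
    "canonical_isoform": 1.0,
}
BIAS = -4.0

# Hypothetical cutoffs; the abstract does not state the real values.
MIN_READS_MEDIUM = 5
MIN_READS_HIGH = 20
SCORE_MEDIUM = 0.6
SCORE_HIGH = 0.9


def driver_score(call: FusionCall) -> float:
    """Toy stand-in for the DriverScore: logistic function of binary features."""
    z = BIAS + sum(w for feature, w in WEIGHTS.items() if getattr(call, feature))
    return 1.0 / (1.0 + exp(-z))


def categorize(call: FusionCall) -> str:
    """Bin a call into High, Medium, or Low using read support and the score."""
    score = driver_score(call)
    if call.supporting_reads >= MIN_READS_HIGH and score >= SCORE_HIGH:
        return "High"
    if call.supporting_reads >= MIN_READS_MEDIUM and score >= SCORE_MEDIUM:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    # Made-up example calls, not data from the study.
    calls = [
        FusionCall("GENE_A--GENE_B", 40, True, True, True, True),
        FusionCall("GENE_C--GENE_D", 8, True, True, False, True),
        FusionCall("GENE_E--GENE_F", 40, False, True, True, True),
    ]
    for call in calls:
        print(call.name, round(driver_score(call), 3), categorize(call))
```

In this sketch a call must clear both the read-support and the score threshold for a tier, so a well-supported but out-of-frame fusion still falls into Low Confidence, mirroring the two reasons the abstract gives for deprioritized calls.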