A Deep-learning-based RNA-seq Germline Variant Caller

Bioinformatics Advances Manuscript
Authors: Daniel E Cook, Aarti Venkat, Dennis Yelizarov, Yannick Pouliot, Pi-Chuan Chang, Andrew Carroll, Francisco M De La Vega

RNA sequencing (RNA-seq) can be applied to diverse tasks including quantifying gene expression, discovering quantitative trait loci and identifying gene fusion events. Although RNA-seq can detect germline variants, the complexities of variable transcript abundance, target capture and amplification introduce challenging sources of error. Here, we extend DeepVariant, a deep-learning-based variant caller, to learn and account for the unique challenges presented by RNA-seq data. Our DeepVariant RNA-seq model produces highly accurate variant calls from RNA-seq data and outperforms existing approaches such as Platypus and GATK. We examine factors that influence accuracy, how our model addresses RNA editing events and how additional thresholding can be used to facilitate our model’s use in a production pipeline.