Performance of Copy Number Variant Detection From Short-Read Whole Genome Sequencing For Clinical Gene-Panel Applications

Authors: Francisco M. De La Vega, Sean Irvine, Pavana Anur, Kelly Potts, Lewis Kraft, Raul Torres, Sean Truong, Yeonghun Lee, Shunhua Han, Vitor Onuchic, James Han, and Peter Kang

Introduction: The decreasing cost of short-read whole-genome sequencing (WGS) now makes it a viable alternative to targeted or whole-exome sequencing for clinical applications, with superior detection of copy number variants (CNVs) and structural variants (SVs). Clinical multi-gene panels currently rely on targeted sequencing at 100-500X depth to compensate for the reduced coverage uniformity caused by bait capture. WGS, however, is typically performed at 30-50X depth, depending on the application. We aimed to assess multiple CNV detection tools for short-read WGS data to establish whether clinical-grade performance is achievable for gene-panel reporting at 50X WGS coverage.

Methods: Our evaluation included CNVpytor, Cue (a new machine-learning method), Delly, and the DRAGEN™ 4.2 software suite, which includes a new CNV caller that combines breakpoint- and depth-based calls, as well as custom callers targeting genes that are particularly difficult to sequence. We used PCR-free library data from eight cell lines with known, medically relevant CNVs, sequenced to an average depth of 50X with 2x150 bp paired-end reads on the Illumina NovaSeq™ 6000 platform. These CNVs included a single-exon duplication in PALB2, single-exon deletions in DMD, GAA, PLP1, and GBA, and two-exon deletions in CHEK2 and CDKL5. For accuracy calculations, we evaluated CNV overlaps with coding exons, defining a match as a called event that intersects an exon with the same dosage direction as the reference set. Each event's contribution was adjusted according to the number of exons it overlapped. Specificity was assessed across the combined regions of an 89-gene hereditary cancer panel and a 95-gene cardiovascular disease panel to emulate gene-panel reporting from WGS.
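The exon-level matching rule described in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the `Region` and `CNV` classes, the function names, the 0-based half-open coordinates, and the reading of "adjusted according to the number of overlapping exons" as one unit of weight per overlapped exon are all hypothetical. Because the abstract does not spell out the specificity formula, the sketch only lists panel-region calls that match no reference event, which is the raw material for such an estimate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Genomic interval; 0-based, half-open coordinates (an assumption)."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: Region) -> bool:
        return self.chrom == other.chrom and self.start < other.end and other.start < self.end


@dataclass(frozen=True)
class CNV(Region):
    kind: str  # dosage direction: "DEL" or "DUP"


def exon_weighted_sensitivity(truth: list[CNV], calls: list[CNV], exons: list[Region]) -> float:
    """Fraction of (reference event, coding exon) pairs hit by a same-direction call.

    Hypothetical weighting: each reference event contributes one unit per exon
    it overlaps, so a two-exon deletion weighs twice a single-exon one.
    """
    hits = total = 0
    for t in truth:
        for exon in (e for e in exons if e.overlaps(t)):
            total += 1
            # Match rule: a call intersecting this exon with the same dosage direction.
            if any(c.kind == t.kind and c.overlaps(exon) for c in calls):
                hits += 1
    return hits / total if total else float("nan")


def unmatched_panel_calls(truth: list[CNV], calls: list[CNV], panel: list[Region]) -> list[CNV]:
    """Calls inside panel regions that match no reference event of the same direction."""
    in_panel = [c for c in calls if any(c.overlaps(r) for r in panel)]
    return [c for c in in_panel if not any(c.kind == t.kind and c.overlaps(t) for t in truth)]


if __name__ == "__main__":
    # Toy coordinates, purely illustrative.
    exons = [Region("chr1", 100, 200), Region("chr1", 300, 400), Region("chr1", 500, 600)]
    truth = [CNV("chr1", 90, 410, "DEL")]  # spans the first two exons
    calls = [CNV("chr1", 150, 250, "DEL"), CNV("chr1", 520, 560, "DUP")]
    panel = [Region("chr1", 0, 1000)]
    print(exon_weighted_sensitivity(truth, calls, exons))  # 0.5
    print(len(unmatched_panel_calls(truth, calls, panel)))  # 1
```

In the toy run, the deletion call recovers only the first of the two deleted exons, so sensitivity is 0.5, and the duplication call falls in the panel but matches no reference event.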

Results: Cue and CNVpytor struggled to detect single-exon events, missing half of them and thereby reaching only 62% overall sensitivity; their specificities were 85% and 50%, respectively. Delly achieved a higher sensitivity of 87%, missing only the deletion in GBA, a notoriously challenging gene, but its specificity was markedly low at 20%. Notably, DRAGEN achieved 100% sensitivity and specificity in this analysis, including the GBA deletion, which was called by a dedicated caller for this gene.

Conclusions: These results suggest that 50X WGS, combined with DRAGEN analysis tools, can provide the accuracy needed for CNV detection in multi-gene panel applications. We are extending the study to include specimens from patients carrying pathogenic CNVs in the genes of interest.