Sequencing coverage and breadth of coverage

Renesh Bedre

What are sequencing coverage (depth) and breadth of coverage?

  • Sequencing coverage or depth (the two terms are used interchangeably) is the average number of times each nucleotide base of the target genome is covered by sequenced reads. It is estimated as (number of reads × read length) / genome size. For example, if the genome size is 100 Mbp and you have sequenced 5 M reads of 100 bp each, the sequencing coverage at the genome level is (5 M × 100 bp) / 100 Mbp = 5X.
  • The breadth of coverage is the percentage of genome bases sequenced at a given sequencing depth. For example, a breadth of coverage of 95% at 10X means that 95% of the genome bases are covered by at least 10 reads.
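The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy per-base depth values are made up for this example and are not part of bioinfokit.

```python
def sequencing_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by the genome size."""
    return n_reads * read_length_bp / genome_size_bp


def breadth_of_coverage(per_base_depth, min_depth=1):
    """Percentage of positions covered by at least min_depth reads."""
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return 100 * covered / len(per_base_depth)


# Example from the text: 5 M reads of 100 bp on a 100 Mbp genome
coverage = sequencing_coverage(5_000_000, 100, 100_000_000)
print(f"{coverage}X")  # 5.0X

# Toy genome of 10 positions with made-up per-base depths
toy_depths = [0, 3, 5, 5, 6, 4, 2, 0, 1, 7]
breadth = breadth_of_coverage(toy_depths, min_depth=2)
print(breadth)  # 70.0 (7 of the 10 positions have depth >= 2)
```

Note that the same genome can have a high average depth and still a low breadth if the reads pile up in a few regions, which is why the two numbers are usually reported together.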

How to calculate sequencing coverage?

  • Sequencing coverage is calculated based on the type of sequencing. For RNA-seq applications, coverage is calculated based on the transcriptome size, and for genome sequencing applications, coverage is calculated based on the genome size.
  • In RNA-seq experiments, the read depth (number of reads per sample) is generally used instead of coverage. High read depth is necessary to identify genes with low expression. The typical read depth for an RNA-seq experiment studying gene expression ranges from 5 to 25 M reads per sample.
  • Calculating sequencing coverage from raw sequence reads gives only a rough estimate, as some of the raw reads may contain contamination (adapters, primers, duplicates or low-quality bases) or may not map to the genome. In such cases, you can estimate the coverage from the genome-mapped data instead.
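As a rough sketch of the mapped-data approach, the snippet below computes the mean depth from per-base depth records. It assumes a three-column, tab-separated table (chromosome, position, depth), such as the one `samtools depth -a` writes for a single alignment file; the function name and the example records are hypothetical.

```python
def mean_depth_from_depth_lines(lines):
    """Mean per-base depth from 'chrom<TAB>pos<TAB>depth' records."""
    total_depth = 0
    n_positions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        total_depth += int(depth)
        n_positions += 1
    if n_positions == 0:
        raise ValueError("no depth records found")
    return total_depth / n_positions


# Made-up records for four positions; position 3 has no mapped reads
example_records = ["chr1\t1\t4", "chr1\t2\t6", "chr1\t3\t0", "chr1\t4\t2"]
print(mean_depth_from_depth_lines(example_records))  # 3.0
```

Positions with zero depth have to be present in the table for the mean to reflect the whole genome; if they are omitted, the estimate is the mean depth over covered positions only.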

We will use bioinfokit (v0.9.7 or later).
See the bioinfokit documentation for installation instructions and usage.

# you can use the interactive Python interpreter, Jupyter notebook, Spyder or a Python script
# I am using the interactive Python interpreter (Python 3.7)
# go to the directory where the FASTQ files are saved and make sure the FASTQ file is uncompressed
# replace the placeholders with your FASTQ file name and the genome size in Mbp
# this will give the sequencing coverage per sample
>>> from bioinfokit.analys import fastq
>>> fastq.seqcov(file="fastq_file", gs="genome size in Mbp")

Sequencing application            Recommended coverage
Whole genome sequencing (WGS)     15X to 60X
Whole exome sequencing (WES)      100X
RNA sequencing (RNA-seq)          5 to 100 M reads per sample, depending on the target study
ChIP-Seq                          100X

Source: Illumina and genohub
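
When planning an experiment, the coverage formula can be rearranged to estimate how many reads are needed for a target coverage: number of reads = (coverage × genome size) / read length. The sketch below is only an illustration with a hypothetical function name; it counts individual reads and ignores reads lost to quality filtering or mapping.

```python
import math


def reads_needed(target_coverage, genome_size_bp, read_length_bp):
    """Number of reads required to reach the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# 30X coverage of a 100 Mbp genome with 100 bp reads
print(reads_needed(30, 100_000_000, 100))  # 30000000
```

In practice you may want to sequence somewhat more than this number, since some raw reads will be removed during filtering or will not map to the genome.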

How to cite?
Renesh Bedre. (2020, July 29). reneshbedre/bioinfokit: Bioinformatics data analysis and visualization toolkit (Version v0.9). Zenodo. http://doi.org/10.5281/zenodo.3965241

Last updated: September 18, 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.