compleasm: a faster and more accurate reimplementation of BUSCO

Bioinformatics. 2023 Oct 3;39(10):btad595. doi: 10.1093/bioinformatics/btad595.

Abstract

Motivation: Evaluating gene completeness is critical to measuring the quality of a genome assembly. An incomplete assembly can lead to errors in gene prediction, annotation, and other downstream analyses. Benchmarking Universal Single-Copy Orthologs (BUSCO) is a widely used tool for assessing the completeness of a genome assembly by testing for the presence of a set of single-copy orthologs conserved across a wide range of taxa. However, BUSCO is slow, particularly for large genome assemblies, which makes it cumbersome to apply to a large number of assemblies.
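
As a rough illustration of the idea of measuring completeness by ortholog presence, the sketch below computes the percentage of an expected ortholog set that was detected in an assembly. It is a simplified assumption for illustration only, not the scoring used by BUSCO or compleasm; the function name completeness_percent and the ortholog identifiers are hypothetical.

```python
def completeness_percent(expected_orthologs, detected_orthologs):
    """Percentage of expected orthologs that were detected (presence only).

    Simplified illustration: real tools apply more detailed criteria
    than a plain presence check.
    """
    expected = set(expected_orthologs)
    if not expected:
        raise ValueError("expected ortholog set is empty")
    # Count only detections that belong to the expected set.
    found = expected & set(detected_orthologs)
    return 100.0 * len(found) / len(expected)


# Hypothetical identifiers, used only to exercise the function.
expected = ["og1", "og2", "og3", "og4"]
detected = ["og1", "og3", "og4", "og9"]
print(f"{completeness_percent(expected, detected):.1f}%")  # 3 of 4 expected orthologs found
```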

Results: Here, we present compleasm, an efficient tool for assessing the completeness of genome assemblies. Compleasm utilizes the miniprot protein-to-genome aligner and the conserved orthologous genes from BUSCO. It is 14 times faster than BUSCO for human assemblies and reports a more accurate completeness (99.6%, versus 95.7% for BUSCO), in close agreement with the annotation completeness of 99.5% for T2T-CHM13.

Availability and implementation: https://github.com/huangnengCSU/compleasm.

MeSH terms

  • Benchmarking*
  • Genome
  • Genomics*
  • Humans
  • Molecular Sequence Annotation