As the number of publicly available whole genome assemblies has increased, methods for whole genome comparison have been developed. Average Nucleotide Identity (ANI; Konstantinidis and Tiedje, 2005; Goris et al., 2007) is a measure of the average nucleotide identity shared between a pair of genomes. Each genome sequence is fragmented into 1020 bp regions, which are then compared with all fragments of the other genome. Because the compared regions are reasonably short yet still biologically informative, the comparison is less affected by genome rearrangement or horizontal gene transfer. A shared ANI of 95% or greater is generally accepted as indicating that both genomes in the pair belong to the same species.
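To make the fragment-and-average idea concrete, here is a minimal Python sketch. It is not the autoANI implementation. The function names `fragment_genome` and `average_identity`, the layout of each hit, and the 30% identity / 70% coverage cutoffs (chosen in the spirit of Goris et al., 2007) are all assumptions made for illustration; the pipeline's own defaults may differ.

```python
from typing import Iterable, List, Tuple

FRAGMENT_SIZE = 1020  # bp, the fragment length described above


def fragment_genome(sequence: str, size: int = FRAGMENT_SIZE) -> List[str]:
    """Cut a sequence into consecutive, non-overlapping fragments.

    A trailing piece shorter than `size` is discarded (an assumption of this sketch).
    """
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def average_identity(
    hits: Iterable[Tuple[float, float]],
    min_identity: float = 30.0,   # hypothetical cutoff, percent identity
    min_coverage: float = 70.0,   # hypothetical cutoff, percent of fragment aligned
) -> float:
    """Average the percent identity of all hits that pass both cutoffs.

    Each hit is (percent_identity, percent_query_coverage) for the best
    match of one fragment; this tuple layout is hypothetical.
    """
    kept = [pid for pid, cov in hits if pid >= min_identity and cov >= min_coverage]
    if not kept:
        raise ValueError("no hits passed the cutoffs")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    toy_genome = "ACGT" * 600  # 2400 bp toy sequence, not real data
    fragments = fragment_genome(toy_genome)
    print(len(fragments), "fragments of", len(fragments[0]), "bp")

    # Toy hits: the last one fails the coverage cutoff and is ignored.
    toy_hits = [(96.0, 100.0), (94.0, 90.0), (99.0, 40.0)]
    print("ANI estimate:", average_identity(toy_hits))
```

In a real analysis the hits would come from a BLAST search of each fragment against the other genome. Because the calculation is run separately in each direction, the two values for a genome pair are usually close but not identical.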
The Chang lab has developed a Perl script that automates the calculation of pairwise ANI values between all genomes given as input. The following instructions describe the pipeline and the process for generating a table of pairwise ANI values from whole genome sequencing data.
Pipeline scripts for ANI analysis: Download
First, download the pipeline scripts from GitHub into their own folder using the following command:
git clone https://github.com/osuchanglab/autoANI autoANI
A help message describing the various options for the program can be displayed by using the -help flag with the following command:
./autoANI.pl -help
This gives the following output:
Usage: autoANI.pl input[n].fasta input[n-1].fasta ... input.fasta input.fasta
Options:
Defaults shown in square brackets. Possible values shown in parentheses.
-help|h  Print a brief help message and exits.
-man  Print a verbose help message and exits.
-quiet  Turns off progress messages. Use noquiet to turn messages back on.
-email (firstname.lastname@example.org) - REQUIRED  Enter your email. NCBI requires this information to continue.
-log|nolog [logging on]  Using -nolog turns off logging. Logfile is in the format of ani.log.
-threads  *Recommended* - Using multiple threads will significantly speed up the BLAST searches.
-size  Genome chunk size.
-coverage  (0-100) Percentage of query coverage cutoff.
-pid  (0-70) Percent identity cutoff for BLAST search results.
Place the genomes you wish to compare, in FASTA format, into a folder. An email address is required because the script makes calls to the NCBI servers. For example, with the files GenomeA.fasta, GenomeB.fasta, and GenomeC.fasta, pairwise ANI values can be calculated with the command:
./autoANI.pl -email email@example.com ./GenomeA.fasta ./GenomeB.fasta ./GenomeC.fasta
This will produce the following output:
Building a new DB, current time: 06/01/2015 12:18:27
New DB name: db/GenomeA.fasta
New DB title: ./GenomeA.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 55 sequences in 0.257053 seconds.
Building a new DB, current time: 06/01/2015 12:18:29
New DB name: db/GenomeB.fasta
New DB title: ./GenomeB.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 4 sequences in 0.363739 seconds.
Building a new DB, current time: 06/01/2015 12:18:56
New DB name: db/GenomeC.fasta
New DB title: ./GenomeC.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 56 sequences in 0.21495 seconds.
	Genome A	Genome B	Genome C
Genome A	----------	87.864	85.682
Genome B	87.759	----------	85.887
Genome C	85.774	85.866	----------
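The file ani.log will contain logging information unless logging was turned off with -nolog. The table at the end of the output is in tab-delimited format and contains the pairwise ANI values for each genome pair. For example:
| | Genome A | Genome B | Genome C |
|---|---|---|---|
| Genome A | ---------- | 87.864 | 85.682 |
| Genome B | 87.759 | ---------- | 85.887 |
| Genome C | 85.774 | 85.866 | ---------- |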
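A tab-delimited table like the one described above is easy to post-process. The sketch below is one possible approach in Python and is not part of the pipeline. The helper names `read_ani_table` and `mean_reciprocal` are hypothetical, and the sketch assumes that the header row starts with an empty cell and that the diagonal cells contain only dashes. It parses the table, averages the two reciprocal values for each genome pair, and checks the result against the 95% species threshold.

```python
import csv
import io
from itertools import combinations
from typing import Dict, Tuple

SPECIES_CUTOFF = 95.0  # percent ANI, the species threshold mentioned earlier


def read_ani_table(text: str) -> Dict[Tuple[str, str], float]:
    """Parse a tab-delimited ANI matrix into {(row_name, column_name): value}.

    Assumes the first row holds column names after an empty leading cell and
    that diagonal cells are made of dashes (assumptions about the layout).
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    columns = rows[0][1:]
    values: Dict[Tuple[str, str], float] = {}
    for row in rows[1:]:
        name, cells = row[0], row[1:]
        for column, cell in zip(columns, cells):
            if cell.strip("-"):  # skip the dashed diagonal and empty cells
                values[(name, column)] = float(cell)
    return values


def mean_reciprocal(values: Dict[Tuple[str, str], float], a: str, b: str) -> float:
    """Average the A-vs-B and B-vs-A values, which usually differ slightly."""
    return (values[(a, b)] + values[(b, a)]) / 2


if __name__ == "__main__":
    # The example values shown in the output above.
    table = (
        "\tGenome A\tGenome B\tGenome C\n"
        "Genome A\t----------\t87.864\t85.682\n"
        "Genome B\t87.759\t----------\t85.887\n"
        "Genome C\t85.774\t85.866\t----------\n"
    )
    ani = read_ani_table(table)
    names = sorted({row for row, _ in ani})
    for a, b in combinations(names, 2):
        mean = mean_reciprocal(ani, a, b)
        verdict = "same species" if mean >= SPECIES_CUTOFF else "different species"
        print(f"{a} vs {b}: {mean:.3f} ({verdict})")
```

With the example values, every pair falls well below 95%, so none of the three genomes would be assigned to the same species under this threshold.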
If you wish to publish output from this pipeline, please cite the following papers and programs:
Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., & Madden T.L. (2009) "BLAST+: architecture and applications." BMC Bioinformatics 10:421.
Davis II EW, Weisberg AJ, Tabima JF, Grünwald NJ, Chang J. (2016) Gall-ID: tools for genotyping gall-causing phytopathogenic bacteria. PeerJ Preprints 4:e1998v3 https://doi.org/10.7287/peerj.preprints.1998v3
Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. (2007) "DNA-DNA hybridization values and their relationship to whole-genome sequence similarities." Int J Syst Evol Microbiol 57(Pt 1):81-91.