Auto ANI



Average Nucleotide Identity (ANI) analysis

As the number of publicly available whole genome assemblies has increased, methods for whole genome comparison have been developed. Average Nucleotide Identity (ANI; Konstantinidis and Tiedje, 2005; Goris et al., 2007) is a measure of the average nucleotide identity shared between a pair of genomes. Each genome sequence is split into 1020 bp fragments, which are then compared against the other genome with BLAST. Because the compared regions are short yet still biologically informative, the comparison is less affected by genome rearrangement or horizontal gene transfer. A shared ANI of 95% or greater is generally accepted as indicating that both genomes in the pair belong to the same species.
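The fragmentation step can be sketched in Python. This is a minimal illustration and not the pipeline's own Perl code: the function name `fragment_genome` and the `keep_partial` parameter are hypothetical, and whether autoANI keeps or discards a final fragment shorter than 1020 bp is an assumption made here.

```python
def fragment_genome(sequence, size=1020, keep_partial=False):
    """Split a sequence into consecutive, non-overlapping fragments.

    `size` mirrors the 1020 bp fragment length described above.
    Dropping a short trailing fragment (keep_partial=False) is an
    assumption for this sketch, not confirmed pipeline behaviour.
    """
    fragments = [sequence[i:i + size] for i in range(0, len(sequence), size)]
    # Remove the last fragment if it is shorter than the requested size.
    if fragments and not keep_partial and len(fragments[-1]) < size:
        fragments.pop()
    return fragments


# A toy 2400 bp sequence yields two full fragments plus a 360 bp remainder.
toy_sequence = "ACGT" * 600
full_fragments = fragment_genome(toy_sequence)
all_fragments = fragment_genome(toy_sequence, keep_partial=True)
```

With the toy sequence, `full_fragments` holds the two complete 1020 bp fragments, while `all_fragments` also includes the shorter remainder.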

The Chang lab has developed a Perl script that automates the calculation of pairwise ANI values between all genomes given as input. The following instructions describe the pipeline and the process for generating a table of pairwise ANI values from whole genome sequencing data.

Instructions

Required Software

Git
Perl (installed by default on Mac and most Linux distributions)
Required Perl Modules
BioPerl
Bio::DB::EUtilities

ANI pipeline scripts

Pipeline scripts for ANI analysis: https://github.com/osuchanglab/autoANI

General Usage

First, download the pipeline scripts from GitHub into their own folder using the following command:

	git clone https://github.com/osuchanglab/autoANI autoANI
		
A help message describing the various options for the program can be displayed by using the -help flag with the following command:
	./autoANI.pl -help
		
This gives the following output:
	
Usage:
    autoANI.pl input[n].fasta input[n-1].fasta ... input[2].fasta input[1].fasta

Options:
    Defaults shown in square brackets. Possible values shown in parentheses.

    -help|h Print a brief help message and exits.

    -man    Print a verbose help message and exits.

    -quiet  Turns off progress messages. Use noquiet to turn messages back
            on.

    -email (email@univ.edu) - REQUIRED
            Enter your email. NCBI requires this information to continue.

    -log|nolog [logging on]
            Using -nolog turns off logging. Logfile is in the format of
            ani.log.

    -threads [1]
            *Recommended* - Using multiple threads will significantly speed
            up the BLAST searches.

    -size [1020]
            Genome chunk size.

    -coverage [70] (0-100)
            Percentage of query coverage cutoff.

    -pid [30] (0-70)
            Percent identity cutoff for BLAST search results.
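
To illustrate how the -coverage and -pid cutoffs might act on BLAST results, the following Python sketch filters hits and averages the identities that remain. The `Hit` record, the function names, the definition of query coverage as alignment length divided by fragment length, and the inclusive (>=) comparisons are all assumptions for illustration; they are not taken from autoANI.pl.

```python
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record for the best BLAST hit of one genome fragment."""
    percent_identity: float   # percent identity of the alignment
    alignment_length: int     # aligned length in bp
    query_length: int         # length of the query fragment in bp


def passes_cutoffs(hit: Hit, coverage: float = 70.0, pid: float = 30.0) -> bool:
    """Return True if the hit meets both cutoffs (defaults from the help text)."""
    query_coverage = 100.0 * hit.alignment_length / hit.query_length
    return query_coverage >= coverage and hit.percent_identity >= pid


def average_identity(hits: Iterable[Hit], coverage: float = 70.0,
                     pid: float = 30.0) -> Optional[float]:
    """Mean percent identity of the hits that pass; None if none pass."""
    kept = [h.percent_identity for h in hits if passes_cutoffs(h, coverage, pid)]
    return sum(kept) / len(kept) if kept else None


example_hits = [
    Hit(90.0, 1020, 1020),  # kept: full coverage, high identity
    Hit(80.0, 800, 1020),   # kept: about 78% coverage
    Hit(99.0, 500, 1020),   # dropped: about 49% coverage
    Hit(25.0, 1020, 1020),  # dropped: identity below 30%
]
example_ani = average_identity(example_hits)
```

Only the first two example hits satisfy both cutoffs, so `example_ani` is the mean of their identities.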


		

Examples

Place the genomes you wish to compare in fasta format into a folder. An email address is required for calls to the NCBI servers. For example, with the files GenomeA.fasta, GenomeB.fasta, and GenomeC.fasta, pairwise ANI values can be calculated with the command:

		./autoANI.pl -email name@domain.com ./GenomeA.fasta ./GenomeB.fasta ./GenomeC.fasta
		

This produces the following output:



Building a new DB, current time: 06/01/2015 12:18:27
New DB name:   db/GenomeA.fasta
New DB title:  ./GenomeA.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 55 sequences in 0.257053 seconds.


Building a new DB, current time: 06/01/2015 12:18:29
New DB name:   db/GenomeB.fasta
New DB title:  ./GenomeB.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 4 sequences in 0.363739 seconds.


Building a new DB, current time: 06/01/2015 12:18:56
New DB name:   db/GenomeC.fasta
New DB title:  ./GenomeC.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 56 sequences in 0.21495 seconds.

	Genome A	Genome B	Genome C
Genome A	----------	87.864	85.682
Genome B	87.759	----------	85.887
Genome C	85.774	85.866	----------
	

The file ani.log will contain logging information unless logging was turned off with -nolog. The table at the end of the output is tab-delimited and contains the pairwise ANI values for each genome pair. For example:

	Genome A	Genome B	Genome C
Genome A	---------	87.864	85.682
Genome B	87.759	---------	85.887
Genome C	85.774	85.866	---------

The output contains ANI values for BLAST searches in both directions, i.e., Genome A searched against Genome B and Genome B searched against Genome A.
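
As a sketch of how the tab-delimited table might be processed downstream, the Python example below reads the table, averages the two directional values for each genome pair, and lists pairs at or above the 95% species threshold mentioned earlier. The function names are hypothetical, and averaging the two directions is an assumption made for this illustration rather than something the pipeline does itself.

```python
import csv
import io


def parse_ani_table(text):
    """Return {(row_genome, column_genome): ani} from a tab-delimited table."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t")
            if any(cell.strip() for cell in r)]
    header = rows[0][1:]  # the first cell of the header row is empty
    values = {}
    for row in rows[1:]:
        name = row[0]
        for column, cell in zip(header, row[1:]):
            if column != name:  # the diagonal holds dashes, not numbers
                values[(name, column)] = float(cell)
    return values


def mean_reciprocal_ani(values):
    """Average the A-vs-B and B-vs-A values for every unordered pair."""
    return {(a, b): (ani + values[(b, a)]) / 2
            for (a, b), ani in values.items()
            if a < b and (b, a) in values}


def same_species_pairs(means, threshold=95.0):
    """Pairs whose mean ANI meets the threshold (95% as described above)."""
    return [pair for pair, ani in means.items() if ani >= threshold]


table = (
    "\tGenome A\tGenome B\tGenome C\n"
    "Genome A\t----------\t87.864\t85.682\n"
    "Genome B\t87.759\t----------\t85.887\n"
    "Genome C\t85.774\t85.866\t----------\n"
)
means = mean_reciprocal_ani(parse_ani_table(table))
candidates = same_species_pairs(means)
```

For the example table, none of the three pairs reaches 95%, so `candidates` is empty.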

Citations

If you wish to publish output from this pipeline, please cite the following papers and programs:

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. (2009) "BLAST+: architecture and applications." BMC Bioinformatics 10:421.

Davis II EW, Weisberg AJ, Tabima JF, Grünwald NJ, Chang J. (2016) Gall-ID: tools for genotyping gall-causing phytopathogenic bacteria. PeerJ Preprints 4:e1998v3 https://doi.org/10.7287/peerj.preprints.1998v3

Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. (2007) "DNA-DNA hybridization values and their relationship to whole-genome sequence similarities." Int J Syst Evol Microbiol 57(Pt 1):81-91.