COX II neighbor-joining tree

By Fred R. Opperdoes


This example shows the different steps in building a phylogenetic tree of the mitochondrial cytochrome oxidase II (COX II) subunits, from the individual sequences through to the final tree. Special emphasis is given to the different file formats.


Rob Benne and coworkers in Amsterdam noticed, when they analysed the phylogenetic tree of mitochondrial ribosomal RNAs, that the bodonid Bodo saltans more closely resembled the trypanosomatids than the other bodonid, Trypanoplasma borreli. They decided to repeat the analysis with a mitochondrial protein sequence and chose the mitochondrially encoded cytochrome oxidase II subunit. For the analysis, a number of Kinetoplastida COX II sequences were collected from the SwissProt database. They are easy to retrieve because their entry codes all start with cox2, followed by a species identification. These are:


Different databases and programs use different sequence file formats, such as GenBank, EMBL, NBRF and GCG. These formats can, however, be interconverted with the file utility Readseq, which is available from the Web for most platforms, including Macintosh, PC, Unix and VAX. For Macintosh users, Readseq is available as a stand-alone program and is also integrated into the programs SeqApp and SeqPup. Unix users can run the stand-alone version or, alternatively, use the program GDE, which is recommended.

The stand-alone Readseq uses the following command-line syntax to convert a file from, for instance, GCG to NBRF format:
readseq inputfile -all -f=nbrf -o=outfile.nbrf

Alternatively, the Readseq server at the NIH can be used.
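To make the difference between two of these formats concrete, here is a minimal Python sketch (not Readseq's code) that reads a single-sequence GCG file and writes the same sequence as an NBRF/PIR entry. It assumes the usual layout of both formats: in a GCG file the residues follow a header line ending in `..`, and an NBRF protein entry starts with `>P1;` plus the entry name, has a one-line description, and terminates the sequence with `*`. The function names and the `EXAMPLE` sequence are hypothetical.

```python
import re


def parse_gcg(text):
    """Return the residues of a single-sequence GCG file as one string.

    Assumes the sequence starts on the line after the header line that
    ends with '..'; sequence lines carry position numbers and blanks.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip().endswith(".."):
            body = lines[i + 1:]
            break
    else:
        raise ValueError("no '..' divider line found; is this a GCG file?")
    # Keep letters only: drops position numbers, blanks and line breaks.
    return re.sub(r"[^A-Za-z]", "", "".join(body)).upper()


def to_nbrf(name, description, seq, width=60):
    """Format one protein sequence as an NBRF/PIR entry.

    '>P1;' marks a complete protein sequence, the second line is a
    free-text description, and the sequence is terminated by '*'.
    """
    body = seq + "*"
    rows = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f">P1;{name}", description, *rows]) + "\n"


if __name__ == "__main__":
    gcg_text = """EXAMPLE  Length: 12  Type: P  Check: 1234  ..

       1  MKLVAGTRSW DE
"""
    print(to_nbrf("EXAMPLE", "hypothetical test sequence", parse_gcg(gcg_text)), end="")
```

Run as a script, this prints a three-line NBRF entry whose sequence line is `MKLVAGTRSWDE*`. A real converter such as Readseq also has to cope with multi-sequence files and many more formats.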


A multiple alignment is then created with the program ClustalW, which reads sequence files in various formats, such as NBRF/PIR and Phylip.
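Because the Phylip format is the input for all of the tree-building steps in this example, a small sketch of its sequential layout may help. It assumes the classic Phylip rules: a first line giving the number of sequences and the number of sites, then one line per sequence with the name padded or truncated to 10 characters, followed by the aligned residues. `to_phylip` is a hypothetical helper, not part of ClustalW or Phylip.

```python
def to_phylip(alignment):
    """Render an alignment in sequential Phylip format.

    alignment: list of (name, aligned_sequence) pairs of equal length.
    First line: number of sequences and number of sites. Each following
    line: the name padded or cut to 10 characters, then the sequence.
    """
    lengths = {len(seq) for _, seq in alignment}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    lines = [f"{len(alignment)} {lengths.pop()}"]
    for name, seq in alignment:
        lines.append(f"{name[:10]:<10}{seq}")  # fixed 10-character name field
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_phylip([("seqA", "MK-L"), ("seqB_with_long_name", "MKAL")]), end="")
```

Names must remain distinct after truncation to 10 characters; otherwise the taxa can no longer be told apart in the Phylip output.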

The multiple sequence alignment created by ClustalW was saved in Pileup MSF and in Phylip format. The MSF format makes it easy to edit the alignment in a word processor such as MS Word: columns where the alignment is ambiguous, or that contain indels, can be removed by holding down the [alt-shift] keys while using the mouse to select the columns for deletion. Sites in the alignment that should be ignored can also be marked with a question mark. Another way to ignore entire columns is to give each site a weight of 1 or 0 (a mask).
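The column-removal and 0/1 weighting ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Phylip feature: the hypothetical `gap_free_mask` gives weight 0 to every column that contains a gap `-` or a `?`, and the hypothetical `apply_mask` keeps only the columns weighted 1. Ambiguously aligned columns would still have to be masked by hand.

```python
def gap_free_mask(alignment, ignore="-?"):
    """Return a 0/1 string with 0 for every column that contains a gap or '?'."""
    ncol = len(alignment[0][1])
    return "".join(
        "0" if any(seq[i] in ignore for _, seq in alignment) else "1"
        for i in range(ncol)
    )


def apply_mask(alignment, mask):
    """Keep only the columns whose weight in `mask` is '1'."""
    if any(len(seq) != len(mask) for _, seq in alignment):
        raise ValueError("mask and sequences must have the same length")
    keep = [i for i, w in enumerate(mask) if w == "1"]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in alignment]


if __name__ == "__main__":
    aln = [("a", "MK-LV"), ("b", "MKALV"), ("c", "M?ALI")]
    mask = gap_free_mask(aln)
    print(mask)                   # 10011
    print(apply_mask(aln, mask))  # [('a', 'MLV'), ('b', 'MLV'), ('c', 'MLI')]
```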


Creation of a distance tree using Neighbor Joining.

The Phylip-format file is used as input for the programs of the Phylip package. The following programs are used to construct a phylogenetic tree from a protein sequence alignment:
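The two steps behind a neighbor-joining tree, computing pairwise distances and then clustering, can be sketched in Python. This is not Phylip's code. The distance uses Kimura's empirical protein formula d = -ln(1 - p - 0.2p²), where p is the fraction of differing sites; PROTDIST offers this as one of several models, and skipping sites with `-` or `?` is an assumption of the sketch. The clustering follows the standard neighbor-joining recipe: join the pair minimising Q(i, j) = (n - 2)d(i, j) - r(i) - r(j), where r is a row sum of the distance matrix, compute the two new branch lengths, and replace the pair by a new node. All function names are hypothetical.

```python
import math
from itertools import combinations


def kimura_distance(a, b):
    """Kimura's protein distance -ln(1 - p - 0.2 p^2) over sites without '-' or '?'."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-?" and y not in "-?"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for the Kimura formula")
    return -math.log(arg)


def distance_matrix(alignment):
    """Map each unordered pair of names to its Kimura distance."""
    return {frozenset((n1, n2)): kimura_distance(s1, s2)
            for (n1, s1), (n2, s2) in combinations(alignment, 2)}


def neighbor_joining(names, dist):
    """Return an unrooted neighbor-joining tree as a Newick string.

    names: at least three taxon labels; dist: frozenset({a, b}) -> distance.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = dict(dist)

    def dd(x, y):
        return d[frozenset((x, y))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(dd(x, k) for k in nodes if k != x) for x in nodes}
        # Join the pair minimising Q(i, j); ties go to the first pair in input order.
        i, j = min(combinations(nodes, 2),
                   key=lambda pr: (n - 2) * dd(*pr) - r[pr[0]] - r[pr[1]])
        dij = dd(i, j)
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new node, labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (dd(i, k) + dd(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: attach them to a single central node.
    x, y, z = nodes
    lx = (dd(x, y) + dd(x, z) - dd(y, z)) / 2
    ly = (dd(x, y) + dd(y, z) - dd(x, z)) / 2
    lz = (dd(x, z) + dd(y, z) - dd(x, y)) / 2
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"


if __name__ == "__main__":
    # Textbook additive example: a tree fits these distances exactly.
    pairs = {"ab": 5, "ac": 9, "ad": 9, "ae": 8, "bc": 10,
             "bd": 10, "be": 9, "cd": 8, "ce": 7, "de": 3}
    dist = {frozenset(k): v for k, v in pairs.items()}
    print(neighbor_joining(list("abcde"), dist))
```

For the demonstration matrix the sketch prints `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`, recovering the tree that generated the distances. For real data, `distance_matrix` applied to the aligned COX II sequences would supply `dist`.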


Interestingly, the Bodo saltans sequence appears more closely related to the trypanosomatid sequences than to the sequence of the other bodonid, T. borreli. The other tree-construction methods, maximum parsimony and maximum likelihood, give the same result, as shown below.


Creation of a Maximum Parsimony tree using Protpars.

Here too, the Phylip-format file is used as input for the programs of the Phylip package. The following programs are used to construct a maximum parsimony tree from the protein sequence alignment:
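According to the Phylip documentation, Protpars uses a method intermediate between those of Eck and Dayhoff and of Fitch, which takes the genetic code into account. The core idea, counting the minimum number of changes a given tree requires, can be sketched with Fitch's algorithm alone, treating the amino acids (and any gap) as unordered states. This Python sketch is a simpler stand-in, not Protpars: trees are hypothetical nested pairs of taxon names, and among candidate trees the most parsimonious one has the lowest total.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass over one alignment column.

    tree: a taxon name (leaf) or a pair (left, right) of subtrees.
    states: taxon name -> residue at this column.
    Returns (candidate state set, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = ls & rs
    if common:
        return common, lc + rc          # children can agree: no extra change
    return ls | rs, lc + rc + 1         # disagreement costs one change


def parsimony_length(tree, alignment):
    """Total Fitch length of `tree`, summed over all alignment columns."""
    seqs = dict(alignment)
    ncol = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(ncol))


if __name__ == "__main__":
    aln = [("w", "AAC"), ("x", "AAC"), ("y", "GAT"), ("z", "GCT")]
    print(parsimony_length((("w", "x"), ("y", "z")), aln))  # 3
    print(parsimony_length((("w", "y"), ("x", "z")), aln))  # 5
```

Grouping w with x needs 3 changes against 5 for the alternative, so parsimony prefers the first tree. A program such as Protpars also has to search over tree topologies rather than score two given trees.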


Creation of a Maximum Likelihood tree using Puzzle.

The Phylip-format file can also be used as input for the program Puzzle. In general, maximum likelihood methods give results that are more robust to violations of the assumed model of molecular evolution, and they are therefore preferred in most cases. Puzzle lets you choose the model of evolution and produces a tree with both branch lengths and quartet-puzzling reliability values for the branching points. The program produces the following output files:
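Puzzle builds its tree from maximum-likelihood trees of sequence quartets, under a model the user selects. The underlying maximum-likelihood step can be sketched on the smallest case: estimating the distance between two sequences. This Python sketch assumes the simple Poisson model, in which all 20 amino acids are equally frequent and equally interchangeable (not necessarily a model Puzzle would use), and finds the maximum of the log-likelihood by golden-section search. For this model the optimum also has a closed form, d = -(19/20) ln(1 - 20p/19), with p the fraction of differing sites, which serves as a check. All names are hypothetical.

```python
import math

N = 20  # number of amino-acid states, assumed equally frequent


def p_same(d):
    """P(residue unchanged) after d expected substitutions per site (Poisson model)."""
    return 1 / N + (N - 1) / N * math.exp(-N * d / (N - 1))


def log_likelihood(d, same, diff):
    """Log-likelihood of the site counts at distance d (constant terms dropped)."""
    ps = p_same(d)
    return same * math.log(ps) + diff * math.log((1 - ps) / (N - 1))


def ml_distance(a, b, lo=1e-9, hi=10.0, tol=1e-10):
    """Maximise the likelihood over d in [lo, hi] by golden-section search."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-?" and y not in "-?"]
    same = sum(x == y for x, y in pairs)
    diff = len(pairs) - same
    g = (math.sqrt(5) - 1) / 2
    c, e = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if log_likelihood(c, same, diff) > log_likelihood(e, same, diff):
            hi, e = e, c                # maximum lies in [lo, e]
            c = hi - g * (hi - lo)
        else:
            lo, c = c, e                # maximum lies in [c, hi]
            e = lo + g * (hi - lo)
    return (lo + hi) / 2


def poisson_distance(p):
    """Closed-form ML estimate d = -(19/20) ln(1 - 20 p / 19), valid for p < 19/20."""
    return -(N - 1) / N * math.log(1 - N * p / (N - 1))


if __name__ == "__main__":
    a, b = "A" * 8 + "CD", "A" * 10     # p = 0.2
    print(ml_distance(a, b), poisson_distance(0.2))
```

Both printed values agree to several decimal places (about 0.2246). The numerical search generalises to models without a closed form, which is the situation Puzzle faces with empirical amino-acid matrices and whole trees.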


Last updated: 24 August 1997.

Created by: Fred Opperdoes