Methods for the inference of protein phylogeny

By Fred R. Opperdoes


Introduction

Once a multiple sequence alignment has been prepared, it can serve as the basis for further evolutionary analysis. The final goal of such an analysis is to construct an evolutionary tree describing the relationships of the various taxa to one another. To understand the terminology used in phylogeny, study the hypothetical tree shown below:

[Figure: example of a phylogenetic tree]


There are various methods for constructing evolutionary trees. "Distance methods" are based on a matrix of pairwise distance values between all sequences in the alignment, whereas "character-based methods" carry out calculations on each individual residue of the sequences. In general, distance methods are fast, while character-based methods are much slower because they are CPU (central processing unit) intensive.
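As an illustration of the matrix that distance methods start from, here is a minimal Python sketch. It assumes the simplest possible distance measure, the uncorrected proportion of differing residues (often called the p-distance), computed over ungapped columns only. The toy alignment, the taxon names and the function names are hypothetical and serve only as an example. Real programs usually apply an evolutionary model to correct these raw values.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Build a symmetric matrix (dict of dicts) of pairwise p-distances."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return names, matrix


# Hypothetical toy alignment (made-up sequences, for illustration only).
toy_alignment = {
    "taxon_A": "MKVLAAGIVG",
    "taxon_B": "MKVLASGIVG",
    "taxon_C": "MRVL-SGLVG",
}

names, matrix = distance_matrix(toy_alignment)
for row in names:
    print(row, " ".join(f"{matrix[row][col]:.3f}" for col in names))
```

Note that the matrix has one value per pair of sequences, so the individual alignment columns are no longer visible to the tree-building step. This is the main reason why distance methods are fast.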



Methods available for tree construction




Distance Matrix Methods

NB: These methods, all of which assume an evolutionary model, yield only a single best tree.


Character-based methods






How to root an unrooted tree?

Some good advice on rooting a tree



NB: Tree topologies may strongly depend on the following:

NB: None of the methods can guarantee that the tree it produces has the correct topology.



How to use the tree-construction programs?


To get an idea of the reliability of the topology of the resulting tree, one should do one or all of the following:

NB: Only when widely different methods yield similar or identical tree topologies, and those topologies are supported by good bootstrap values (> 95%), can the trees be considered reliable.
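The bootstrap values mentioned above are obtained by resampling the columns of the alignment with replacement, building a tree from each pseudo-replicate, and counting how often each grouping recurs. The Python sketch below shows only the resampling step. The toy alignment and the function names are hypothetical, and the tree-building step is left to whichever program is used.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns drawn with replacement from the alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must have equal length")
    # Draw as many column indices as the alignment has columns, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate a list of pseudo-replicate alignments (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment (made-up sequences, for illustration only).
example_alignment = {
    "taxon_A": "MKVLAAGIVG",
    "taxon_B": "MKVLASGIVG",
    "taxon_C": "MRVLTSGLVG",
}

replicates = bootstrap_replicates(example_alignment, n_replicates=100, seed=1)
print(len(replicates), "pseudo-replicates generated")
```

Each replicate would then be analysed with the same tree-construction method as the original alignment. The bootstrap value of a branch is the percentage of replicate trees in which that branch appears.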



Limitations of the various methods



Complication of paralogous genes

The presence of more than one homologue of a certain gene, or of different members of a gene family, in one and the same organism may considerably complicate phylogenetic analyses. In such a situation one speaks of paralogous genes.

Two genes are said to be orthologous if they diverged after a speciation event, whereas they are said to be paralogous if they diverged after a duplication event.

Let's take the example of the mammalian lactate dehydrogenase isoenzymes M and L. In the case of mouse and rat the isoenzymes are the result of a gene duplication that took place well before the separation of these two species.

[Figure: paralogous genes, where () indicates speciation and X indicates gene duplication]

Here one says that the LDH_M gene family is paralogous to the LDH_L gene family. If one is not aware of the presence of paralogous genes and isoenzymes in the organisms, for instance because only one sequence per organism is available (e.g. LDH_L for mouse and LDH_M for rat) and isoenzyme data are missing, the resulting phylogeny would suggest a much earlier separation of mouse and rat than actually occurred. This inevitably leads to erroneous phylogenetic trees.
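The distinction can be made concrete with a small Python sketch. It encodes the LDH example as a gene tree whose internal nodes are labelled with the event they represent, and classifies a pair of genes by the event at their most recent common ancestor. The tree encoding, the leaf names and the function names are assumptions made for this illustration only.

```python
# Gene tree as nested tuples: (event, left_subtree, right_subtree); leaves are strings.
# The duplication precedes the mouse/rat speciation, as in the LDH example.
gene_tree = (
    "duplication",
    ("speciation", "mouse_LDH_M", "rat_LDH_M"),
    ("speciation", "mouse_LDH_L", "rat_LDH_L"),
)


def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def mrca_event(node, gene_a, gene_b):
    """Return the event label at the most recent common ancestor of two distinct genes."""
    if gene_a == gene_b:
        raise ValueError("the two genes must be different")
    if isinstance(node, str) or not {gene_a, gene_b} <= leaves(node):
        raise ValueError("both genes must be leaves of the tree")
    event, left, right = node
    for child in (left, right):
        # Descend while a single child subtree still contains both genes.
        if not isinstance(child, str) and {gene_a, gene_b} <= leaves(child):
            return mrca_event(child, gene_a, gene_b)
    return event


def relationship(tree, gene_a, gene_b):
    """Classify a gene pair as orthologous or paralogous."""
    if mrca_event(tree, gene_a, gene_b) == "speciation":
        return "orthologous"
    return "paralogous"


print(relationship(gene_tree, "mouse_LDH_M", "rat_LDH_M"))
print(relationship(gene_tree, "mouse_LDH_L", "rat_LDH_M"))
```

Comparing mouse LDH_L with rat LDH_M traces back to the duplication node rather than to the speciation node, which is why such a pair suggests a separation of the two species that is too early.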


Definition: Paralogy is the relationship between genes that were generated by duplication. A phylogenetic tree constructed from a mixture of such genes does not reflect the phylogeny of the species.





Still to add:

some literature examples

Origin of chloroplasts in J. Molec. Evolution, 1995




Last updated: 25 September 1997.
Created by: Fred Opperdoes