Normalized Frequencies of Amino Acids

| Amino acid | Normalized frequency |
|---|---|
| Ala | 0.096 |
| Gly | 0.090 |
| Lys | 0.085 |
| Leu | 0.085 |
| Val | 0.078 |
| Thr | 0.062 |
| Ser | 0.057 |
| Asp | 0.053 |
| Glu | 0.053 |
| Phe | 0.045 |
| Asn | 0.042 |
| Pro | 0.041 |
| Ile | 0.035 |
| His | 0.034 |
| Arg | 0.034 |
| Gln | 0.032 |
| Tyr | 0.030 |
| Cys | 0.025 |
| Met | 0.012 |
| Trp | 0.012 |
Dividing each element of the mutation data matrix, [Mij], by the normalized frequency [fi] of amino acid i yields a relatedness odds matrix, [Rij] = [Mij]/[fi]. Taking the base-10 logarithm of each element of [Rij] then gives a log odds matrix of the form [Sij] described at the outset.
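The two steps can be sketched as follows. This is a minimal illustration that assumes the mutation matrix is stored as nested lists `M`, with `M[i][j]` the element in row i and column j, and that `f[i]` is the normalized frequency of amino acid i. The base-10 logarithm and the factor of 10 follow the scaling described in the caption of the 250 PAM matrix. `relatedness_odds` and `log_odds` are hypothetical helper names, and the two-residue numbers are invented purely to exercise the code. They are not real PAM values.

```python
import math


def relatedness_odds(M, f):
    """R[i][j] = M[i][j] / f[i]: divide every element of row i by f[i]."""
    return [[m_ij / f[i] for m_ij in row] for i, row in enumerate(M)]


def log_odds(R, scale=10):
    """S[i][j] = scale * log10(R[i][j]), rounded to an integer as in the published table."""
    return [[round(scale * math.log10(r_ij)) for r_ij in row] for row in R]


# Hypothetical two-residue example; these numbers are NOT real PAM values.
f = [0.5, 0.5]
M = [[0.8, 0.2],
     [0.2, 0.8]]

R = relatedness_odds(M, f)   # approximately [[1.6, 0.4], [0.4, 1.6]]
S = log_odds(R)
print(S)                     # [[2, -4], [-4, 2]]
```

The ratio 0.8/0.5 = 1.6 maps to a score of 2, which matches the caption's reading of that score.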
In practice, much sequence comparison over the last 15 years has used a log odds matrix derived from the mutation probability matrix for an evolutionary distance of 250 PAM (see the section "Properties of Mutation Probability Matrix" for methods of computing matrices for arbitrary evolutionary distances). Recent applications of information theory to these matrices have led to a preference for PAM matrices optimized for shorter evolutionary distances, for example 120 PAM.
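The referenced section is not reproduced here. In Dayhoff's model, the standard way to reach a longer distance is to multiply the 1-PAM mutation probability matrix by itself n times. The hedged sketch below shows that idea. It assumes a column-stochastic 1-PAM matrix, meaning each column sums to 1, and it computes the nth power by repeated squaring. The 2×2 matrix, with a 1% chance of change per PAM, is hypothetical, and `mat_mult` and `pam_power` are illustrative names.

```python
def mat_mult(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_power(M1, n):
    """Return M1 raised to the nth power (n >= 0) by repeated squaring."""
    size = len(M1)
    # Start from the identity matrix, i.e. 0 PAM.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = M1
    while n > 0:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result


# Hypothetical 1-PAM matrix for two residues: 1% chance of change per PAM.
M1 = [[0.99, 0.01],
      [0.01, 0.99]]

M250 = pam_power(M1, 250)
print([round(x, 4) for x in M250[0]])   # [0.5032, 0.4968]
```

At 250 PAM, every cell of this toy matrix is already close to the equilibrium value of 0.5. This illustrates how long-distance matrices drift toward the background frequencies.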
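
|  | C | S | T | P | A | G | N | D | E | Q | H | R | K | M | I | L | V | F | Y | W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cys (C) | 12 |
| Ser (S) | 0 | 2 |
| Thr (T) | -2 | 1 | 3 |
| Pro (P) | -3 | 1 | 0 | 6 |
| Ala (A) | -2 | 1 | 1 | 1 | 2 |
| Gly (G) | -3 | 1 | 0 | -1 | 1 | 5 |
| Asn (N) | -4 | 1 | 0 | -1 | 0 | 0 | 2 |
| Asp (D) | -5 | 0 | 0 | -1 | 0 | 1 | 2 | 4 |
| Glu (E) | -5 | 0 | 0 | -1 | 0 | 0 | 1 | 3 | 4 |
| Gln (Q) | -5 | -1 | -1 | 0 | 0 | -1 | 1 | 2 | 2 | 4 |
| His (H) | -3 | -1 | -1 | 0 | -1 | -2 | 2 | 1 | 1 | 3 | 6 |
| Arg (R) | -4 | 0 | -1 | 0 | -2 | -3 | 0 | -1 | -1 | 1 | 2 | 6 |
| Lys (K) | -5 | 0 | 0 | -1 | -1 | -2 | 1 | 0 | 0 | 1 | 0 | 3 | 5 |
| Met (M) | -5 | -2 | -1 | -2 | -1 | -3 | -2 | -3 | -2 | -1 | -2 | 0 | 0 | 6 |
| Ile (I) | -2 | -1 | 0 | -2 | -1 | -3 | -2 | -2 | -2 | -2 | -2 | -2 | -2 | 2 | 5 |
| Leu (L) | -6 | -3 | -2 | -3 | -2 | -4 | -3 | -4 | -3 | -2 | -2 | -3 | -3 | 4 | 2 | 6 |
| Val (V) | -2 | -1 | 0 | -1 | 0 | -1 | -2 | -2 | -2 | -2 | -2 | -2 | -2 | 2 | 4 | 2 | 4 |
| Phe (F) | -4 | -3 | -3 | -5 | -4 | -5 | -4 | -6 | -5 | -5 | -2 | -4 | -5 | 0 | 1 | 2 | -1 | 9 |
| Tyr (Y) | 0 | -3 | -3 | -5 | -3 | -5 | -2 | -4 | -4 | -4 | 0 | -4 | -4 | -2 | -1 | -1 | -2 | 7 | 10 |
| Trp (W) | -8 | -2 | -5 | -6 | -6 | -7 | -4 | -7 | -7 | -5 | -3 | 2 | -3 | -4 | -5 | -2 | -6 | 0 | 0 | 17 |
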
Log odds matrix for a 250 PAM evolutionary distance. The matrix was obtained by taking the base-10 logarithm of each element of the relatedness odds matrix ([Rij] = [Mij]/[fi]) for 250 PAM. The elements in this matrix are multiplied by 10 for readability. A score of -10 therefore means that a given pair would be expected to be aligned only one tenth as frequently in related sequences as random chance would predict. A score of 2 means that the pair would be expected to align about 1.6 times as frequently, since 10^0.2 ≈ 1.6. The amino acids were arranged by assuming that positive values represent evolutionarily conservative replacements, and the resulting clusters correspond to groupings based on the physicochemical properties of the amino acids. [From D. G. George, L. T. Hunt, and W. C. Barker, in "Macromolecular Sequencing and Synthesis" (D. H. Schlesinger, ed.), p. 127. Alan R. Liss, New York, 1988.]
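The caption's interpretation can be checked with a short sketch, which also shows one way to use the table. The sketch assumes an ungapped alignment of two equal-length sequences written in one-letter codes. To keep it short, it copies only the first six rows (Cys through Gly) of the lower triangle; further rows can be appended in the same order. Because the matrix is symmetric, each value is stored under both (a, b) and (b, a). `ORDER`, `LOWER_TRIANGLE`, `build_scores`, `score_ungapped`, and `odds_from_score` are hypothetical names chosen for illustration.

```python
# Row/column order of the 250 PAM log odds matrix, in one-letter codes.
ORDER = "CSTPAGNDEQHRKMILVFYW"

# Excerpt: the first six rows (Cys through Gly) of the lower triangle.
LOWER_TRIANGLE = """
12
0 2
-2 1 3
-3 1 0 6
-2 1 1 1 2
-3 1 0 -1 1 5
"""


def build_scores(text, order=ORDER):
    """Expand a lower-triangular score table into a symmetric lookup dict."""
    scores = {}
    for i, line in enumerate(text.strip().splitlines()):
        values = [int(v) for v in line.split()]
        if len(values) != i + 1:
            raise ValueError(f"row {order[i]} should have {i + 1} values")
        for j, value in enumerate(values):
            scores[(order[i], order[j])] = value
            scores[(order[j], order[i])] = value
    return scores


SCORES = build_scores(LOWER_TRIANGLE)


def score_ungapped(seq1, seq2, scores=SCORES):
    """Sum pair scores for an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(scores[(a, b)] for a, b in zip(seq1, seq2))


def odds_from_score(score):
    """Invert the caption's scaling, score = 10 * log10(odds)."""
    return 10 ** (score / 10)


print(score_ungapped("CSAG", "CTPG"))   # 12 + 1 + 1 + 5 = 19
print(round(odds_from_score(-10), 3))   # 0.1
print(round(odds_from_score(2), 1))     # 1.6
```

A residue outside the copied rows raises a `KeyError`, which makes it obvious when the excerpt needs extending.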