Log odds matrix


Normalized Frequencies of Amino Acids

 Ala 0.096
 Gly 0.090
 Lys 0.085
 Leu 0.085
 Val 0.078
 Thr 0.062
 Ser 0.057
 Asp 0.053
 Glu 0.053
 Phe 0.045
 Asn 0.042
 Pro 0.041
 Ile 0.035
 His 0.034
 Arg 0.034
 Gln 0.032
 Tyr 0.030
 Cys 0.025
 Met 0.012
 Trp 0.012


Normalized frequencies, [fi], of the amino acids in the data used to derive the PAM matrix. These frequencies represent the relative exposure to mutation for each amino acid. The sum of these frequencies is 1. (Adapted from Table 22. Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1979.)

Dividing each element of the mutation data matrix, [Mij], by the normalized frequency, [fi], of the corresponding amino acid yields a relatedness odds matrix, [Rij] = [Mij]/[fi]. Taking the logarithm of each element of [Rij] then gives a log odds matrix of the form [Sij] described at the outset.
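The two steps above can be sketched in a few lines of Python. This is only an illustration: the two-letter alphabet, the values in M and f, and the function name log_odds are made up for the example and are not Dayhoff's data. The base-10 logarithm is also an assumption, chosen to be consistent with the caption of the log odds table.

```python
import math

def log_odds(M, f):
    """Return (R, S) with R[i][j] = M[i][j] / f[i] and S[i][j] = log10(R[i][j])."""
    n = len(f)
    # Relatedness odds: each element divided by the frequency of amino acid i.
    R = [[M[i][j] / f[i] for j in range(n)] for i in range(n)]
    # Log odds: element-wise base-10 logarithm (base is an assumption here).
    S = [[math.log10(R[i][j]) for j in range(n)] for i in range(n)]
    return R, S

# Hypothetical two-letter alphabet; these numbers are NOT taken from the Atlas.
f = [0.6, 0.4]            # normalized frequencies, sum to 1
M = [[0.70, 0.45],        # toy mutation matrix, each column sums to 1
     [0.30, 0.55]]

R, S = log_odds(M, f)
for row in S:
    print(["%+.3f" % s for s in row])
```

In this toy case the odds matrix comes out symmetric because the made-up M was chosen so that M[i][j] * f[j] equals M[j][i] * f[i]. Diagonal entries are positive (identities are favored over chance) and the off-diagonal entry is negative.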

In practice, much sequence comparison over the last 15 years has employed a log odds matrix based on a mutation probability matrix corresponding to 250 PAM of evolutionary distance (see section on "Properties of Mutation Probability Matrix" for methods of computing matrices to arbitrary evolutionary distances). Recent applications of information theory to these matrices have led to the preferred use of PAM matrices optimized for shorter evolutionary distances, for example, 120 PAM.
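As a rough illustration of how a matrix for a longer evolutionary distance can be derived, the sketch below assumes the usual Dayhoff-model approach in which the n PAM mutation probability matrix is the 1 PAM matrix multiplied by itself n times. The referenced section remains the authority on the method; the 2 x 2 matrix M1 and the helper names mat_mul and mat_power are hypothetical and exist only for this example.

```python
def mat_mul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(M, n):
    """Return M raised to the n-th power (n >= 1) by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = mat_mul(result, M)
    return result

# Hypothetical 1 PAM-like matrix for a two-letter alphabet (columns sum to 1).
# These values are invented for illustration, not taken from the Atlas.
M1 = [[0.990, 0.015],
      [0.010, 0.985]]

M250 = mat_power(M1, 250)
for row in M250:
    print(["%.4f" % x for x in row])
```

With this toy matrix, every column of M250 ends up close to the equilibrium frequencies (0.6, 0.4). This shows why matrices for long distances carry less information about the original residue than matrices for short distances.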

 

Log Odds Matrix

Cys C   12
Ser S    0    2
Thr T   -2    1    3
Pro P   -3    1    0    6
Ala A   -2    1    1    1    2
Gly G   -3    1    0   -1    1    5
Asn N   -4    1    0   -1    0    0    2
Asp D   -5    0    0   -1    0    1    2    4
Glu E   -5    0    0   -1    0    0    1    3    4
Gln Q   -5   -1   -1    0    0   -1    1    2    2    4
His H   -3   -1   -1    0   -1   -2    2    1    1    3    6
Arg R   -4    0   -1    0   -2   -3    0   -1   -1    1    2    8
Lys K   -5    0    0   -1   -1   -2    1    0    0    1    0    3    5
Met M   -5   -2   -1   -2   -1   -3   -2   -3   -2   -1   -2    0    0    6
Ile I   -2   -1    0   -2   -1   -3   -2   -2   -2   -2   -2   -2   -2    2    5
Leu L   -8   -3   -2   -3   -2   -4   -3   -4   -3   -2   -2   -3   -3    4    2    8
Val V   -2   -1    0   -1    0   -1   -2   -2   -2   -2   -2   -2   -2    2    4    2    4
Phe F   -4   -3   -3   -5   -4   -5   -4   -6   -5   -5   -2   -4   -5    0    1    2   -1    9
Tyr Y    0   -3   -3   -5   -3   -5   -2   -4   -4   -4    0   -4   -4   -2   -1   -1   -2    7    10
Trp W   -8   -2   -5   -6   -6   -7   -4   -7   -7   -5   -3    2   -3   -4   -5   -2   -6    0    0    17
         C    S    T    P    A    G    N    D    E    Q    H    R    K    M    I    L    V    F    Y    W
         Cys  Ser  Thr  Pro  Ala  Gly  Asn  Asp  Glu  Gln  His  Arg  Lys  Met  Ile  Leu  Val  Phe  Tyr  Trp


Log odds matrix for a 250 PAM evolutionary distance. The matrix was obtained by taking the base-10 logarithm of each element in the relatedness odds matrix ([Rij] = [Mij]/[fi]) for 250 PAM; the resulting values are multiplied by 10 for readability. A score of -10 means that a given pair would be expected to be aligned only one tenth as frequently in related sequences as random chance would predict; a score of 2 means that the pair would be expected to align 1.6 times as frequently. The amino acids were arranged by assuming that positive values represent evolutionarily conservative replacements; the clusters correspond to groupings based on the physicochemical properties of the amino acids [from D. G. George, L. T. Hunt, and W. C. Barker, in "Macromolecular Sequencing and Synthesis" (D. H. Schlesinger, ed.), p. 127. Alan R. Liss, New York, 1988].
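The relation between a tabulated score and the underlying odds ratio can be checked with a short sketch. It assumes the scores are 10 times a base-10 logarithm, which is what the two worked examples in the caption imply; the function name score_to_odds is hypothetical.

```python
def score_to_odds(score, scale=10.0):
    """Convert a tabulated log odds score back to an odds ratio.

    Assumes score = scale * log10(odds), so odds = 10 ** (score / scale).
    """
    return 10.0 ** (score / scale)

# The first two scores are the caption's examples; the last two are the
# Cys/Cys and Trp/Trp diagonal entries of the table.
for label, score in [("caption example", -10),
                     ("caption example", 2),
                     ("Cys/Cys", 12),
                     ("Trp/Trp", 17)]:
    print("%-16s score %4d -> odds %.2f" % (label, score, score_to_odds(score)))
```

A score of 0 corresponds to an odds ratio of exactly 1, meaning the pair is aligned as often as chance alone would predict.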
