Last version 1:30 Nov. 21, 1996
David R. Nelson
The earliest eukaryotes had no mitochondria or chloroplasts. We
can tell this because the
most ancient branches on the eukaryotic tree all represent groups
without mitochondria.
The earliest eukaryotes are the diplomonads (includes Giardia),
followed by the
microsporidians and then the trichomonads. The evidence is very
strong that mitochondria
were taken in as endosymbionts derived from the alpha proteobacteria.
Phylogenetic trees of the
rRNAs from bacteria and mitochondria show this plainly. A similar
case is made for the
origin of chloroplasts among cyanobacteria. Hydrogenosomes and
peroxisomes may also be
endosymbionts, but they do not contain a genome of their own
today.
Hydrogenosomes are organelles found only in eukaryotes that do not
have mitochondria.
These organelles are surrounded by a double membrane. They are the
site of pyruvate
fermentation to produce acetate, CO2 and H2. ATP is formed by
substrate level
phosphorylation, so these organelles resemble mitochondria in that
they produce ATP.
They do not have pyruvate dehydrogenase. Instead they use an enzyme
called pyruvate
ferredoxin oxidoreductase, and they have hydrogenase. These enzymes
are not found in
mitochondria, but they are seen in anaerobic bacteria. Hydrogenosomes
do not have an
electron transport chain or F1F0 ATPase. They do not perform
oxidative
phosphorylation.
One set of proteins found in hydrogenosomes are the heat shock
proteins Hsp70, Hsp60
and Hsp10. These are ideal sequences to use for phylogenetic
analysis. As we saw above,
Gupta and Golding used them to argue for a hybrid eukaryote genome. A
sequence
analysis of the hydrogenosome Hsp70 and Hsp60 proteins showed a
signature sequence
characteristic of these proteins in mitochondria and gram negative
purple bacteria. Trees
made with other available sequences of all three Hsp proteins placed
each of them in the
mitochondrial Hsp group. Since this happened in all three cases, the
hydrogenosome Hsps
and the mitochondrial Hsps appear to have a common bacterial
ancestor. In anaerobic
environments where these organisms live, it was not useful to retain
the OXPHOS genes
encoded in the mitochondrial genome, which may explain why
hydrogenosomes have no
genome of their own. These organelles are apparently a degenerate
version of the
mitochondrial ancestor.
Before the eukaryotic ancestor could engulf these bacteria, it
would have to evolve into a
phagocytic cell, with the ability to take a whole living bacterium
inside itself. This is not
possible with a rigid cell wall. Therefore, the precursor of
eukaryotes had to lose the
ancestral cell wall.
(see PNAS 92, 8507-8511 1995)
Introns are found in eukaryotic genes (Nature 271, 501 1978).
There are two opposing
views on what the origin of introns might be. In one view, introns
are ancient and were
present in all genes. The amino acid coding regions of genes were
made of small pieces of
15-20 codons each and these were all spliced together by removal of
the introns. In this
model, genetic diversity was assured by exon shuffling, which combined
different exons to
make a very large collection of proteins. This is the exon theory of
genes. Since these
types of introns are not in bacteria, the theory goes that bacteria
are streamlined and have
lost all their introns. Furthermore, the exons in this theory are
supposed to code for units
of protein structure, such as helices and beta strands.
The "introns late" theory says that the original common ancestor
did not have introns and
they only evolved in the eukaryote branch of the tree of life. This
theory suggests that
introns are placed pretty much at random into genes, and they do not
necessarily
correspond to protein structural elements like helices and
strands.
Triose phosphate isomerase is an enzyme that is used to support
the introns early
hypothesis, because the chicken TPI gene has six introns and they all
occur between
structural elements in the protein. When additional sequences from a
plant and a fungus
were determined, additional introns were found. Five were in the same
place in plants and
animals suggesting they existed in the common ancestor to plants and
animals. The total
number of presumed introns was now 11, with different lineages losing
different introns
over time. One of the new exons was big and did not fit well with the
theory that compact
modules of protein were encoded in the exons. Walter Gilbert
predicted that another
sequence would have an intron that would break this exon in two. This
was found in a
mosquito.
The issue was still not settled, because more sequences were then determined
from another insect,
another fungus and C. elegans (nematode). These identified seven new
intron positions, for
a total of 21 introns and an average exon size of 11.2 codons. 12
introns only occur in one
sequence, suggesting that all other lineages lost that intron. At this
point the exon theory becomes
very cumbersome. Additional sequences from more insects showed
that a close
relative of the Culex mosquito (Aedes mosquitoes) had the intron, but
more distant relatives
(Anopheles mosquitoes, flies and moths) did not (PNAS 92, 8503-8506
1995). This is
consistent with late insertion of the intron in an ancestor of the
Culex and Aedes
mosquitoes. Since 19 species are missing this intron, at least 10
independent losses of this
intron would be required to fit the introns early model. Therefore,
the introns early model
seems to be wrong.
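The exon-size figure above follows from simple counting: a gene interrupted at n positions is split into n + 1 exons. The short Python sketch below uses only the two numbers quoted in the text (21 intron positions, 11.2 codons per exon). The coding length it prints is derived here as a consistency check and is not a figure taken from the cited papers.

```python
def exon_count(intron_positions: int) -> int:
    """A gene interrupted at n positions is split into n + 1 exons."""
    return intron_positions + 1


introns = 21             # total intron positions quoted in the text
mean_exon_codons = 11.2  # average exon size quoted in the text

exons = exon_count(introns)
# Coding length implied by the two quoted numbers (derived, not quoted).
implied_coding_length = exons * mean_exon_codons

print(exons)                            # 22
print(round(implied_coding_length, 1))  # 246.4
```

Every newly reported intron position raises the exon count by one and lowers the average exon size, so the putative modules keep shrinking as more species are sequenced.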
Archaea and eukarya share some distinctive features, including
N-linked glycoproteins,
the absence of formylmethionine, and introns in their tRNAs (see PNAS 92,
5761-5764 1995).
N-linked glycoproteins are made in eukaryotes in the endoplasmic
reticulum and the golgi.
They are initially formed using a dolichol carrier lipid
intermediate. Their presence in
archaea suggests that dolichol is there and that some membrane-bound
system similar to the N-linked
biosynthetic machinery of the ER is also present. Bacteria use
N-formylmethionine
at the beginning of all their proteins. There is a special tRNA for
this amino acid. Eukarya
and archaea do not have formylmethionine. Some eukaryotic tRNA genes
have introns. In
yeast, there are 262 tRNA genes, and about one third contain introns.
Archaea also have
introns in some of their tRNAs. Methanococcus jannaschii has 37 tRNA
genes, with
introns in a met and trp tRNA. The transcriptional apparatus of
archaea is much more
eukaryote-like than bacteria-like. This may reflect similarities in
how DNA is packaged in
the two domains. Methanococcus jannaschii has five histone genes, so
the DNA may be
packaged in nucleosomes as in eukaryotes. This may require a more
complex transcription
machinery to get at the DNA. Bacteria don't have histones and they
have a much simpler
transcriptional apparatus.
Archaea and bacteria both have polycistronic operons, and some of
these have the genes
arranged in a similar order in both lineages. This implies that the
operons existed in the
common ancestor. Recently, there has been some evidence that C.
elegans, a model
organism for genome sequencing also has operons. This was unheard of
in animals before
these C. elegans operons were described.
(see Molecular Biology of the Cell, chapter 21)
Humans, worms and flies don't look very similar and they do not go
through the same
developmental stages. Yet the genes that control their body shape and
organization are
related in sequence. They all share a common sequence called the
homeobox. This 180
nucleotide sequence codes for 60 amino acids found in these proteins.
The rest of the
proteins may be very different, but this 60 amino acid piece is
crucial for their function.
The homeodomain is a helix turn helix DNA binding domain that
recognizes a specific
DNA sequence. The homeodomain targets the remainder of the protein to
regulate the gene
expression of any genes with the appropriate recognition sequence in
their control regions.
There are at least 50 homeobox genes in Drosophila. They fall into
two main divisions, the
complex superclass and the dispersed superclass. Those in the complex
group are found in
clusters, the dispersed group are solo genes.
One subset of these genes are called homeotic selector genes. In
Drosophila, there are 8
genes arranged in a series along 650,000 base pairs of DNA. This
whole region is called
the HOM complex. There are two smaller subsets of these genes in the
HOM complex,
the antennapedia complex (5 genes) and the bithorax complex (3 genes).
Other insects
have these genes all in one complex, so it looks as though the HOM
complex became split
in Drosophila.
Mutations in the 8 genes of the HOM complex cause large-scale
changes in the body plan of flies. A
mutation in bithorax causes a fly to have an extra set of wings.
Mutation in antennapedia
causes a leg to grow where an antenna should be. These genes are not
master switches for
making wings or legs, but they specify position in the fly's body.
The order of the genes
on the chromosome is the same as the order of segments in the fly's
body where they are
expressed. The left most gene is expressed in the head, the right
most gene is expressed in
the abdomen. When a gene is deleted or mutated, the segment where it
is normally
expressed cannot tell where it is because its position clue is gone,
so it behaves like the
closest segment to it. That is why a bithorax mutation causes an
extra set of wings. The
segments adjacent to the bithorax segment dictated what should be
made.
An amazing fact is that these HOM genes have clear homologs in
vertebrates. These are
called hox gene clusters. Mice have four hox gene clusters on four
different chromosomes.
These are called HoxA, B, C and D. HoxB has all the same genes as HOM
plus one more.
They are in exactly the same order. The other three clusters are
missing some of the
HOM genes, but they have some extra homeobox genes not in the HOM
cluster.
The HOM cluster seems to have arisen by gene duplication of a
single homeobox gene long
ago. The whole cluster was then duplicated to give four copies in the
lineage of vertebrates.
Some additional gene duplication and deletion resulted in the present
day set of Hox genes
in mammals. These genes specify position in mouse embryos, just like
they did in flies.
They seem to have a similar function to the HOM cluster in flies,
except it is more
complicated in mammals because there are four clusters. Two sets of
hox gene clusters are
expressed in limb buds in perpendicular directions. The gene products
from one cluster are
expressed along a left to right axis in the limb bud (HoxD) and the
other gene cluster is
expressed top to bottom in the same bud (HoxA). This creates a
checkerboard pattern that
makes each position in the limb bud unique, like the elements of a
mathematical array that
are described by x and y coordinates. If a single gradient in the fly
can specify the
development of different symmetrical segments, like head, thorax and
abdomen, then a
dual gradient in the limb bud can specify the development of
asymmetry in the limbs,
things like the bones and muscles of the hand, the layout of nerves
and blood vessels, what
is to be skin and fingernails.
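The array analogy for the limb bud can be made concrete with a toy model. The Python sketch below is only an illustration: the number of expression zones along each axis (4 and 5) is hypothetical and the function name is invented here. It shows that two perpendicular gradients give every grid position its own (HoxA zone, HoxD zone) pair, while a single gradient can only distinguish its own zones.

```python
from itertools import product


def positional_codes(levels_a: int, levels_d: int) -> list[tuple[int, int]]:
    """All (HoxA zone, HoxD zone) pairs for a bud divided into zones on two axes."""
    return list(product(range(levels_a), range(levels_d)))


# Hypothetical zone counts, chosen only for illustration.
codes = positional_codes(levels_a=4, levels_d=5)

one_axis_labels = 5                # a single gradient labels only its own zones
two_axis_labels = len(set(codes))  # two gradients label every cell of the grid

print(one_axis_labels, two_axis_labels)  # 5 20
```

The number of distinct addresses is the product of the two zone counts, which is why a second axis adds so much positional information.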
The HoxA cluster in mouse has 11 genes; Drosophila has eight genes
in the HOM cluster.
HoxA has added three extra genes. Probably, if one looks back at
simpler organisms there
will be some that have fewer homeotic genes in these clusters, or
fewer clusters. The
Annual Review of Biochemistry 1994 has an article on homeodomain
proteins (Vol. 63,
487-526). There, evidence is cited for one hox cluster in acorn worms
(a hemichordate),
two hox clusters in amphioxus (a cephalochordate) and three (or 4) in
lamprey (a primitive
vertebrate). It is tempting to extrapolate that gain of hox genes in
a cluster increases the
complexity of an organism by allowing additional segments to be
specified. Initially these
would be just like adjacent segments, but there would be opportunity
to evolve into more
specialized functions. For example, if there are three sets of legs
in insects, could another
set of legs be added just by duplicating a hox gene that specified a
leg segment of the body?
What do the hox gene clusters of spiders, centipedes and millipedes
look like? Are there
dozens of duplicated hox genes that specify many identical segments?
This provides the
possibility of macroevolution. Duplication of hox genes, or whole hox
gene clusters,
followed by deletion and mutation might alter a species very
dramatically in a short time
period.
Another homeobox gene in Drosophila is eyeless. This gene appears
to be a master switch
gene that turns on eye formation (see Science 267, 1766-1767 and
1788-1792 1995). If
eyeless is expressed in tissues where it normally would not be
active, whole functional
eyes form. These may be on the ends of the antennae, on the wings or on the
legs. This gene
has homologs in mouse (small eye, Pax-6) and man (aniridia) that also
affect the formation
of eyes. In fact, the mouse gene can substitute in Drosophila for the
eyeless gene. This
means that eye formation is controlled by a gene that evolved before
eyes evolved
(invertebrates and vertebrates diverged 600 million years ago). The
common ancestor
apparently had a light sensitive tissue that later evolved into
different types of eyes in
insects and vertebrates. Eyeless controlled the development of that
light sensitive tissue
and it has continued in that role for at least 600 million years.
A recently discovered gene called Manx is not a homeobox gene, but
it is a zinc-finger
transcription factor and it is another candidate for a master switch
gene. (see Science Nov.
15, 1996, p. 1205 and news section) The Manx gene is found in
tunicates, a type of
primitive chordate. These organisms start life as a tadpole-like
creature with a tail and
notochord. During maturation, they lose their tail and become sessile
on the sea bottom.
Manx is the gene that controls the tail formation. If it is mutated,
the tail never forms. This
was demonstrated by William Jeffery and Billie J. Swalla, who found
two closely related
tunicates, one with a tail and one without. They bred the two and the
hybrid had a small
tail, suggesting that a single gene was responsible and one
functional copy could turn on
the pathway. The gene was identified and its expression was blocked
by antisense RNA in
the hybrid embryo. When this was done, the tail did not form. They
are now looking for
homologs in vertebrates. Manx is similar to eyeless, in
that it turns on a
whole developmental program to form a tail. Of course the hunt is on
to see what gene
might control Manx and which genes lie downstream of Manx to effect
tail development.
A second way to bring about macroevolution is polyploidy. Xenopus
laevis has twice the
DNA of Xenopus tropicalis, the result of a genome duplication. This gives
Xenopus laevis a lot of
DNA to experiment with and try out new functions for old duplicated
genes. One
consequence of doubling the number of genes is an increase in size.
Xenopus laevis is
much larger than Xenopus tropicalis, perhaps due to a gene dosage
effect. Plants are often
tetraploid or hexaploid, again giving evolution a lot of material to
work with.
Mitochondrial DNA mutates at a rate that is about 17 times greater
than nuclear DNA. This
is probably due to lack of effective repair mechanisms. Because this
DNA changes so
rapidly, it can be used as a monitor of evolution on a time scale of
a few hundred thousand
years rather than millions or billions of years. It is a fast
molecular clock. In addition,
mitochondria are inherited maternally, so there is no genetic
recombination to account for.
The line of descent is direct from mother to mother, because fathers
do not contribute
mitochondria to the egg on fertilization. By comparing DNA from
people from around the
world and especially in Africa, it is possible to build a tree
showing the divergence of
humans over time. This tree can be rooted by using chimpanzee
mitochondrial DNA as an
outgroup sequence, and the time of the last common ancestor can be
estimated. This was
first done in 1987 (Nature 325, p. 31), but it was criticized for
inadequate sampling of
populations and weak methods for making the tree. The process was
repeated with DNA
sequence from 189 people (121 from Africa) using a hypervariable
region of mitochondrial
DNA. (Science 253, 1503-1507 1991). The results were similar in both
cases, though the
critics could not find as much to fault in the second paper.
The result was that the 14 deepest branches on the tree were
all of African origin. This
implies that modern humans evolved in Africa. The time of the last
common ancestor of all
human mitochondrial DNA types is 166,000 to 249,000 years ago
assuming that
chimpanzees and humans diverged 4-6 million years ago.
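The date range quoted above is a direct rescaling of the assumed chimpanzee-human split, as expected for a linear molecular clock. The Python sketch below derives the scaling factor from the numbers in the text (166,000 years at a 4 million year split). That factor is computed here for illustration and is not quoted from the cited paper, and the function name is an assumption.

```python
def coalescence_age(split_years: float, fraction_of_split: float) -> float:
    """Linear molecular clock: the age scales with the assumed calibration date."""
    return split_years * fraction_of_split


# Ratio implied by the figures in the text (derived here, not quoted).
fraction = 166_000 / 4_000_000

low = coalescence_age(4_000_000, fraction)   # split assumed 4 million years ago
high = coalescence_age(6_000_000, fraction)  # split assumed 6 million years ago

print(round(low), round(high))  # 166000 249000
```

Both ends of the range correspond to the same fraction of the chimpanzee-human divergence, so the uncertainty in the date comes from the calibration and not from the human sequence data.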
One inference from this conclusion is that there was one woman
whose mitochondria gave
rise to all present human mitochondrial genomes. This is the concept
of a mitochondrial
Eve. This idea was immediately misunderstood to mean that there was
only one woman
alive at this time. The result does not suggest that. Such a finding
would create a
tremendous genetic bottleneck in human history. What the evidence
does say is that the
present population of human mitochondrial DNA did have one founder
mother. She was
the lucky one whose mitochondria have survived 200,000 years. All her
contemporaries
have had their lineages fizzle, by not having children, or not having
female children to be
more specific. There could have been a large population of humans
200,000 years ago; in
fact, the next thing we will discuss is exactly how big this
population was.
Polymorphisms are sequence variants that are each carried by more
than 1%
of the population. The HLA locus in humans corresponds to the MHC
locus in mice and
other vertebrates. This is a highly polymorphic region with about 100
genes. One of these
genes is the DRB1 gene. 59 alleles exist in humans and 60 non-human
primate sequences
have been determined. A tree of these sequences gives an estimate of
the time for a last
common ancestor of about 60 million years. (see Science 270,
1930-1936 1995) It is
important to point out here, that these 59 alleles are different
sequence variants of the same
gene in humans. These do not represent different genes.
To carry that many polymorphisms in a population, the absolute
minimum number of
individuals would be 30, one diploid person for every two alleles,
and they would all have
to be heterozygotes, each with a different allele. This situation is
very unlikely. Six
million years ago, there were 32 lineages of the DRB1 gene, with a
minimum population to
carry this number of alleles being 16. Again the actual population
would have to be much
greater than that. There is a theory dealing with polymorphisms and
population size. This
is called coalescence theory. If the time of coalescence is known,
and the number of genes
is known, then this theory will predict what the population size must
be for this to happen.
The results of simulations show that for 60 alleles to persist for 1.7
million years (time of
humans as Homo sapiens) the population size would have to be about
100,000. If it was
less, many of the alleles would become lost over that many
generations.
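The two arguments in this paragraph can be sketched in Python. The first function reproduces the minimum-carrier arithmetic (two alleles per diploid person). The second is a toy neutral Wright-Fisher drift model showing why a small population cannot hold many alleles for long. It is not the simulation used in the cited Science paper: it ignores selection, and the population sizes, generation count and random seed are hypothetical values chosen so that it runs quickly.

```python
import math
import random


def min_carriers(alleles: int) -> int:
    """Fewest diploid individuals that can hold this many distinct alleles."""
    return math.ceil(alleles / 2)


def alleles_left(copies_per_allele: int, alleles: int, generations: int,
                 rng: random.Random) -> int:
    """Neutral drift: every generation resamples all gene copies with replacement."""
    pool = [a for a in range(alleles) for _ in range(copies_per_allele)]
    for _ in range(generations):
        pool = rng.choices(pool, k=len(pool))
    return len(set(pool))


print(min_carriers(59), min_carriers(32))  # 30 16

rng = random.Random(1)  # fixed seed so the run is repeatable
small = alleles_left(1, 60, 200, rng)    # 60 gene copies = 30 diploid people
large = alleles_left(100, 60, 200, rng)  # 6,000 gene copies = 3,000 people
print(small, large)  # distinct alleles surviving in each toy population
```

In this toy model the 30-person population loses most of its alleles within 200 generations, while the larger one keeps far more of them. The same logic, applied over the much longer times discussed above, is why a very large population is needed.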
These numbers are compatible with the mitochondrial Eve
hypothesis, because
mitochondrial Eve concerns only the inheritance of one small
piece of DNA,
equivalent to a single gene. It must be true that a single woman of
about 200,000 years ago
is the mother of all of our mitochondrial DNA, but it is not true
that she is the mother of all
our other genes.
Males do not have to be left out of this analysis. Portions of the
Y chromosome are unique
to males and are inherited paternally. As long as a sequence lies outside the
pseudoautosomal region, it
cannot recombine with other alleles and so it is very analogous to
mitochondrial DNA,
except it evolves at a much slower rate. To do these same types of
calculations with Y
chromosome DNA, a 729bp fragment of the ZFY gene has been sequenced
from 38 men.
No differences were found. The number of samples is still too
small, or perhaps a
more variable region should be used, like an intron. Even with no
differences detected, the
divergence of the ZFY gene between humans and great apes can give a
rate that is usable
with coalescence theory. In this case the number of alleles is two,
one in humans and one
in apes. The estimate for the time of the last human common ancestor
ZFY gene from these
assumptions is 270,000 years.
The sequences of several genomes are now available.
Mycoplasma genitalium has the smallest known genome of anything that is not a
virus. It codes for 468
proteins, which have been called the minimal set for life. This is not
strictly true, since there
are probably some genes in this set that are specific for M.
genitalium and won't be found
in other unrelated genomes. Mushegian and Koonin compared the M.
genitalium genome
with the H. influenzae genome (1703 protein coding genes) to identify
those genes common
to both (see PNAS 93, 10268-10273 1996). These are gram positive and
gram negative
organisms, so they diverged about 1.5 billion years ago. Any
homologous genes are
probably essential. They found 240 genes. Some essential genes were
missing, because
the same function in some pathways was performed by different
non-orthologous genes in
the two organisms, so these missing genes had to be accounted for.
That added back 22
genes. Then they looked for redundant functions and parasite specific
genes, since both
organisms are parasites, and subtracted 6 more genes to get a total
of 256 protein coding
genes as the minimal set.
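The bookkeeping behind the 256-gene figure can be checked in a few lines of Python. The function name and parameter names below are illustrative assumptions; the three counts are the ones quoted in the text.

```python
def minimal_gene_set(shared: int, added_back: int, removed: int) -> int:
    """Shared genes, plus displaced functions added back, minus redundant
    or parasite-specific genes."""
    return shared + added_back - removed


total = minimal_gene_set(shared=240, added_back=22, removed=6)
print(total)  # 256
```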
The authors point out that parasites import metabolic
intermediates, but they do not import
proteins, so the minimum number of genes illustrates what must be
done after all possible
intermediates are imported from a rich environment.