Neutral evolution occurs when genes do not experience natural selection because they have no effect on reproductive success. Neutrality arises when mutations in an organism's genotype cause no change in its phenotype, or when changes in the genotype bring about changes in the phenotype that do not affect reproductive success. Because neutral genes do not change in any particular direction over time and simply "drift," thanks in part to the randomness of meiosis, they can be used as a sort of molecular clock to determine common ancestors or places in the phylogenetic tree of life.
Stearns, Stephen C. and Rolf Hoekstra. Evolution: An Introduction, chapter 3
January 21, 2009
Professor Stephen Stearns: The lecture today is about neutral evolution. So let's get going on that. I want to remind you, when people think about evolution they often think that it's only natural selection. But it's not. It is both micro and macro. So macro gives us history and constraint, and micro consists basically of natural selection and drift; and developmental biology is involved in both.
So what we're going to talk about today is basically neutral evolution. What happens to genes or traits that are not experiencing natural selection because they're not making any difference to reproductive success? There's actually a lot of that that goes on, and it's very useful that it does. It gives us a baseline, it gives us a method of measuring things, and it gives us a lot of information about history.
So there are going to be three messages I want you to remember today. One is going to be how meiosis is like a fair coin. The probability that a gene will get into a specific gamete in meiosis is 50%. The second point is how the fixation of a neutral allele in a population is like radioactive decay; and it's like it in this sense: in neither the case of the fixation of neutral alleles, nor in the case of say looking at a gram of uranium-238, do you know which mutation will be fixed or which atom will decay. But, because there's so many of them, in both cases you know very precisely how many events will happen in a certain period of time. Okay?
This is a kind of law of large numbers for random events. If a lot of random events go on, the average is a very predictable thing. But if you just examine one nucleotide in a genome, or one atom in a gram of uranium, you can't predict when it will mutate, when it might be fixed, when it will decay.
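[Editorial illustration, not part of the lecture.] The point about large numbers of random events can be sketched with a short simulation. The decay probability and the sample sizes below are arbitrary illustrative values, not real figures for uranium, and the function name is hypothetical.

```python
import random

def fraction_decayed(n_atoms, p_decay, rng):
    """Fraction of n_atoms that decay when each one decays
    independently with probability p_decay."""
    decayed = sum(1 for _ in range(n_atoms) if rng.random() < p_decay)
    return decayed / n_atoms

rng = random.Random(0)  # fixed seed so a run can be repeated
p = 0.1                 # illustrative value only, not a measured decay rate

# No single atom is predictable, but the decayed fraction settles
# closer and closer to p as the number of atoms grows.
for n in (10, 1_000, 100_000):
    print(n, fraction_decayed(n, p, rng))
```

With only ten atoms the fraction can land well away from 0.1; with a hundred thousand it should sit very close to it, which is the regularity the lecture relies on.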
The third thing that I want you to remember is this regular fixation of neutral alleles, this steady process whereby if you look at an entire genome, over a given period of time--10,000 years, 100,000 years--a certain very predictable, average number of mutations will be fixed if they're neutral. So if you can locate the neutral ones in the genome, you can use them to estimate relationships and the times to the last common ancestors. Okay?
So there's actually some interesting, rather abstract and rather big ideas in this lecture. Randomness is not something that everyone finds intuitive. Our brains are apparently not designed by natural selection to deal extremely well with Las Vegas, or the stock market. Okay? So we need to hone your intuition a bit about how random processes work.
By the way, people who do really well in calculus and analysis often find their introduction to probability and statistics a little confusing. The thing that's going on here is that you have to learn to think about entire populations of things and about distributions and frequencies of things, rather than about billiard balls hitting each other on a table or planets being attracted to the sun, by gravity. It's a different kind of thinking. It's population thinking.
So the outline of the lecture is a bit about how neutrality arises. I want you to know mechanistically why it is that some genes are neutral; the reasons why genetic variation might not produce any variation in fitness--that's what we mean by neutral, there's variation at one level but it doesn't make any difference to reproductive success; the mechanisms that cause random change; and then the significance of neutral for molecular evolution. And now I'm briefly going to mention maladaptive evolution so that you can see how it is that an evolutionary process can actually result in a situation where organisms are not well adapted to their habitats. And with that we will have covered the major possible outcomes of evolution: adaptation, neutrality and maladaptation.
Okay, here's a nice abstract diagram to explain why neutrality arises. What I want you to imagine is a genotype space in which all possible genotypes for that organism might occur. Just think about that as being all the different ways that you might have been constructed if all of the possible recombination events in your father and mother had produced all the possible gametes and all the possible zygotes. There's a genotype space for you.
Many of those genotypes will produce the same phenotype, and that's because many of the genes and many of the nucleotides in the genome, many of the DNA sequences in the genome, are not making any difference to the proteins that are being produced. There are other things going on and we'll run through them. Many phenotypes have the same fitness.
How many of you come from one-child families? Okay, all of your parents have the same fitness. How many from two-child families? All of your parents have the same fitness. Okay? This happens a lot. Basically when we say that many phenotypes have the same fitness, we just mean that in any population there will be a lot of organisms that all have two offspring or all have three offspring or something like that. The two offspring class all have the same fitness.
Then when we look at the whole halfway [this would only make sense when looking at the figure] here, we can see that G1, G2 and G3 are neutral with respect to each other, when measured in a certain environment, but they differ from G4. So here we have a lot of genetic variation that's neutral, and it's neutral for various reasons. We're going to run through some of those reasons.
First, some of the mutations in DNA sequences are synonymous. That means they don't produce any change in the amino acids that are coded in the proteins. Secondly, there are pseudogenes and other kinds of non-transcribed DNA in the genome. A pseudogene is a gene that resulted from a gene duplication event sometime in the past and never got used to make anything. And if you go through an entire genome, which you now can do for many organisms, looking for these things, you will find that they're all over the place.
There have been many gene duplications in the past, and some of them resulted in genes that were then acquired by selection and used developmentally for some function. Others were not. The pseudogenes are the ones that weren't used. Their usual fate is to be eroded by mutation. So gradually the useful information that was once in them gets destroyed by mutation, and if they sit around long enough they are no longer detectable; you can't tell anymore that they were once really a functional gene, before they got duplicated.
There's neutral amino acid variation, for a variety of reasons. Some amino acids have very similar molecular size and charge properties, so that if you substitute them in a protein they don't really make much difference to the shape or the charge distribution on the protein. And if you look at a whole protein, which is usually a pretty big thing--say if it's an enzyme--normally it will have an active site that is in a very small spatial portion of it, so that the amino acid substitutions that are occurring right at the active site are making a big difference to its function, and then potentially down the line to fitness, and the amino acid substitutions that are occurring a long way from that active site are having little impact on the function of the protein, even if they have a different size or a different charge structure.
So there's neutral amino acid variation, and finally there's something which is a little bit more abstract, and basically it's abstract because we don't understand it very well--it's a real phenomenon but we don't always know what the mechanisms are--and that is the canalization of development. So I'll run through these and then try to explain canalization a little bit in a few slides.
Here is, uh, the genetic code, and basically you can see here the nucleotide triplets that are translated into the various amino acids. And the take-home point, the first take-home point from this, is that for any particular amino acid--phenylalanine, for example, here there are two codes for phenylalanine, and look there are six codes for leucine. So any changes within this set of nucleotide sequences produce no change at all in the amino acid that goes into the protein. They are neutral with respect to each other, because they're synonymous.
And you can get some hint of another level of synonymity by looking at the classes of positively-negatively charged amino acids, aromatic amino acids and so forth. Substitutions between aspartic acid and glutamic acid, that are both negatively charged, are less likely to make a fitness difference than a substitution say of lysine, for glutamic acid. So there is a level in the protein as well.
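[Editorial illustration, not part of the lecture.] The idea of synonymous codons can be sketched with a tiny lookup table. Only the phenylalanine and leucine entries of the standard genetic code are included, written as RNA codons; the names `CODONS` and `is_synonymous` are hypothetical.

```python
# Partial standard genetic code (RNA codons): phenylalanine and leucine only.
CODONS = {
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
}

def is_synonymous(codon_a, codon_b):
    """True if both codons code for the same amino acid, so a change
    from one to the other leaves the protein sequence unchanged."""
    return CODONS[codon_a] == CODONS[codon_b]

print(is_synonymous("CUU", "CUG"))  # both leucine
print(is_synonymous("UUC", "UUA"))  # phenylalanine versus leucine
```

A change between any two of the six leucine codons is invisible at the protein level, while a change from UUC to UUA swaps one amino acid for another.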
The pseudogenes I've talked about a little bit. They are not transcribed and all of their nucleotides are free to diverge at random. That means that there is no real editing process going on--natural selection isn't preferring one mutation to another. It's not any more likely to turn up in children or grandchildren than another. This gene has been turned off, and it will inevitably get eroded because all DNA sequences are subject to mutation and if a mutation occurs in a pseudogene, there isn't any particular reason for repair mechanisms to pay any more attention to it than they do to anything else. Okay?
So these things are not especially repaired by repair mechanisms and they're not at all repaired by natural selection. So this comment will apply to a lot of the DNA that's not transcribed. Now fifteen, twenty years ago, when this class of DNA was discovered, people labeled it 'junk DNA' because they didn't think it did anything, and of course it's been then the pleasure of younger scientists to show the older ones that this stuff actually often does have a function--usually it's a regulatory function. Some of it makes small RNA molecules that are used in regulation, but some of it is also being used as, uh, sites in signaling pathways and helping to regulate development.
However, some of it truly is junk. For example, there is a steady process by which viruses of various sorts splice themselves into the genomes of their hosts, and this is part of the adaptive strategy of viruses that they are able to hedge their bets by sticking themselves into a genome and hanging around for awhile and then popping out, at a point which might be advantageous to them but inconvenient for their host.
However, it's a dangerous strategy because sometimes they stick themselves into parts of genomes that never get transcribed, and they never get out. So in fact the genomes of most of the organisms on earth are littered with the fossil skeletons of viruses. I read an estimate once that the human genome had a substantial percentage of fossil viruses in it. I have forgotten the exact figure at the time. This kind of thing was popular when DNA sequences were first starting to come out in large numbers. But just know that. Okay?
So there is junk DNA, and some of it's there because either fossil viruses or transposons, jumping genes, got into positions where they could no longer be transcribed, and they then become a graveyard. Kind of an uncomfortable thought isn't it, that you're just carrying around a viral graveyard? But you are.
Okay, neutral amino acid variation. I've talked about this a bit when I introduced the genetic code. So these are amino acid substitutions that aren't producing any change in geometry or any charge change in the geometry and electrochemistry of a functional site within a protein. And I'd like to talk a little bit about a very early case of molecular evolution; that's the case of alpha-globin. So your hemoglobin has two alpha and two non-alpha chains. It has a beta chain if you're an adult and it has a gamma chain if you're an embryo. The reason it changes from a gamma to a beta is to change the oxygen binding properties, because embryos have to suck oxygen out of their mother's blood. Okay?
If we look at these alpha-globin sequences, across a pretty broad range of vertebrates, and we take samples in such a way that we can look fairly far back in time, we can date these branch points approximately from the fossil records. Okay? So dogs and humans shared an ancestor probably somewhere late in the Cretaceous, mid--late mid-Cretaceous. Our last common ancestor with the kangaroo was at about 140 million years perhaps. The mammals were there while the dinosaurs were there. They were just small little guys, but there were mammals there. Our last common ancestor with the shark is back at about 440 million years.
So take the sequences for all the alpha hemoglobins that you pull out of these things--it's a convenient molecule, you just need a blood sample--and plot them on a graph. So you estimate the time from the fossils and you estimate the average differences. This "k" is a measure of amino acid differences in a protein, and the straight line is what you would expect to get if the rate of amino acid substitution is random, just uniform, just steady. Okay?
It's pretty close to the line. There are some deviations. But this is some of the earliest evidence--this was before DNA sequencing became easy, this was when protein sequencing was easier than DNA sequencing--this was some of the earliest evidence that there's something like a molecular clock. In other words, if we got a vertebrate that we'd never seen before, living in some forgotten jungle, and it had a weird morphology and we didn't know who its relatives were, and we wanted to find out when it might have shared an ancestor with something that we had, and it plotted right here--its difference with something that we were comparing it with right now, plotted right here--then we would have a good estimate of time to last common ancestor, for that new, undiscovered species, based on the assumption that it was experiencing evolution like all these other guys.
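[Editorial illustration, not part of the lecture.] Reading a date off the straight line can be sketched as a two-step calculation: fit a rate to calibration points dated from fossils, then divide a new species' observed difference by that rate. The calibration numbers below are invented so that they fall exactly on a line; they are not the alpha-globin data from the slide, and the function names are hypothetical.

```python
def fit_clock_rate(calibration):
    """Least-squares slope of a line through the origin, k = rate * time,
    from (time, k) calibration pairs."""
    numerator = sum(t * k for t, k in calibration)
    denominator = sum(t * t for t, _ in calibration)
    return numerator / denominator

def estimate_divergence_time(k_observed, rate):
    """Invert k = rate * time to get the time to the last common ancestor."""
    return k_observed / rate

# Hypothetical calibration points: (time in millions of years, k).
# These values are made up for illustration only.
calibration = [(100.0, 0.1), (200.0, 0.2), (400.0, 0.4)]

rate = fit_clock_rate(calibration)
print(estimate_divergence_time(0.3, rate))
```

The estimate is only as good as the assumption stated in the lecture, namely that the new species has been accumulating substitutions at the same steady rate as the calibration species.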
Okay, the fourth reason why genetic variation might be neutral is canalization. Now canalization in general means that there are developmental mechanisms that are limiting the range of phenotypic variation, so that even though there is a mutation in the genome, or there is a disturbing environmental effect on a genetically controlled pathway, that you're still going to get the same phenotype.
Some things about your phenotype are extremely stable. They do not respond to mutation much at all. The fact that you have four limbs, the fact that you've got five fingers; things like that are ancient and stable and there are developmental buffering mechanisms that keep them that way. So these things, these canalizing mechanisms, resist the tendency of variation in either genetic or environmental factors to perturb the phenotype; they keep it in a stable state.
So what happens to the genes that are forming this phenotype but they're being buffered by these developmental mechanisms? Well they are then freer to accumulate neutral variation, because basically the fitness consequences of a mutation in those genes have been removed, they've been buffered out. Now there's been a lot of speculation about why canalization might evolve, or whether it might just be a byproduct. And frankly in most cases we have no idea. This is an open research question.
So one of the reasons people think that say whole organism traits, like say five fingers or four limbs, might be buffered is not because of selection to buffer those traits but because there are very, very strong selection forces operating at the micro level within cells on gene signaling pathways. So you buffer those, and then as a byproduct of that you get buffering at a higher level. We don't know what's the case, but we do know that canalization exists and we do know it has a consequence; it allows hidden genetic variation to accumulate. So that's the fourth major reason why there can be neutral genes.
Now, what causes random or genetic drift? That will generate neutrality, but then what happens to the genes that are neutral? Well these are the mechanisms that can introduce randomness into evolution; most of them, there probably are a few others.
The first is mutation. The second is the Mendelian lottery, which is the idea that meiosis is like a fair coin. Then we have some population level effects. So mutation you can think of as a molecular event. The Mendelian lottery is a cellular event. Founder effects and genetic bottlenecks are population effects. And then we have a demographic effect, which is variation in reproductive success in a population of any size. All of these things contribute to random change. And now I want to step through them and give you a more concrete feel for how they work.
There are some senses in which mutation is not random. Okay? Mutations occur at some sites more frequently than others. In a pathogenic bacterium that is encountering a challenging environment, it will up its entire mutation rate by down-regulating its DNA repair. It's a fairly simple thing to increase the mutation rate on a whole genome. You just neglect to repair it and it will mutate faster. Okay? So if bacteria are moved into a new environment or, for example, if a pathogenic bacterium is put into a vertebrate with a very active and threatening immune system, it increases its mutation rate.
The transitions between the nucleotide classes--so purine to purine, pyrimidine to pyrimidine--are more frequent than transversions. So purines will mutate to purines more frequently than purines will mutate to pyrimidines.
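[Editorial illustration, not part of the lecture.] The distinction between the two kinds of substitution can be sketched as a small classifier over DNA bases. The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(old, new):
    """Label a single-base change as a transition (within a class)
    or a transversion (between classes)."""
    if old == new:
        raise ValueError("not a substitution: both bases are the same")
    pair = {old, new}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```

Of the twelve possible single-base changes, four are transitions and eight are transversions, so transitions being more frequent is not simply a matter of there being more ways to make them.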
And mutations do not produce random changes in phenotype space. This one again is a little bit abstract. Okay? But a mutation can only cause a change in the inherited set of possibilities. There is very, very little mutational variance in the human population for a sixth set of appendages, growing in the middle of our backs, that could be turned into the wings of angels; very little. Okay? There is very little mutational variance in a clam for any organ that could be involved in air breathing.
So mutations do not cover all of conceivable phenotypic space. Mutations are only causing perturbations in the inherited set of possibilities that a given evolutionary lineage has produced. So they're not making random changes in phenotype space. But they are random in an extremely important sense. There is no systematic relationship between the phenotypic effect of a mutation and the need of the organism in which it occurs. They're random with respect to fitness.
So when those bacteria are going into the vertebrate immune systems and it would be extremely convenient for them to have a mutation that was just exactly the right thing that they needed to avoid that particular defensive maneuver on the part of their host, they don't get it. Okay? All nature will give them is random mutations with respect to that particular function, and then if they have a lot of progeny, one of them may have the right one by luck.
Similarly, in your case, it might be extremely convenient for you to have an adaptation which allowed you to look at a computer screen for 48 hours without getting a headache and without having to get up to go to the bathroom. Okay? That mutation is not going to happen just because you need that function. Your genome is going to be covered by random mutations, and it may very well be that one of your children is able to look at that screen a little bit longer than you are. But that will be because it happened at random, not because somehow development or evolution could anticipate that that function was going to be useful.
So the process of mutation produces a lot of variation, and then natural selection edits it, it sorts it, it screens it. And at the point at which that variation is produced, the potential function of the variation is not a question, it's not an issue; it's just making variations.
Okay, second, meiosis is like a fair coin. So this is something that you may find boring. You've all heard about meiosis. You've all heard about Mendel's Laws. You know that the probability that a gamete will get into a particular--that a gene will get into a particular gamete is 50%. And you're all familiar with this because you know that the probability that a child will be a boy or a girl is 50%, and that's because at the sex chromosomes, and at all the other chromosomes that we have, the probability that the chromosome will go one way or the other is 50%.
That is absolutely amazing. Why is it that my Y chromosomes don't get 80% of the action? Why is it 50%? There's actually something very deep here. If you construct a system in which every one of the potentially competing elements has been forced to have the same chance, those elements must then cooperate, because the only way they can increase their own chances is by increasing everybody else's as well.
And that is why this particular effect is called the parliament of the genes. It is a discovery that Nature, about probably two billion years ago, hit upon a principle that human political science didn't discover until the Enlightenment, which is that democracies are stable. Meiosis is a democracy. In meiosis each gene has a fair chance, and that means that in a sense you've got a one-gene, one-vote situation.
So I'll come back--I'll come back to this fairness of meiotic segregation, but there's a general idea behind that. I've just given you a little scenario that would suggest why it was selected; it was selected to repress conflict. Every other aspect of genetics has evolved. So when you take genetics, or you take cell biology, or you take developmental biology, there were--there were selective processes that produce what you study, and there were alternatives that were rejected, and you're looking only at a sample of what nature can produce. And that in itself becomes an interesting research program.
Okay, back to the parliament of the genes. I referred to conflict. Here's the conflict. There are things called meiotic drivers. So there are genes which actually change Mendel's Laws; they change the probability that they will get into the next generation. Anybody already heard how a meiotic driver works? It's kind of a cool system. They use a long-range poison and a short-range antidote. So a meiotic driver usually operates by killing off any cell that does not have a copy of itself, of its gene, and giving an antidote to its own cell.
So as the cells sit there, in the ovary or in the testes or in whatever organ that particular organism has, the meiotic drivers are basically wiping out the competition and promoting their own interests. These things are all over the place. They are common in Drosophila, and there is evidence that there have been meiotic drivers in the human genome. Okay?
Once the diploid state evolved, there was a long history of invasion by meiotic drivers, and the response to that is that all the other genes wanted to cause these meiotic drivers to go away. They were distorting their own interests. You're sitting there on a chromosome, you're innocent. Some wild bandit comes along and hijacks your interests, and now your probability of getting into the next generation is only 20% rather than 50%. Who wants that, you know? That's not a good deal. So throughout the genome various mechanisms arose to repress meiotic drive; and the result was a very complicated mechanism and we call it meiosis.
So that's not the only possible reason for the complexity and fairness of meiosis. It is a plausible one. I invite you to consider the cultural evolution of democracy and decide whether it too might have been driven by a history of cheating, particularly the defection of leaders who no longer represented the interests of their people. I think there's a similarity, and I think you'll find it articulated in the Declaration of Independence.
Okay, mechanisms that cause random change also occur at the population level. One of them is the founder effect. Let's suppose that I were to found a new population with only you; it would have a high probability of blue eyes. And with you it would have a high probability of brown eyes. And in order to choose you I flipped a coin. Okay? At the founding of that population there was a random event, which was just sampling; just sampling a couple of individuals out of a big population.
And the result of this is that there are certain diseases, human genetic diseases, that are rare in the human population in general, but are common in populations that were founded by just a few people, including Tay-Sachs disease in Quebec, porphyria in the Afrikaners of the Cape, and diabetes in Pitcairn Island. So you just take a little sample out of a big population and you get something that's not representative, and sometimes that contains a genetic disease.
Another population level phenomenon that yields randomness is a bottleneck. So that will happen when a population crashes to a very, very small size, and then only a few alleles make it through. So you might have a lot of versions of a gene in a big population, but if you're only founding a new population with two or three individuals, they're--and they're diploid, well two individuals only carry four copies of the gene. So if there had been twenty alleles in the original population, the maximum possible number that could get through that bottleneck is only four; you've left behind sixteen.
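[Editorial illustration, not part of the lecture.] The bottleneck arithmetic can be written out directly: the founders carry a fixed number of gene copies, and no more alleles than that can get through. The function name and the `ploidy` parameter are hypothetical.

```python
def max_alleles_through_bottleneck(n_founders, alleles_in_source, ploidy=2):
    """Upper bound on the number of distinct alleles that can survive:
    limited by the gene copies the founders carry."""
    copies_carried = n_founders * ploidy
    return min(alleles_in_source, copies_carried)

# The lecture's example: twenty alleles, two diploid founders.
survivors = max_alleles_through_bottleneck(n_founders=2, alleles_in_source=20)
lost = 20 - survivors
print(survivors, lost)
```

This is only the best case. If two founders happen to carry the same allele, or one is homozygous, even fewer distinct alleles make it through.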
It appears that this is what happened with the cheetahs. And they are apparently almost completely homozygous, particularly with respect to their immune genes. It is a weird biological fact that you can take a skin graft off of one cheetah and graft it to a cheetah, any other cheetah in the world, and the graft will take. In other words, their immune system finds a sample of skin from any other cheetah in the world to be their own skin. They don't detect a difference. And that probably is a signal that cheetahs went through a very small population bottleneck within the last few thousand years.
Genetic drift is then a consequence of neutrality. It's the random wandering of the frequencies of neutral genes. If you look through a microscope, Brownian motion is the jiggling of little dust particles that you see in the microscope, and it is actually the result of the random impacts of water molecules hitting that dust particle. Well the population level analog of heat in water is variation in population size--uh, excuse me, variation in family size. A gene which has gone through the Mendelian lottery of meiosis lands in a zygote. Okay? It got into the zygote. The zygote grows up.
This particular gene is neutral. It's not making any difference to reproductive success. But that particular individual that it landed in could have a small family or a big family, for reasons that have nothing to do with the function of the gene. It's just a flip of the coin that determines whether it will be in a family that produces two children, zero children, or a lot of children. Okay?
So that's what I mean by combining the lottery of meiosis with variation in reproductive success. And this is a process that goes on in all populations. When people are first learning about genetic drift, they think oh, that's something that happens in small populations, because small populations don't have all the smoothing effects of the Law of Large Numbers. But this will happen in a population of any size. Okay? And basically what I mean by that is this interesting consequence of variation in reproductive success. If it's correlated with a trait or with a gene, strongly, it produces natural selection. If it's not correlated it produces drift.
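[Editorial illustration, not part of the lecture.] The random wandering of a neutral allele's frequency can be sketched with a simple resampling model in which every gene copy in the next generation is drawn at random from the current one. This is the standard Wright-Fisher idealization, swapped in here for the lecture's verbal description of meiosis plus variation in family size; the function name, the population size, and the seed are arbitrary choices.

```python
import random

def drift_trajectory(n_copies, start_count, rng, max_generations=10_000):
    """Frequency of a neutral allele over time among n_copies gene copies.
    Each generation, every copy is drawn at random from the previous
    generation, so the allele's count changes by chance alone."""
    count = start_count
    freqs = [count / n_copies]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:
            break  # the allele has been lost or fixed
        p = count / n_copies
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        freqs.append(count / n_copies)
    return freqs

rng = random.Random(0)  # fixed seed so a run can be repeated
trajectory = drift_trajectory(n_copies=20, start_count=10, rng=rng)
print(len(trajectory) - 1, "generations, final frequency", trajectory[-1])
```

Nothing in the model favors either allele, yet the frequency does not stay at one half. It wanders until the allele is either lost or fixed, and rerunning with a different seed gives a different path.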
So one of the real puzzles of evolution has to do with what causes a gene to end up at random in an individual making one, two or three, or zero recruits per lifetime; what makes the difference between an adaptive and a neutral gene. I've sketched four possible answers to that question. In any particular case we normally do not know exactly which one is contributing the most to that.
So, what is it that happens to neutral alleles? [That's not going to work. I'll just have to draw over this.] If we draw time on the X-axis, and we draw frequency on the Y-axis, and a mutation occurs, the usual thing that will happen to a mutation is it will increase a little bit and disappear. Then we wait for awhile, another mutation occurs. We're looking, by the way, across many different genes in the population. We wait awhile, another mutation occurs. It comes into the population.
The probability that it will ever get fixed is pretty low because the probability is proportional to the frequency; excuse me, is proportional to 1/N, frequency equal to 1/N. When it's rare, its frequency is very low and so its probability of being fixed is low. But once in awhile a mutation comes along that manages to go through all of this drift, and making it through organisms that had, on average, more than two progeny per lifetime, and it gets fixed.
And if you just look at this class of mutations, the time that it takes them to fix is proportional to the population size. So things will get fixed faster in small populations than they will in big ones. There will be more of them, more mutations will occur in a big population, but it will take them longer to get fixed.
Now because the bigger populations have more mutations, it turns out that their size exactly compensates for the longer fixation times. So if you're just counting how many get fixed--it doesn't matter whether you're in a small population or a big one--the same number of mutations are getting fixed in both cases. That means that over the course of evolutionary history populations could've gone through crashes and explosions, and at the end of it, if you're a geneticist studying the DNA, looking back, it doesn't make any difference that the populations had crashes and explosions, in terms of how many neutral alleles got fixed. They were just getting steadily fixed, with no effect of population size.
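[Editorial illustration, not part of the lecture.] The compensation described here can be written as a short calculation. The symbol \mu for the neutral mutation rate per gene copy per generation is an added assumption; N follows the lecture's usage, the number of gene copies in the population, so that a new mutation starts at frequency 1/N.

```latex
\begin{align*}
\text{new neutral mutations per generation} &= N\mu \\
\text{probability that a given one is fixed} &= \frac{1}{N} \\
\text{neutral substitutions per generation} &= N\mu \cdot \frac{1}{N} = \mu
\end{align*}
```

Population size cancels out, which is why crashes and explosions leave the count of fixed neutral mutations unchanged. If N is instead taken to be the number of diploid individuals, both factors use 2N and the cancellation is the same.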
So we don't know which one will be fixed. We do know how many will be fixed. So this is why the molecular clock is like an atomic clock; it's driven by radioactive decay. We don't know how many atoms--we don't know which atom will decay, but in a second we know how many will, for a given radioactive substance.
The reason for this is that there's regularity in large numbers. It emerges because there are a large number of independent events. Our haploid genome has about three billion base pairs. One mole of uranium has about 6 times 10^23 atoms--actually if it's a mole it has exactly that many atoms--and these large numbers give the regularity to the process.
Okay, so this is what connects microevolution to macroevolution. It creates uniform substitution rates in neutral portions of the genome. And this is the assumption that molecular evolution makes when it reconstructs the Tree of Life. It allows us to estimate branch lengths and branch points to last common ancestors. It allows us to make comparative inferences on phylogenetic trees. And therefore neutral evolution is actually a central tool in the construction of the evolutionary framework. It's not something to be neglected; it's something to be understood, because it gives us a source of regularity that can take us back into deep time.
As an example, here are nucleotide substitutions occurring in flu. These are isolates that are still in the freezer. Okay? And they run here from about 1925 up to 1990. We don't have any, any error of estimate in age; we know when they were isolated. Okay? The population sizes have fluctuated dramatically. At some point, some of these flu strains were sitting in a few ducks or pigs in southeastern China. At other points, they were inhabiting a billion people around the world. They went through huge fluctuations, and a nice steady rate of substitution. Okay?
All the mechanisms of genetic drift are in play here, except meiosis, because flu is a virus, doesn't go through meiosis. The effect of variation in population size was exactly compensated by the much slower rate of fixation of neutral mutations in larger populations. So even in an epidemic disease, like flu, the molecular clock is nice and steady.
A few caveats about that. Different proteins and different parts of proteins evolve at different rates. They only use non-transcribed DNA sequences. There are some differences among lineages because of different generation times.
And I'm not going to talk about maladaptation because I took too long to talk about neutrality. So you can read about maladaptation, and I'll just give you the basic idea. Here's the basic idea of maladaptation. If natural selection is strong in one place and organisms get really well adapted to it, but they move to another place, where they don't do well, for whatever reason, we call the place that is producing an excess of organisms the source, and the place which is not good for the organisms a sink. The genes in the sink represent organisms usually that were adapted to the source. So if organisms get well adapted in one place and moved to another that's quite different, and they never get an opportunity to come into evolutionary equilibrium with that new place, which we call the sink, then they are maladapted to the sink. That is the basic idea behind how maladaptation can occur. Okay?
So, let me jump ahead. I will just run quickly through these examples and get to the end, just to let you know what happens next time. These are the keys I want you to remember. I want you to remember how it is that meiosis is like a fair coin. I want you to remember how the fixation of a neutral allele is like radioactive decay. And I want you to remember that the regular fixation of neutral alleles generates a molecular clock that allows us to connect micro to macroevolution. Okay, that's it.
[end of transcript]