The Tree of Life must be discovered through rigorous analysis. Genetic information is crucial because appearances can be deceiving, and species that look similar can prove to be genetically very dissimilar and not share recent common ancestors. Two criteria used to determine the "correct" Tree are simplicity and whether the tree maximizes the probability of observing the data we actually see.
Stearns, Stephen C. and Rolf Hoekstra. Evolution: An Introduction, chapter 13
February 16, 2009
Professor Stephen Stearns: Very good. So today we're going to talk about phylogenetics and systematics, and the lecture has this kind of structure. I'll remind you of what the Tree of Life looks like. Then I will motivate the lecture basically by giving you some surprising recent results from molecular systematics, and then I will go into basically what phylogenetic concepts are and how to build a phylogenetic tree. I won't do this in great detail, but I hope I do enough of it so that you at least have a good feel for the issues that are involved when you do this.
So this is the same picture of the Tree of Life that I used earlier in the course, and basically it shows you that since about 3.5 billion years ago, three large clades have developed. [Technical Adjustments] Take a moment and look at this picture, and look at the next one, which is essentially a blowup of what's been going on since about here, and think about how much that tells us about biology.
It provides a very basic structure, the structure of relationships. It tells you which things shared common ancestors, and why we might expect them to be one way rather than another way. It sets up thousands of comparisons in our minds about questions that we might ask. It provides an extremely useful, overarching structure. But the question is, how did phylogenetic biologists actually get this picture, and are they still changing it? And the answer is: they got it with the methods of inference that I'm going to sketch today, and they're still changing it.
It's not written in stone and it's been changing ever since the first time somebody tried to write it down. So these things are working hypotheses, and we are able to get a better and better refinement of them as new information comes in, but there are significant changes.
Now this is what Darwin had to say in the Origin about the Tree of Life. It's a truly wonderful fact, the wonder of which we are apt to overlook from familiarity, that all animals and all plants, throughout all time and space, should be related to each other in groups. And he goes on about how these groups are hierarchical. He said, "The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth."
So in this beautiful Victorian prose, Darwin rhapsodizes a bit about the Tree of Life, and it was the only picture he had in his book. Okay? So he thought it was real important. He made a sketch of it, and by it, by this sketch, he meant to indicate that a lot of things had gone extinct and that through ancestry and shared common ancestry you could define relationship.
Now he could see immediately that the tree isn't given; it has to be discovered. He said, "Our classifications will come to be, as far as they can be so made, genealogies," so they will reflect what actually happened in evolutionary history to generate relationship, "and will then truly give what may be called the plan of creation. The rules for classifying will no doubt become simpler when we have a definite object in view. We possess no pedigrees or armorial bearings." Okay? That's mid-nineteenth-century code for there's no barcode on the species. Okay? They're not wearing their names on their foreheads, and they're not telling you who they're related to. So we have to discover and trace these diverging lines.
Well, it has actually taken a long time for phylogenetics to settle on clear logic and clear methods. The ideas were there earlier, but they weren't really implemented until about 1965 to 1970, and there ensued a huge controversy that lasted for about twenty years. Now the dust has settled and that seems to be in the past, but many people my age are still marked by it, because we witnessed it.
And I'm now going to more or less summarize what came out of it, but I just want to signal to you that this was, in the not very distant past, an extremely controversial area of science, and that it was revolutionized both by the advent of DNA sequence data, and by the development of powerful mathematical and computer methods for determining relationship; and there was an argument about which ones we should use.
So the first step in all this was taken by a guy named Zimmermann, and it's sketched here, and it seems very simple: sharing a more recent common ancestor defines relationship. So magnolias and apple trees are more closely related to each other than either is to a ginkgo, because the point on the tree that connects them is later in time than the point that connects them to ginkgoes. It seems like a very simple idea, but it wasn't really clearly articulated until Zimmermann laid it down as a principle in 1931.
Now some of these concepts you've already been through. So I'm going to simply mention these names again, just so that they get repeated and start becoming part of your own vocabulary. A monophyletic group contains all of the descendants of a single common ancestor and nothing that is not descended from that common ancestor; paraphyletic groups contain some, but not all, of the species descended from their most recent common ancestor; and polyphyletic groups really are a hodgepodge of things that shouldn't be in the same bin at all, because independent evolutionary lines are being incorrectly lumped together.
The basis of a lot of this inference is the concept of homology. Homology and analogy, which I'll come back to in a minute when we get to some slides that illustrate them, were defined by Richard Owen in the 1840s, before Darwin's Origin of Species. He was a great morphologist in London, and basically the idea of homology is that a trait is shared by two or more species because they are descended from a common ancestor. So they got it because their ancestor had it.
And homoplasy, or convergence, is similarity for any reason other than common ancestry. So convergence in morphological traits, or mutation to the same sequence in DNA, will lead to homoplasy. So in determining phylogenies, homology is helpful and homoplasy is confusing.
So here's a good monophyletic group. Okay? These are the dogs. Here are Canis and Lycaon, the African hunting dog, and their closest relatives are the South American wolves. The timber wolf, by the way, is in the genus Canis, and all domestic dogs are descended from wolves, which is a nice example of really rapid evolution: take a wolf and turn it into a Saint Bernard and a Chihuahua. I give you 5000 years. Can you do it? Well, it takes pretty tough breeding to do that, but you can.
And, by the way, I had a colleague, Armand Kuris, who's at Santa Barbara, who decided he was going to create his own breed; he wanted to create the ugliest dog in the world. He was going to name it the Louisiana Swamp Dog. It took him six generations. I mean, this dog is really ugly. [Laughter] But it's registered with the American Kennel Club. So you can do rapid breeding with dogs. That, of course, is on a timescale that you couldn't even fit onto that little white line at the top of the picture. Interestingly, all this stuff over here is extinct.
There are a few little marks here that are kind of interesting. Out here, on the branch going up to the Caninae, is digitigrady, walking on the toes. At that point they started chasing after things that were running fast, and selection, just as it did with horses, started to favor running on their toes. So they got longer legs by running on their toe tips rather than directly on the pads of their feet. And bone cracking came in right about here, so they could get the marrow out of bones.
That's a good monophyletic group. You probably can't see it up there. I'll give you a little bit of timeline. The whole thing starts taking off about 40,000,000 years ago, in the Eocene. And Canis, the dog genus itself, is about 5,000,000 years old; it's about as old as Homo.
Okay, here are some paraphyletic groups. Reptiles is paraphyletic. Okay? It's got turtles, lizards and crocodiles in it, but it doesn't have the birds. So "reptiles" is an inaccurate term. This is the monophyletic group: lizards, crocodiles and birds. We don't have an everyday language term for it. Crocodiles, by the way, build nests and guard them, and when the baby crocodiles hatch out, they cheep like birds: "cheep, cheep." So this relationship between crocs and birds is well established.
Here's a polyphyletic group. If we define a group called the homeothermic tetrapods, it contains the birds and the mammals. But look at all of the things that are more closely related to the birds than the mammals are. So if we were to define the birds and the mammals as a group, it would be a false group, because phylogenetically the birds have many relatives that are more closely related to them than the mammals are. And this group is polyphyletic. Okay? It has contributions from two different sources.
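A minimal Python sketch of the monophyly test just described, assuming the tree is stored as nested tuples (a representation chosen only for illustration) with the topology from the slide as described: mammals outside, then turtles, then lizards, then crocodiles plus birds. A group passes only if it is exactly the set of descendants of its most recent common ancestor. The sketch does not try to tell paraphyletic from polyphyletic groups, which needs further criteria.

```python
# Hypothetical nested-tuple tree following the slide as described:
# mammals outside, then turtles, then lizards, then crocodiles + birds.
TREE = ("mammals", ("turtles", ("lizards", ("crocodiles", "birds"))))


def leaves(tree):
    """Return the set of terminal taxa in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def smallest_clade(tree, group):
    """Descend to the smallest clade that still contains every member of
    `group`, i.e. the clade founded by the group's most recent common ancestor."""
    if isinstance(tree, tuple):
        for child in tree:
            if group <= leaves(child):
                return smallest_clade(child, group)
    return tree


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of descendants of its MRCA."""
    group = set(group)
    return leaves(smallest_clade(tree, group)) == group


print(is_monophyletic(TREE, {"turtles", "lizards", "crocodiles"}))  # False ("reptiles")
print(is_monophyletic(TREE, {"lizards", "crocodiles", "birds"}))    # True
print(is_monophyletic(TREE, {"birds", "mammals"}))                  # False (homeotherms)
```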
Another good example of a polyphyletic group would be if you decided to link together all of the things that look like cactuses in Africa and South America. The ones in South America are cactuses, but the ones in Africa are euphorbs; they look just like cactuses. There's a nice example in the Peabody Museum. You can look at them sitting next to each other. Okay? That's a polyphyletic group. They are convergent, they came together.
Similarly, the homeothermy in the birds and mammals is convergent. The ancestor back here did not have warm blood. It evolved twice, once in the line to mammals and once in the line to birds. I know there were some warm-blooded dinosaurs, but that was later. Warm-bloodedness probably came in, in this line, somewhere about here; we don't know exactly.
Then this central concept, homology. Here are the forelimbs of turtles, humans, horses, birds, bats and seals. So we've got some stuff here that spans quite a bit of the tetrapods, the vertebrates that live on land. And you can see that it's possible to match up sections of these structures all the way through. And if you study that, and realize that they all derive from a shared ancestral condition, you can see how evolution has changed their proportions and their thickness, but hasn't changed their spatial relationships to each other.
And, in fact, if you go through the development, you'll discover that the same nerves coming out of the backbone are running to the same parts of the limb, and all of those conditions have been held together, over evolutionary time. And if you look at the HOX genes that are controlling their development, you can see, as you saw in a lecture a little bit earlier, that the DNA sequences in the HOX genes, that are telling it whether to make a humerus, a radius, an ulna or digits, are actually homologous in their DNA sequence as well. So there's a molecular homology that underlies the morphological homology.
And if you look at molecular sequences, here's a gene called aniridia in humans and a gene called eyeless in fruit flies. These are not DNA sequences; this is protein sequence, so these are amino acids. Only six of the sixty amino acids are different, so the two sequences are 90% identical. There are search algorithms, like BLAST, that go out and look for these sorts of similarities. So if you get a candidate gene or a candidate protein sequence, it's possible simply to submit it as a query to a search engine and have similar genes from other species pop up. So you can look for molecular homology that way.
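The "90% identical" figure is simple arithmetic: 54 matching positions out of 60. A small Python sketch of that calculation follows, assuming the two sequences are already aligned without gaps. The sequences are made up and are not the real aniridia or eyeless proteins, and tools such as BLAST do considerably more (local alignment, gaps, significance scores).

```python
def percent_identity(seq1, seq2):
    """Percentage of positions at which two pre-aligned, equal-length sequences match."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


# Made-up 60-residue protein sequences (NOT the real aniridia/eyeless sequences).
# seq_b differs from seq_a at positions 0, 10, 20, 30, 40 and 50: 6 of 60 sites.
seq_a = "ACDEFGHIKLMNPQRSTVWY" * 3
seq_b = "".join("G" if i % 10 == 0 else aa for i, aa in enumerate(seq_a))

print(percent_identity(seq_a, seq_b))  # 90.0
```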
A good molecular homology is the fruit fly homeobox complex and the human HOX complex--we talked about this earlier--where the sequence of the genes along the chromosome, and the parts of the body that are being controlled by those genes developmentally, are similar in humans and in fruit flies, and actually unite everything that you see here. So this sort of thing is a signal of shared ancestry, and it's the kind of molecular information used to construct the broader Tree of Life. So this is something that is linking together arthropods, annelids, mollusks, echinoderms and chordates.
Now analogy. Analogy or convergence is a misleading kind of information, because that means that natural selection has taken things that were evolutionarily independent, and which have sister groups, have relatives, that don't look anything like this, and then shaped both of those things to come together to a common form. So the dolphin and the ichthyosaur have a very similar fusiform body, and this is because of strong selection to swim rapidly in the ocean and to chase down fish and squid; which they both did.
And the analogy goes deeper than that. As you probably know, the dolphin has live birth, it's viviparous. So is the ichthyosaur. If you go to a striking museum, just south of Tübingen, in Germany, and look at the world's largest collection of pregnant ichthyosaurs, you can see an ichthyosaur that was giving birth at the point when it was fossilized. And they often had twins, or triplets.
So analogy is when two things look very much alike even though they are distant on the tree and each has many relatives that look quite different. So the dolphin is more closely related to a kangaroo than it is to an ichthyosaur, and an ichthyosaur is more closely related to a hummingbird than it is to a dolphin; nevertheless they look similar. So that's analogy.
So once people got their DNA sequences and their logic under control, and they started doing a lot of molecular systematics, they discovered some relationships that were kind of surprising, because analogy, convergence, had been covering up relationship, or because evolution had so radically changed the external appearance of these creatures, that it was very difficult to see who they were related to.
Here are a few of these insights. I'll bet that if I ask Alex or Jeremy or Katie or the other teaching fellows if they've got a favorite one, that they could probably come up with others as well. Maybe I will in a minute; so start thinking. Okay?
Pentastomids were a mysterious group of creatures, and they turn out to be closely related to fish lice. I'll show you a pentastomid in a minute. There used to be a group called carnivorous plants, the pitcher plants and the sundews, and people thought that the pitcher plants and the sundews were related. These are plants that are adapted to living under very low nitrogen conditions; they need nitrogen to make all of their proteins, and they get it by killing insects and other things. Some of them can even kill a small frog. So it was thought likely that this was a natural group, that these capacities had evolved once in a common ancestor, and that they were all related to each other. But they're not. It's happened several times.
It wasn't really clear where the whales had come from. One might have thought that maybe whales are related to seals, or perhaps to other aquatic mammals, like otters. But seals and otters are carnivores, and it turns out that whales, including the toothed whales, the active carnivorous dolphins and sperm whales, are ungulates. Okay? So there was an ungulate that used to go around eating plants, and it went into the water and started to eat all kinds of other things. Okay? It stopped eating plants and started eating fish and squid; and some of them eat a lot of crustacea, if they filter feed.
Sycamores. Sycamores, or plane trees, are the classic tree used to decorate the European plaza. Okay? If you like to sit out in the summer, in Italy or France, and watch the people go by, you're probably sitting under a sycamore tree. And they have a leaf that looks like a maple leaf, and if you just look at a sycamore (and by the way it has a kind of blotchy bark, sort of white bark but with blotches on it), if you just look at the shape of the leaf, you might think, "Oh, these are related to maples." They aren't at all. They are in fact more closely related to the sacred lotus, which was long classified with the water lilies.
Now those are some pretty radical surprises. Those are things that were buried in the DNA sequences and were not apparent in the morphology, and they are testimony not only to the power of molecular systematics but also to the power of natural selection to change the shape of things in ways that profoundly alter them and create all sorts of misimpressions about relationship.
So here are some pentastomids. This set of pentastomid worms here is actually crawling across the roof of a crocodile's mouth. They tend to live in the noses of crocodiles and dogs. They don't tend to live in humans; this is one of those yucky creepy-crawlies that you don't have to worry about too much yourself. But it looks, actually, something like a beetle larva; it looks kind of like a mealworm. But look, it's got segments, and it has some kind of funny structure inside.
It turns out that it's most closely related to this thing. Here is a fish louse on the outside of a triggerfish, and this is an isopod. Pentastomids are related to fish lice. They are not related to beetles, or to nematode worms, or to a lot of other things. And, in fact, they're nested within the fish lice. So evolution took something that looked like that and turned it into that. And like a lot of things, this was probably accomplished by things like crocodiles eating things like fish. And when the parasite that was living on the fish got ingested by the crocodile, it was more or less, "Oh my heavens, how am I going to adapt to the crocodile?" Well, if it could fall off the fish, stay in the mouth of the crocodile and crawl up its nose, it could survive; which is essentially what these things did.
Here are two carnivorous plants. Okay? They are polyphyletic. Pitcher plants have evolved independently at least three times; flypaper traps at least five times. There are two groups of pitcher plants that are sister groups to two clades of flypaper traps; but others have other sister groups. So these are actually deeply cool plants.
The world hotspot of pitcher plants, if you have a deep desire to go collect a lot of pitcher plants, is Borneo, and it is no coincidence that Borneo is an island with very, very nitrogen-poor soil; many of the trees and shrubs that live on Borneo have special adaptations for dealing with this low-nutrient environment.
Flypaper traps, like the sundews, often live in bogs, which are also extremely nutrient poor. If you go to Bethany Bog here, you will find them living there. It's a kettle lake that was left after the glaciers retreated. There's a mat of vegetation growing out over it, which means that in the middle of it the plants are living right over water. The water is nutrient poor, and there are flypaper traps catching flies out there to get their protein.
This is a chunk of the Tree of Life that shows you the radiation of the ungulates. And you can see that both the toothed and the baleen whales are nested within the ungulates, and their closest relatives are the hippos. So you should think that the whales descend from a relative of the hippos' ancestors that went into the ocean, probably about 35,000,000 years ago, and right here, marked on the tree, are some of the genes that have changed along these particular lines and that are signals of those relationships.
So the take-home point is that appearances are deceptive and detective work is needed. Now how do you do the detective work? How do you build a phylogenetic tree? Before I do that, do you guys have any favorite phylogenetic surprises? You have one; Jeremy, what's yours?
Teaching Fellow: Yes, for me personally it's that the Gnetales are not related to the flowering plants. The Gnetales have double fertilization, which is a very interesting innovation that the flowering plants also have. But the Gnetales actually sit within the gymnosperms, with the pine trees. We learned that from Burleigh and Mathews [a paper published in 2004].
Professor Stephen Stearns: So how recently was that discovered?
Teaching Fellow: Like three years ago; three or four years ago.
Professor Stephen Stearns: Okay, see, the tree keeps changing. It's a moving target. It gets better. The basic branches are not moving too much, but out there on the tips there's still a lot of action. So how do you get it? How do you build a phylogenetic tree?
Well, this is an important point. You need to have some characters, that is, traits that can occur in different states. They could be nucleotide sequences. It could be whether the thing has scales or fur, or whether it has a three- or four-chambered heart. It could be a lot of things. So you need characters, and they need to have different states. And the characters that are informative are shared derived characters. I will go into this issue with a full slide, because this is a key point: the characters that give you phylogenetic information are the ones that all members of a group share with each other and that differ from the ancestral state.
Well, you can only define derived by comparison with primitive. Okay? So primitive is what it used to be, and derived is what it is now, somewhere on the tree, and you can't do that without a tree. So there's a kind of paradox. You don't have a tree, and if you don't have it, you don't have a way to determine what came first, and therefore you don't know about character polarization. Character polarization means knowing which state is primitive and which is derived; that polarizes that series of trait states. So there are a number of ways out of this logical dilemma.
One is to look at all possible trees, which is, by the way, a huge computational problem, as you'll see at the end, and choose the ones that are simplest. So that's the principle of parsimony. And it's a logical principle; it's not an empirical principle, and it's not necessarily the way that evolution operates. But given that there are many, many possible trees, choosing the simplest one basically is a way of saying, "This is how we're dealing with our ignorance."
Or you could choose the tree that would make it most likely that you would have observed the character data that you actually did observe; that's called the principle of maximum likelihood. And, in fact, in the computer programs and in the theoretical arguments that go on in phylogenetics, these are two of the main themes, and many of the methods combine them, in various ways.
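To make the maximum likelihood principle concrete, here is a Python sketch for the smallest possible case, two aligned DNA sequences, where the only thing to estimate is the distance separating them. It assumes the Jukes-Cantor substitution model, which the lecture does not mention, and made-up counts of 54 matching and 6 differing sites. It picks the distance that makes the observed data most probable; real maximum likelihood programs also search over tree shapes and use richer models.

```python
import math


def jc_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a pair of aligned DNA sequences under the Jukes-Cantor
    model, given d = expected substitutions per site separating them (d > 0)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # probability of one specific identical pair, e.g. A/A
    p_diff = 0.25 * (0.25 - 0.25 * e)  # probability of one specific different pair, e.g. A/G
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_same, n_diff, step=1e-3, d_max=2.0):
    """Grid search for the d that makes the observed data most probable."""
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_same, n_diff))


# Hypothetical data: 60 aligned DNA sites, 6 of which differ.
n_same, n_diff = 54, 6
d_hat = ml_distance(n_same, n_diff)

# Under Jukes-Cantor the maximum also has a closed form; the grid search should agree.
p = n_diff / (n_same + n_diff)
d_exact = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
print(round(d_hat, 3), round(d_exact, 3))
```

The grid search is deliberately naive so that the "choose what maximizes the probability of the data" step stays visible; the closed form is shown only as a check.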
Okay, so a bit about shared-derived characters. Remember that picture with the different colors for the different parts of the forelimb? Having a forelimb that has a humerus, a radius, an ulna, carpals and metacarpals, in that sequence, really doesn't help us to distinguish bats from turtles. Okay? That trait is shared among all tetrapods. So the fact that you're looking at something with those parts doesn't help you to distinguish bats from turtles from whales from seals. Sure, they've all got it, but that's not telling you whether they're closely related to each other or not, because they all got it from a common ancestor; it's not derived.
However, that structure distinguishes the tetrapods from the lobe-finned fishes. So at that point on the tree it becomes useful as a shared-derived marker of a group, which is one of the reasons why we're pretty confident that the tetrapods are a good group, and that the vertebrates didn't come out of the water multiple times. In this context it marks a trait that originated once in their common ancestor, is shared by all of them, and is not found in their closest relatives.
The jargon is derived from the Greek. This thing is called a synapomorphy. Syn means shared, or together; apo means away from, hence derived; and morph means form, hence trait. So a shared-derived trait is a synapomorphy.
Now there's a bunch of important stuff here. Just looking the same isn't very helpful. The informative traits are the ones that are shared and derived. And what is shared and what is derived, and therefore what is informative, depends on the context; it depends on the part of the tree that you're sitting in.
Okay, now a little bit about building trees. Evolution is really complex, and a lot of stuff is going on. But let's suppose that A has the ancestral state of a trait, and that between A, on the one hand, and B and C, on the other, a new state of the trait evolves. So Ancestral is blue and New is red. And this marks the point where that trait started to change; then it spread through the population, and by this point it was fixed.
Well this is a cartoon, and this is a cartoon of a cartoon. Okay? We're just marking where that happened, and the only important thing about this mark is that it is between this point and this point; other than that, we don't really know exactly where it is. Okay?
Now A, B and C would normally be species. Okay? But they could be other things. They could be genes, or they could be genera or families or something like that. And 1, 2, and 3 are characters. They could be morphological; they could be molecular. And, as a convention, the ancestral state of that character will be denoted 0, and the derived state will be denoted 1; and the arrow indicates going from ancestral to derived. And basically what this picture is telling you is that Trait 1 changed from ancestral to derived, between A and B, and then after B split off from C, two more things changed in C. Okay, so that's the information that picture is trying to give you.
And suppose you put down a really interesting vertebrate phylogeny, like this one. It has some unfortunate names for certain groups on it; reptiles, for example, aren't real. Okay? If we use these characters: having a vertebral column, having lungs, having an amniotic egg, having lactation, eggshell absent, a chorioallantoic placenta, hooves on the distal phalanges, and a petrosal bulla in the skull, and plot them along the lines, then we can see which characters distinguish which groups, and what's going on when you go out into the ungulates, the horses and the cows.
The vertebral column is a synapomorphy of the vertebrates, of this whole bunch here. But if we just look at the mammals, which are from here on out, it's an ancestral trait; it's not a derived trait, it's a shared-primitive trait, and that's a symplesiomorphy, not a synapomorphy. The lungs, which come in right here, at 2, are a synapomorphy of the tetrapods, but they are an ancestral trait, a symplesiomorphy of the amniotes; okay, so the amniotes are back here--excuse me, the amniotes are up here. So whether you call a trait one or the other depends on where it sits in the tree and what it allows you to do with it.
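The point that the same character can be a synapomorphy at one level and a symplesiomorphy at another can be sketched in Python. The matrix below is a simplified, hypothetical one that loosely follows the slide's first two characters; the choice of a lancelet and a shark as the taxa outside the tetrapods is an assumption for illustration, and `classify` is a made-up helper that compares a group only against the taxa passed in as its closest relatives.

```python
# Simplified, hypothetical character matrix (0 = ancestral, 1 = derived),
# loosely following the first two characters on the vertebrate slide.
VERTEBRATE_MATRIX = {
    "lancelet": {"vertebral column": 0, "lungs": 0},
    "shark":    {"vertebral column": 1, "lungs": 0},
    "frog":     {"vertebral column": 1, "lungs": 1},
    "lizard":   {"vertebral column": 1, "lungs": 1},
    "human":    {"vertebral column": 1, "lungs": 1},
}


def classify(character, group, outside):
    """Classify a character for `group`, judged against the taxa in `outside`
    (assumed to be the group's closest relatives)."""
    group_states = {VERTEBRATE_MATRIX[t][character] for t in group}
    outside_states = {VERTEBRATE_MATRIX[t][character] for t in outside}
    if len(group_states) != 1:
        return "not shared by the whole group"
    (state,) = group_states
    if state == 1 and outside_states == {0}:
        return "synapomorphy"     # shared and derived: marks this group
    if state in outside_states:
        return "symplesiomorphy"  # shared, but also found outside: primitive here
    return "uninformative here"


print(classify("vertebral column", ["shark", "frog", "lizard", "human"], ["lancelet"]))  # synapomorphy
print(classify("vertebral column", ["lizard", "human"], ["frog"]))  # symplesiomorphy
print(classify("lungs", ["frog", "lizard", "human"], ["shark"]))    # synapomorphy
print(classify("lungs", ["lizard", "human"], ["frog"]))             # symplesiomorphy
```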
This is a bit of the relationship between trees and names. Ideally we would like that relationship to be totally clear and unambiguous, so that if I just give you a name, you know where that thing sits in the tree. This is a hard thing to do. It is so hard to do that there is now a complete overturning of the Linnaean System of Classification going on. It's being led by a couple of guys here, Michael Donoghue and Jacques Gauthier. Michael is in our department, and he's also vice-president of the university, and Jacques Gauthier is a professor in G&G, in paleontology.
And they have worked a lot on how to get a way to name things that actually tells you their complete position on the Tree of Life. Okay? It's going to be a big computer code. It's not going to be something nice like Homo sapiens. Okay? Homo sapiens is the Linnaean name. The new method will contain a lot more information in it, and will probably only be something that you can store on your iPhone, or whatever else it is that you use to replace your memory.
But ideally these terms would map onto natural relationships; and these terms actually do map onto these natural relationships. Okay? So this is the primates, and these are the ungulates, and they are nested within the eutherian mammals. The therian mammals include the marsupials, and the mammals include the monotremes, which are the spiny echidna and the duckbilled platypus. And between them and the marsupials is where the therian female reproductive tract, which supports live birth, evolved, because the monotremes are still laying eggs. And then the amniotes would include the so-called reptiles; that would actually be crocodiles, lizards, snakes, turtles. The tetrapods include the amphibians, and then the vertebrates include the fish, as well. So this is a natural classification, and that's the way all taxonomic nomenclature should be related to all good systematics.
Now, how do you actually infer a tree from a character matrix? So here we're just trying to figure out if things are related or not. We don't have a tree yet. Okay? And we have three species and we have three characters; and in this case ancestral is going to be 0 and 1 is going to be derived.
Well this particular character matrix is consistent with drawing a tree this way, and then marking down the transitions here. So when 1 went from ancestral to derived, it changed here in both B and C, and when 2 and 3 went from ancestral to derived, it changed only in C. So can you see how the tree relates to the matrix? Okay? Any question about that at this point? I've made the simplest one I could.
Now if we did it with overall similarity, A and B share five ancestral counts. Okay? In three traits, 1, 2 and 3, A has got the ancestral state. In two traits, two of those, in B--B has also got the ancestral trait in 2 and 3, and that would suggest this tree. But if we go by derived similarity, then we get this tree. And you can see what it does to the tree. So this is overall similarity, which is misleading, and this is shared derived trait phylogenetics, which yields this tree, and it shifts the sister of B from A to C.
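The contrast between overall similarity and shared-derived similarity can be checked directly on the three-taxon matrix described above (A ancestral in all three traits, B derived in Trait 1 only, C derived in all three). A minimal Python sketch; the helper names are hypothetical:

```python
from itertools import combinations

# Character matrix as described: 0 = ancestral, 1 = derived, for Traits 1, 2, 3.
ABC_MATRIX = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (1, 1, 1)}


def overall_similarity(x, y):
    """Count traits in which x and y have the same state, ancestral or derived."""
    return sum(a == b for a, b in zip(ABC_MATRIX[x], ABC_MATRIX[y]))


def shared_derived(x, y):
    """Count traits in which x and y both have the derived state."""
    return sum(a == b == 1 for a, b in zip(ABC_MATRIX[x], ABC_MATRIX[y]))


pairs = list(combinations(sorted(ABC_MATRIX), 2))
print(max(pairs, key=lambda p: overall_similarity(*p)))  # ('A', 'B')
print(max(pairs, key=lambda p: shared_derived(*p)))      # ('B', 'C')
```

Overall similarity pairs A with B only because they share ancestral states, which say nothing about recent common ancestry; counting only derived states pairs B with C.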
Now if life were only so simple. Life is never simple. Traits can conflict with each other in the information they give you, and they often do. So here is a character matrix with no conflict, and here's a character matrix with conflict. Okay? So everything was looking pretty good, as long as we'd only measured Trait 1 and 2 down here. But then we measured Trait 3, and Trait 3 seemed to indicate that C had this ancestral trait over here. It was beginning to look like C was going to be a highly derived species. But then we've looked at another trait and it wasn't like that. What do you do? Right? What do you do?
Well you can choose the simplest tree. You can choose the one that implies the least change. So here are all trees which are consistent with this character matrix. And if you go back and try, I think you will see that you can plot all of these changes onto here. So basically this tree is saying that third trait down here, it changed twice; it changed once in A and it changed once in B. So it went from an ancestral state up to a derived state, along these branches. And these traits up here, 1 and 2, they changed between A, on the one hand, and B and C on the other, here. So you could do that also, and you would find that all of these trees are actually consistent with that character matrix, but this one takes five changes and these only take 4: 1, 2, 3, 4; 1, 2, 3, 4, 5.
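The counting of changes can be automated with Fitch's small-parsimony algorithm, a standard method that the lecture does not name. The Python sketch below scores the three possible unrooted four-taxon trees (each written with an arbitrary root, which does not affect the count) against a hypothetical matrix in which Characters 1 and 2 conflict with Character 3. This is not the matrix on the slide.

```python
# Hypothetical four-taxon matrix with conflict (0 = ancestral, 1 = derived).
# Characters 1 and 2 group A with B; Character 3 groups A with C.
CONFLICT_MATRIX = {
    "A": (0, 0, 0),
    "B": (0, 0, 1),
    "C": (1, 1, 0),
    "D": (1, 1, 1),
}


def fitch(tree, char):
    """Fitch small parsimony for one character on a binary tree of nested
    2-tuples. Returns (candidate states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {CONFLICT_MATRIX[tree][char]}, 0
    left_states, left_cost = fitch(tree[0], char)
    right_states, right_cost = fitch(tree[1], char)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed somewhere below this node.
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, n_chars=3):
    """Total minimum number of changes over all characters."""
    return sum(fitch(tree, c)[1] for c in range(n_chars))


TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
for name, tree in TOPOLOGIES.items():
    print(name, tree_length(tree))
```

With this matrix the three trees need 4, 5 and 6 changes respectively, so parsimony prefers AB|CD; Character 3 still has to change twice on that tree, which is the homoplasy the conflict implies.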
Therefore, by the principle of parsimony alone, we conclude that one of the four-change trees is probably correct. And the only way we can really resolve this kind of issue is by getting more data: the more data you get, the more likely it is that the analysis will converge on the real tree. I would point out, though, that there probably is not enough data in the world to do that for all of the creatures on the planet. In other words, at the end of the process there will still be some unresolved relationships.
Now what about--let me just put that in--what about this? Where does the root come from? These up here are unrooted trees, and where you decide the ancestral state is makes quite a bit of difference to the tree. I'm sorry that in translating, this got screwed up a little bit here. This is actually A over here, and then it goes D, B, C over here. Choosing this as the root, rather than this, changes the relationship from B and D to B and C: this is B and this is C; this is B and this is D. So where do you get the outgroup? How do you decide where the root should be?
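The effect of the root can be sketched in Python by storing an unrooted four-taxon tree as an adjacency list and then hanging it from different outgroups. The tree (A and D joined on one side, B and C on the other), the internal node names `u` and `v`, and the helper functions are all hypothetical, chosen only to echo the A, D, B, C layout mentioned above.

```python
# Hypothetical unrooted tree: A and D hang off internal node u, B and C off v.
UNROOTED = {
    "A": ["u"], "D": ["u"],
    "B": ["v"], "C": ["v"],
    "u": ["A", "D", "v"],
    "v": ["u", "B", "C"],
}

def hang(node, parent, tree):
    """Build the subtree below `node` when we arrive from `parent`."""
    children = [n for n in tree[node] if n != parent]
    if not children:                      # a leaf (terminal taxon)
        return node
    return tuple(hang(c, node, tree) for c in children)

def root_at(outgroup, tree):
    """Root the unrooted tree on the branch leading to `outgroup`."""
    (neighbor,) = tree[outgroup]
    return (outgroup, hang(neighbor, outgroup, tree))

def sister(rooted, taxon):
    """Return whatever shares the taxon's immediate parent, or None."""
    if isinstance(rooted, tuple):
        if taxon in rooted:
            (other,) = [c for c in rooted if c != taxon]
            return other
        for child in rooted:
            found = sister(child, taxon)
            if found is not None:
                return found
    return None

for outgroup in ("A", "C"):
    rooted = root_at(outgroup, UNROOTED)
    print(outgroup, rooted, "sister of B:", sister(rooted, "B"))
# A ('A', ('D', ('B', 'C'))) sister of B: C
# C ('C', (('A', 'D'), 'B')) sister of B: ('A', 'D')
```

Rooting at A leaves C as B's sister, while rooting at C makes the (A, D) clade B's sister, even though the unrooted tree is the same.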
Well, that issue can only be decided in the context of a bigger tree, so you must have some other kind of information to suggest what your outgroup might be. And when this is actually done, people sometimes check whether the choice of outgroup changes the shape of their tree very much. They will report, "We tried this and this and this as the outgroup, and these were the results for our tree." Okay?
Now, once you have your trait matrix, you want to find the simplest tree. One way is to do an exhaustive search. So here are two terminal taxa, B and C. Okay? We've chosen A as the ancestral condition; it's not an existing species now. A, we say, is going to be the outgroup; that's going to be the link to the ancestors. Only one tree is possible. If we have three terminal taxa, we can have B as the closest relative of D, C as the closest relative of D, or B and C as each other's closest relatives. So there are three possibilities there.
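The exhaustive search starts by listing every candidate tree. A minimal sketch, assuming rooted, strictly bifurcating trees on the terminal taxa (the outgroup A is left out and would sit outside every tree): each new taxon is attached to every branch of every tree built so far, including a new branch above the root. The function names are hypothetical.

```python
def attach_everywhere(tree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)                       # new branch above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in attach_everywhere(right, taxon):
            yield (left, new_right)

def all_rooted_trees(taxa):
    """Enumerate every rooted bifurcating tree on the given terminal taxa."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [t for tree in trees for t in attach_everywhere(tree, taxon)]
    return trees

print(all_rooted_trees(["B", "C", "D"]))
# [(('B', 'C'), 'D'), (('B', 'D'), 'C'), ('B', ('C', 'D'))]
for taxa in (["B", "C"], ["B", "C", "D"], ["B", "C", "D", "E"]):
    print(len(taxa), "taxa:", len(all_rooted_trees(taxa)), "trees")
```

The three trees for B, C and D are exactly the three possibilities listed above. In a full search, each listed tree would then be scored, for example by counting the changes it implies, and the best one kept.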
If we have four terminal taxa, oh my goodness, suddenly we have this many: fifteen. Oh, that's confusing. If we have 500 terminal taxa, we have about 1 × 10^1280 possibilities. This is a combinatorial explosion. Around 2003 or 2004, searching through the possible trees for a problem like this, with a reasonable number of characters, took nine months of runtime on a supercomputer. So if you had 500 species and a reasonable number of characters, you would cover only a tiny fraction of those trees, and you would wait nine months for your answer. It's gotten a little better than that recently, but not much. Okay?
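The counts above follow a known closed form: the number of rooted, bifurcating trees with n labelled terminal taxa is (2n − 3)!! = 1 × 3 × 5 × ... × (2n − 3). A short Python check (the function name is hypothetical) reproduces the 1, 3 and 15 from the slides and shows how fast the count grows; for 500 taxa it comes out at roughly 10^1280, in line with the figure quoted above.

```python
import math

def rooted_tree_count(n):
    """Rooted, bifurcating trees on n labelled terminal taxa: (2n - 3)!!."""
    if n < 2:
        return 1
    return math.prod(range(1, 2 * n - 2, 2))   # 1 * 3 * 5 * ... * (2n - 3)

for n in (2, 3, 4, 10):
    print(n, rooted_tree_count(n))
# 2 1
# 3 3
# 4 15
# 10 34459425

big = rooted_tree_count(500)   # exact integer; Python ints have no size limit
print(f"500 taxa: a {len(str(big))}-digit number of trees")
```

Computing the count is instant; it is scoring that many trees one by one that is hopeless, which is why exhaustive search is abandoned long before 500 taxa.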
So there are ways to get around this problem; there are all kinds of heuristic ways. There are ways of jumping into tree space, doing local approximations, and then patching things together. For example, the New York Times reported last week, on Darwin's birthday, that biologists had recently published a tree of 11,000 plant species. That was done here, by Stephen Smith and Michael Donoghue's group; Stephen built it as a supertree, using these approximation techniques to patch together lots of smaller trees. And there are all kinds of criteria for judging how good that is. By the way, one of the things Stephen turned up is that the ferns are still evolving rapidly, which is kind of neat. There's lots of other stuff in that work.
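The heuristics mentioned above come in many forms. One simple family is greedy local search: start from some tree, try small rearrangements, and move whenever the score improves. The sketch below is a toy version, not the method used for the 11,000-species tree: it uses rooted nearest-neighbor interchanges and Fitch parsimony scores on a hypothetical four-taxon matrix, and all names are illustrative.

```python
# Hypothetical matrix: O is an all-ancestral outgroup; 0 = ancestral, 1 = derived.
MATRIX = {
    "O": (0, 0, 0),
    "A": (0, 0, 1),
    "B": (1, 1, 1),
    "C": (1, 1, 0),
}

def fitch(tree, states):
    """Fitch small parsimony for one trait: (possible states, changes below)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    if ls & rs:
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1

def score(tree, matrix):
    """Total changes the tree implies, summed over traits."""
    n_traits = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: row[i] for t, row in matrix.items()})[1]
               for i in range(n_traits))

def nni_neighbors(tree):
    """Yield trees one rooted nearest-neighbor interchange away: a subtree
    trades places with one of the two children of its sibling."""
    if isinstance(tree, str):
        return
    left, right = tree
    if isinstance(left, tuple):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if isinstance(right, tuple):
        a, b = right
        yield (a, (left, b))
        yield (b, (left, a))
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)

def hill_climb(tree, matrix):
    """Greedy local search: take the first strictly better neighbor until none is left."""
    best, best_score = tree, score(tree, matrix)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(best):
            candidate_score = score(candidate, matrix)
            if candidate_score < best_score:
                best, best_score, improved = candidate, candidate_score, True
                break
    return best, best_score

print(hill_climb(("O", ("B", ("A", "C"))), MATRIX))
# (('O', ('A', ('B', 'C'))), 4)
```

Starting from the worst tree, one rearrangement reaches the four-change tree and the search stops. Local search like this can stall on a tree that beats its neighbors but is not the best overall; practical programs use larger rearrangements and many starting trees to reduce that risk.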
So, the Tree of Life is not given; we have to discover it. The informative characters in the Tree of Life are those that are shared and derived. Appearances deceive: simply looking similar is insufficient information. Many different trees can be drawn from the same character matrix. You would prefer either the simplest tree, which implies the least change, or the tree that maximizes the probability of observing what you actually see, or some combination of those criteria. Okay? Next time we'll take these methods and see how we can infer history with them.
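The second criterion, choosing the tree that maximizes the probability of the observed data, can also be sketched. This minimal version assumes a two-state model in which each state changes to the other at the same rate, gives every branch the same hypothetical length (0.1), treats both root states as equally likely, and computes each tree's likelihood with Felsenstein's pruning algorithm. The matrix and names are invented for illustration.

```python
import math

# Hypothetical matrix (0 = ancestral, 1 = derived) with outgroup O.
MATRIX = {
    "O": (0, 0, 0),
    "A": (0, 0, 1),
    "B": (1, 1, 1),
    "C": (1, 1, 0),
}
BRANCH_LENGTH = 0.1   # assumption: every branch has this length

def p_change(t):
    """Symmetric two-state model: chance a branch of length t ends in the other state."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))

def partials(tree, states, t):
    """Felsenstein pruning: P(data below node | node state) for states 0 and 1."""
    if isinstance(tree, str):
        return [1.0 if s == states[tree] else 0.0 for s in (0, 1)]
    p = p_change(t)
    result = [1.0, 1.0]
    for child in tree:
        below = partials(child, states, t)
        for s in (0, 1):
            # sum over the child's unknown state, weighted by the branch's transition probability
            result[s] *= sum((p if s != s2 else 1.0 - p) * below[s2] for s2 in (0, 1))
    return result

def log_likelihood(tree, matrix, t=BRANCH_LENGTH):
    """Sum of per-trait log-likelihoods, with equal prior weight on each root state."""
    n_traits = len(next(iter(matrix.values())))
    total = 0.0
    for i in range(n_traits):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        root = partials(tree, states, t)
        total += math.log(0.5 * root[0] + 0.5 * root[1])
    return total

for tree in [("O", ("A", ("B", "C"))), ("O", ("C", ("A", "B"))), ("O", ("B", ("A", "C")))]:
    print(tree, round(log_likelihood(tree, MATRIX), 3))
```

Here the likelihood ranks the trees in the same order as the change counts, with (O,(A,(B,C))) first; with unequal branch lengths or more realistic models, the two criteria can disagree.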
[end of transcript]