Tree methods and comparisons: bootstrap, maximum likelihood and parsimony

Due Monday, April 28th. Hand in a hard copy of your results and be prepared to briefly summarize your conclusions (5-15 minutes). You are encouraged to work in the same group as last week, although you can switch around if you would like. Groups must have 2-4 people.

In the first part of this lab we will use bootstrap analysis to evaluate trees produced by the UPGMA and neighbor-joining methods. Then we will use the same sequences and the program Phylip to construct trees using the parsimony and maximum likelihood methods.

For the moment, you should use the 5233 server.
  1. Formulate a clear hypothesis about mammalian phylogeny that you will investigate. Try to select something that involves some uncertainty; this usually means trying to resolve early branches, such as: the relation of bats to the other mammals (for example, the Pegasoferae clade); the relation of whales to the other mammals; the relation of primates to rodents relative to other mammalian orders; the placement of the Hystricognathi within Rodentia; or the relation among the Xenarthra, Afrotheria, and other mammals.

  2. Select at least 8 mitochondrial sequences from mammalian species to investigate your hypothesis. I have assembled some of interest here. (The genus is given only as a single capital letter; the rest of the name before "Mito" is the species name.) They are also in the folder '/Users/class/mitos' on the 5233 computer. For a particular hypothesis you may need to find others; one way is through NCBI's Taxonomy database with the "genome Sequences" option turned on.

  3. Before doing any sequence analysis, draw the expected tree of your selected species based on the hypothesis you are considering. Where do you expect this tree to be most uncertain?

  4. Bootstrapping. Compute at least 100 bootstrap replicates using the functions supplied on the 5233 server in the "lab 10 functions" worksheet. How many trees do you get? What are they? Create a consensus tree and label the clades with their bootstrap percentages.

  5. Using Phylip. You will need to convert your Clustal-format alignment files to Phylip format using the readseq program. Compute a tree with each of the programs dnapars and dnaml. Record the settings you use for each. How do the resulting trees compare with your previous results?
    Some Phylip hints: You will need to download Phylip to use it. It is not large, so this should not take long. It has some quirks: the default output file is always called "outfile" and is written to the same directory as the executable (dnapars or dnaml in our case). Any tree output is called "outtree". If these files are already present, you have the option of naming your own path and file. You might need to copy one of the font files (font1, font2, etc.) to a file named "fontfile" in the directory with the executables. (I was hoping to make a Sage interface to Phylip, but because of a variety of technical issues I have not done so yet.)
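
For orientation on step 4, here is a minimal Python sketch of what a bootstrap analysis does. It is not the code in the "lab 10 functions" worksheet; the function names bootstrap_alignment and clade_support, and the choice to represent a tree as a set of clades, are assumptions made only for this illustration. Each replicate resamples alignment columns with replacement; a tree is then built from every replicate (not shown here), and the support for a clade is the percentage of replicate trees that contain it.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence string.
    Columns are drawn with replacement, so the replicate has the same
    length as the original but some columns repeat and others are absent.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}


def clade_support(trees_as_clades):
    """Percentage of replicate trees that contain each clade.

    trees_as_clades: list with one entry per replicate tree; each entry is
    a set of frozensets, and each frozenset holds the taxa in one clade.
    (This tree representation is hypothetical, chosen to keep the sketch short.)
    """
    counts = Counter()
    for clades in trees_as_clades:
        counts.update(set(clades))
    n = len(trees_as_clades)
    return {clade: 100.0 * k / n for clade, k in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}  # made-up toy data
    print(bootstrap_alignment(toy, rng))
```

In the lab itself, the tree-building step between these two functions would be UPGMA or neighbor-joining applied to each replicate alignment.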