Paul Higgs - Research Topics

Phylogenetic Methods and Applications


    1. Rattray, M. & Higgs, P.G. (2005) RNA-based Phylogenetic Methods. In Probabilistic Modeling in Bioinformatics and Medical Informatics. Husmeier, D., Dybowski, R., Roberts, S. (Eds.). Springer Verlag, pp. 191-210.
    2. Higgs, P.G., Jameson, D., Jow, H. & Rattray, M. (2003) The evolution of tRNA-Leucine genes in animal mitochondrial genomes. J. Mol. Evol. 57, 435-445.
    3. Hudelot, C., Gowri-Shankar, V., Jow, H., Rattray, M. & Higgs, P.G. (2003) RNA-based Phylogenetic Methods: Application to Mammalian Mitochondrial RNA Sequences. Mol. Phyl. Evol. 28, 241-252.
    4. Hoyle, D.C. & Higgs, P.G. (2003) Factors affecting the errors in estimation of evolutionary distances between species. Mol. Biol. Evol. 20, 1-9.
    5. Jow, H., Hudelot, C., Rattray, M. & Higgs, P.G. (2002) Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol. Biol. Evol. 19, 1591-1601.
    6. Savill, N.J., Hoyle, D.C. & Higgs, P.G. (2001) RNA sequence evolution with secondary structure constraints: Comparison of substitution rate models using maximum likelihood methods. Genetics 157, 399-411.

PHASE: A software package for PHylogenetics And Sequence Evolution

Why is PHASE different from other Phylogenetic programs?

This package is designed specifically for use with RNA sequences that have a conserved secondary structure, e.g. rRNA and tRNA. It is well known that compensatory substitutions occur in the paired regions of RNA secondary structures. This means that substitutions occurring on one side of a pair are correlated with substitutions on the other side. Most phylogenetic programs assume that each site in a molecule evolves independently of the others, and this assumption is not valid for RNA genes.

Our program uses models of sequence evolution that consider pairs of sites rather than single sites. The program requires a structure-based alignment of RNA sequences with the consensus secondary structure indicated in bracket notation at the top of the alignment. (We assume that you already know the structure.) If an RNA model is specified, the program uses only the paired sites in the alignment to calculate the phylogenetic trees; unpaired sites are ignored in the present implementation. It is also possible to specify one of the standard 4-state models used for DNA sequence evolution. If a DNA model is specified, all sites in the alignment are included and treated independently. In future we plan to include an option that allows a paired-site model to be used for the paired sites simultaneously with a single-site model for the unpaired sites.
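To illustrate how a bracket-notation structure line picks out the paired sites, here is a minimal Python sketch. It is not PHASE code and does not read the PHASE input format. The function names, the toy sequences and the use of '.' for unpaired columns are assumptions made for this example only.

```python
def paired_columns(structure):
    """Return sorted (i, j) column pairs for matching brackets in a structure line."""
    stack = []   # columns of '(' still waiting for a partner
    pairs = []
    for col, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(col)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            pairs.append((stack.pop(), col))
        # any other symbol (assumed here to be '.') marks an unpaired column
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def pair_states(sequence, pairs):
    """Read one aligned sequence as a list of two-letter pair states."""
    return [sequence[i] + sequence[j] for i, j in pairs]


# Toy alignment (made-up data): a 4-base-pair stem closing a 4-base loop.
structure = "((((....))))"
alignment = {
    "species_A": "GGCAUUCGUGCC",
    "species_B": "GGUAUUCGUACC",
}

pairs = paired_columns(structure)
for name, seq in alignment.items():
    print(name, pair_states(seq, pairs))
```

In this toy data the third pair is CG in one sequence and UA in the other: both sides of the pair have changed, which is the kind of compensatory substitution that a paired-site model treats as a single event on a pair state rather than as two independent single-site changes.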

Our program uses the Markov Chain Monte Carlo method to sample large numbers of possible phylogenetic trees with probability proportional to their likelihood. This is a Bayesian statistical method that allows posterior probabilities to be generated for alternative trees and alternative clades. These posterior probabilities provide a sound statistical measure of support of alternative phylogenetic hypotheses, and they remove the need for bootstrapping. We have implemented a number of different programs to summarize the information in the lists of alternative trees generated by the MCMC runs. These allow construction of consensus trees and calculation of posterior probabilities both for those clades in the consensus and those not in the consensus. Where many alternative arrangements of a given set of species exist, it is possible to calculate posterior probabilities for all the alternative arrangements of these species in a convenient way.
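The idea of reading clade support from a list of sampled trees can be sketched as follows. This is an illustrative Python example, not one of the PHASE summary programs. It assumes the samples are equally weighted draws taken after the chain has converged, and it represents each tree as rooted nested tuples of species names for simplicity. The species names and the sample list are made up.

```python
from collections import Counter


def clades(tree):
    """Return (leaf_set, set_of_clades) for a tree given as nested tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the leaves below this internal node form a clade
    return leaves, found


def clade_posteriors(samples):
    """Estimate each clade's posterior probability as its frequency in the samples."""
    counts = Counter()
    for tree in samples:
        all_leaves, found = clades(tree)
        found.discard(all_leaves)  # the full species set appears in every tree
        counts.update(found)
    return {clade: n / len(samples) for clade, n in counts.items()}


# Made-up sample of four trees, standing in for the output of an MCMC run.
samples = [
    (("human", "mouse"), ("cow", "dog")),
    (("human", "mouse"), ("cow", "dog")),
    (("human", "mouse"), ("cow", "dog")),
    ((("human", "mouse"), "cow"), "dog"),
]

for clade, prob in sorted(clade_posteriors(samples).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), prob)
```

Because every clade seen in any sampled tree is counted, the same table gives support values both for clades that would appear in a consensus tree and for minority clades that would not.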

In addition to MCMC, we have also implemented standard Maximum Likelihood techniques for inferring the optimal tree with any of the RNA or DNA evolution models. A novel feature of our ML program is the option to specify a number of known clades consisting of small groups of closely related species. An exhaustive search procedure then considers all possible trees that connect these clades together without considering rearrangements of species within the clades. This is useful in cases where it is the more distant relationships on the tree that are in dispute and the closer relationships are already well known.
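A rough calculation shows why fixing known clades makes an exhaustive search feasible. The Python sketch below assumes unrooted, strictly bifurcating trees, for which the number of topologies on n tips is the standard count (2n-5)!!. The choice of 20 species grouped into 6 clades is a hypothetical example, not a figure from our analyses.

```python
def unrooted_topologies(n_tips):
    """Number of unrooted bifurcating tree topologies on n_tips labelled tips.

    Uses the standard result (2n - 5)!! = 3 * 5 * ... * (2n - 5) for n >= 3.
    """
    if n_tips < 3:
        return 1
    count = 1
    for factor in range(3, 2 * n_tips - 4, 2):
        count *= factor
    return count


# Hypothetical example: 20 species, or the same species grouped into 6 fixed clades.
n_species = 20
n_clades = 6

print("all species free:", unrooted_topologies(n_species))
print("clades treated as single tips:", unrooted_topologies(n_clades))
```

With each known clade treated as a single tip, the search space in this example shrinks to 105 arrangements, which can be evaluated one by one, whereas the unconstrained 20-species problem is far too large to enumerate.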


The PHASE documentation is here.

The source code is available here.

We would be pleased to know if you use our programs or if you have any problems.

Mammalian Evolution

Recent molecular phylogenetic studies have given important information on the relationships between the different orders of mammals and have shown that classifications based on morphology were misleading in some cases. There appear to be four main groups of eutherian mammals:

1.      Afrotheria (includes Elephants, Aardvarks, Sea Cows, Hyraxes, Elephant Shrews and others)

2.      Xenarthra (includes Armadillos, Anteaters, and Sloths)

3.      Euarchontoglires, sometimes called Supraprimates (includes Primates, Tree Shrews, Flying Lemurs, Rodents and Rabbits)

4.      Laurasiatheria (includes the Odd- and Even-Toed hoofed animals, Whales, Carnivores, Pangolins, Bats and the “true” Insectivores)

The following pairs of species are superficially similar in appearance and lifestyle, but molecular evidence shows that they are very distantly related. This means that there has been a large amount of convergent evolution in the phenotype. In each case the first species is from the Afrotheria group and the second is from the Laurasiatheria group. For example, Elephant Shrews are more closely related to Elephants than to Shrews - you would not guess this from their appearance.

Large Herbivores / Pachyderms: Elephant and Rhinoceros

Specialized Anteaters: Aardvark and Pangolin (note that the Anteater itself is in Xenarthra and is different again)

Sea Mammals: Manatee and Seal (Whales are a third distinct group)

Small Insectivores: Elephant Shrew and Shrew

There are now complete mitochondrial genome sequences available from many different mammalian species. Our own relational database, known as OGRe, contains these sequences as well as many other complete animal genome sequences. We decided to use mammalian sequences as the first application of our new phylogenetic methods in the PHASE package.

We carried out phylogenetic analyses using the helical regions of the complete set of tRNAs and rRNAs from the completely sequenced mammalian mitochondrial genomes. We find evidence that the mammalian orders are grouped in four principal, strongly supported clades. This confirms results obtained with much larger nuclear data sets, and resolves the conflicts between published phylogenies based on mitochondrial sequences and nuclear gene phylogenies. Even with limited species sampling we were able to resolve deep mammal nodes with mitochondrial data. When a standard single-site model is used on the same sequences, the support values are misleading because correlations between paired sites are neglected. This study highlights the importance of using an appropriate model for phylogenetic inference. See references 3 and 5 at the top of the page.