Wednesday, April 24, 2024

Pruning a mega-phylogeny to include a list of taxa in which the names don't match exactly

The other day a colleague asked me for advice on getting a reasonable time-calibrated phylogeny for a set of taxa to use in a comparative physiology exercise he was developing for a class. I suggested pulling a tree from the VertLife online resource and then subsampling it to include only the taxa for which he had data. If he would send me a list of the taxon names for his species, I told him, I'd be happy to help.

This is what I did.

First, we load the packages we'll need.

Next, we can read the full DNA maximum clade credibility (MCC) tree for mammals directly from the VertLife website and print a summary of it.

```
## Phylogenetic tree with 4100 tips and 4099 internal nodes.
## Tip labels:
##   _Anolis_carolinensis, Ornithorhynchus_anatinus_ORNITHORHYNCHIDAE_MONOTREMATA, Zaglossus_bruijnii_TACHYGLOSSIDAE_MONOTREMATA, Tachyglossus_aculeatus_TACHYGLOSSIDAE_MONOTREMATA, Rhynchocyon_petersi_MACROSCELIDIDAE_MACROSCELIDEA, Rhynchocyon_cirnei_MACROSCELIDIDAE_MACROSCELIDEA, ...
## Rooted; includes branch lengths.
```
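The printed tip labels show why names won't match exactly: each mammal label has the form `Genus_species_FAMILY_ORDER`, and the outgroup label starts with an underscore. A species list supplied as "Genus species" therefore won't line up with the tree directly. Below is a minimal base-R sketch of one way to reconcile the two, assuming every label begins with the genus and species separated by underscores. It is an illustration, not necessarily the approach used in this post; `my_species`, `to_binomial()`, `wanted`, `keep`, and `unmatched` are hypothetical names, and the misspelled "Zaglossus bruijni" is an invented example of a name that fails to match.

```r
## Tip labels copied from the printed tree above (first few only)
tips <- c("_Anolis_carolinensis",
  "Ornithorhynchus_anatinus_ORNITHORHYNCHIDAE_MONOTREMATA",
  "Zaglossus_bruijnii_TACHYGLOSSIDAE_MONOTREMATA",
  "Tachyglossus_aculeatus_TACHYGLOSSIDAE_MONOTREMATA")

## Hypothetical species list, as it might arrive from a data file;
## the Zaglossus name is deliberately misspelled (one "i")
my_species <- c("Ornithorhynchus anatinus", "Tachyglossus aculeatus",
  "Zaglossus bruijni")

## Reduce a tree label to "Genus_species" by keeping only its
## first two underscore-separated fields
to_binomial <- function(x) {
  x <- sub("^_", "", x)  # drop the leading underscore on the outgroup
  parts <- strsplit(x, "_", fixed = TRUE)
  vapply(parts, function(p) paste(p[1:2], collapse = "_"),
    character(1))
}

## Put our species names into the same underscore-separated form
wanted <- gsub(" ", "_", trimws(my_species), fixed = TRUE)

## Original tip labels to keep, plus any names with no match in the tree
keep <- tips[to_binomial(tips) %in% wanted]
unmatched <- setdiff(wanted, to_binomial(tips))
keep
unmatched
```

Here `keep` holds the full Ornithorhynchus and Tachyglossus labels, while `unmatched` flags "Zaglossus_bruijni" for a manual check. With the real tree loaded via ape, the full-label `keep` vector could then be used to prune it, for example with `drop.tip(tree, setdiff(tree$tip.label, keep))`.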

Now, if we plot this tree, we'll see that it's ultrametric, meaning every tip sits the same distance from the root. It should be, since it contains only extant species and is time-calibrated! Let's drop the single non-mammal outgroup (Anolis carolinensis) and plot the tree to see this.
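To make "ultrametric" concrete, here is a minimal base-R sketch that sums branch lengths from each tip back to the root and checks that those totals agree. It uses a hypothetical three-tip toy tree stored in the edge-matrix layout used by ape's `phylo` objects (tips numbered first, then the root); `root_to_tip()` and `is_ultrametric_toy()` are illustrative helpers, not library functions. On the real tree, one would normally just call ape's `is.ultrametric()`.

```r
## Hypothetical toy tree in ape's "phylo" edge-matrix layout:
## tips are numbered 1..3, the root is node 4, and node 5 is internal
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
edge.length <- c(1, 2, 2, 3)

## Sum branch lengths from each tip back to the root
root_to_tip <- function(edge, edge.length, ntips) {
  vapply(seq_len(ntips), function(tip) {
    d <- 0
    node <- tip
    repeat {
      i <- which(edge[, 2] == node)  # the edge leading into this node
      if (length(i) == 0) break      # no parent edge: we are at the root
      d <- d + edge.length[i]
      node <- edge[i, 1]
    }
    d
  }, numeric(1))
}

## Ultrametric if all root-to-tip distances agree within a relative tolerance
is_ultrametric_toy <- function(edge, edge.length, ntips, tol = 1e-8) {
  h <- root_to_tip(edge, edge.length, ntips)
  (max(h) - min(h)) <= tol * max(h)
}

heights <- root_to_tip(edge, edge.length, 3)
is_ultrametric_toy(edge, edge.length, 3)
```

Every tip in this toy tree is 3 units from the root, so the check returns `TRUE`. Shortening the branch to tip 3, as a fossil tip that went extinct before the present would be, makes it return `FALSE`.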