Wednesday, April 24, 2024

Pruning a mega-phylogeny to include a list of taxa in which the names don't match exactly

The other day a colleague asked me for advice on getting a reasonable time-calibrated phylogeny for a set of taxa to use for a comparative physiology exercise he was developing for a class. I suggested pulling a tree from the online resource VertLife.org and then subsampling it to include only the taxa for which he had data. If he would supply me with a list of the taxon names for his species, I told him, I’d be happy to help.

This is what I did.

## load packages
library(phytools)

We can start by downloading the DNA-only maximum clade credibility (MCC) mammal tree and reading it into R as follows. (This is the VertLife mammal tree; the copy of the file used here is hosted on GitHub.)

download.file(
  url="https://raw.githubusercontent.com/n8upham/MamPhy_v1/master/_DATA/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre",
  destfile="full_mammal_tree.nex")
mammal_tree<-read.nexus(file="full_mammal_tree.nex")
mammal_tree
## 
## Phylogenetic tree with 4100 tips and 4099 internal nodes.
## 
## Tip labels:
##   _Anolis_carolinensis, Ornithorhynchus_anatinus_ORNITHORHYNCHIDAE_MONOTREMATA, Zaglossus_bruijnii_TACHYGLOSSIDAE_MONOTREMATA, Tachyglossus_aculeatus_TACHYGLOSSIDAE_MONOTREMATA, Rhynchocyon_petersi_MACROSCELIDIDAE_MACROSCELIDEA, Rhynchocyon_cirnei_MACROSCELIDIDAE_MACROSCELIDEA, ...
## 
## Rooted; includes branch lengths.
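
Notice that the tip labels printed above carry a family and an order name after the species name, so they won't match a plain list of binomials. As a minimal sketch (not necessarily the approach taken later), one could keep only the first two underscore-separated fields of each label. The object names `labels` and `genus_species` are my own, and the code assumes every label starts with `Genus_species`.

```r
## three tip labels copied from the printout above
labels<-c("Ornithorhynchus_anatinus_ORNITHORHYNCHIDAE_MONOTREMATA",
  "Zaglossus_bruijnii_TACHYGLOSSIDAE_MONOTREMATA",
  "Rhynchocyon_petersi_MACROSCELIDIDAE_MACROSCELIDEA")
## split each label on "_" and keep the first two fields
## (assumption: these are always genus & species)
genus_species<-sapply(strsplit(labels,"_"),
  function(x) paste(x[1:2],collapse="_"))
## keep the original labels as names so we can map back to the tree
names(genus_species)<-labels
genus_species
```

A label that begins with an underscore, such as the outgroup `_Anolis_carolinensis`, would produce an empty first field under this rule and would need to be handled separately.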

Now, if we graph this tree, we’ll see that it’s ultrametric. It should be, since it contains only extant species & is time-calibrated! Let’s drop the one non-mammal outgroup (Anolis carolinensis) and plot our tree to see this.

mammal_tree<-drop.tip(mammal_tree,"_Anolis_carolinensis")
plotTree(mammal_tree,ftype="off",lwd=1,type="arc",
  arc_height=0.05,part=0.995)
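
Ultrametricity can also be checked numerically rather than just by eye. The sketch below uses a small simulated coalescent tree (`toy_tree`, a hypothetical stand-in) so that it runs without the download; the same two checks should apply to `mammal_tree`. Both `is.ultrametric` and `node.depth.edgelength` come from ape, which phytools loads anyway.

```r
## load packages
library(ape)
## simulate a small coalescent tree as a stand-in for mammal_tree
set.seed(1)
toy_tree<-rcoal(8)
## formal test of ultrametricity (within a numerical tolerance)
is.ultrametric(toy_tree)
## root-to-tip distances for the tips (nodes 1 through Ntip)
tip_heights<-node.depth.edgelength(toy_tree)[1:Ntip(toy_tree)]
## spread of those distances; effectively zero for an ultrametric tree
diff(range(tip_heights))
```

For a tree read from file, rounding of the branch lengths can sometimes make this test fail even when the tree is meant to be ultrametric. In that case the `tol` argument of `is.ultrametric` can be relaxed.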