Wednesday, March 22, 2023

Showing missing values as white lines on a "contMap" style plot

Today a phytools user made the following inquiry via my blog.

phytools inquiry screenshot

This is not too difficult to do, as a matter of fact, although it requires that we know a little bit about the internal structure of our "contMap" plotted object.

To illustrate, I’ll use a real tree of mammals – but simulated values of a continuous trait.

We can start by loading phytools (and the phangorn package, which is imported by phytools – but we’re going to need some of its functions in our namespace).

library(phytools)
library(phangorn)

Now let’s load a phylogenetic tree and simulate some data on that tree using phytools::fastBM.

data(mammal.tree)
x<-fastBM(mammal.tree,sig2=0.1,a=10)

Our next step is to simulate “missingness” in our tip data. For this purpose, I’m going to use the special R value NA.

x[sample(mammal.tree$tip.label,10)]<-NA
x
##        U._maritimus           U._arctos       U._americanus           N._narica            P._lotor 
##            9.082822                  NA           10.758603            8.822605            8.595170 
##         M._mephitis            M._meles            C._lupus          C._latrans           L._pictus 
##                  NA                  NA            6.667281            7.158965                  NA 
##           C._aureus U._cinereoargenteus            V._fulva           H._hyaena          C._crocuta 
##            7.874114            9.901579                  NA           10.447277            9.128543 
##          A._jubatus           P._pardus           P._tigris              P._leo          T._bairdii 
##            9.046107                  NA            8.289769            8.063814            4.769473 
##            C._simum         D._bicornis         E._hemionus         E._caballus        E._burchelli 
##            9.006314            8.064863                  NA            9.900821            9.179032 
##         L._guanicoe      C._dromedarius   G._camelopardalis           S._caffer            B._bison 
##           12.409268           12.185310           12.055006           12.553401           11.380935 
##             T._oryx           G._granti        G._thomsonii       A._cervicapra            M._kirki 
##                  NA           12.168986           10.531687           11.095123                  NA 
##       O._americanus       O._canadensis          H._equinus         A._melampus         C._taurinus 
##           12.985031           13.705636            8.978079           10.741041           10.858053 
##          D._lunatus       A._buselaphus        A._americana       C._canadensis             D._dama 
##           10.929413           10.874234                  NA           13.226236           13.837965 
##            A._alces         R._tarandus      O._virginianus         O._hemionus 
##           13.261633           13.717304           13.549289           13.351205

Now let’s generate a "contMap" object. Here, since we have some missing values at the tips of the tips of the tree, I’ll set method="anc.ML" which will cause these missing values to be estimated. (We’re not actually going to plot these estimates – but this allows our analysis to run in spite of the missing values. The good thing is that this missingness will not affect our estimated values at all the other nodes of the phylogeny!)

obj<-contMap(mammal.tree,x[!is.na(x)],method="anc.ML",
  plot=FALSE)

Since the default color palette of contMap is objectively bad, let’s update it to the “viridis” palette using the viridisLite package.

obj<-setMap(obj,viridisLite::viridis(n=10))

Next, let’s find all the tip nodes that are missing from our data – in other words, that had NA values in our original data vector.

nn<-names(which(is.na(x)))
tips<-sapply(nn,function(x,y) which(y==x),y=obj$tree$tip.label)
tips
##    U._arctos  M._mephitis     M._meles    L._pictus     V._fulva    P._pardus  E._hemionus 
##            2            6            7           10           13           17           23 
##      T._oryx     M._kirki A._americana 
##           31           35           43

We could normally skip the next step and use these tip nodes to identify the edges to plot in white or black; however, if we have high missingness we should also worry about whether or not any ancestral edges of the phylogeny should also be treated as absent. Here’s a trick using phangorn::Ancestors to help us identify this circumstance.

missing<-Ancestors(mammal.tree,tips)
non_missing<-Ancestors(mammal.tree,setdiff(1:Ntip(mammal.tree),
  tips))
bb<-setdiff(c(tips,unique(unlist(missing))),
  c(setdiff(1:Ntip(mammal.tree),tips),
  unique(unlist(non_missing))))

Now we’re finally ready to re-color our "contMap" object. To do this we’ll using phytools::paintBranches as follows.

obj$tree<-paintBranches(obj$tree,bb,"NA")

Having done this, let’s just go ahead and plot our tree.

cols<-setNames(c("white",obj$cols),c("NA",names(obj$cols)))
plot(obj$tree,cols,split.vertical=TRUE,outline=TRUE,lwd=6,
  ftype="i",fsize=0.7)
add.color.bar(40,obj$cols,title="trait",lims=range(x,na.rm=TRUE),
  digits=2,subtitle="",x=0,y=0.97*Ntip(mammal.tree),prompt=FALSE)
legend(x=-3,y=0.95*Ntip(mammal.tree),legend="missing",pch=22,
  pt.bg="white",bty="n",pt.cex=2)

plot of chunk unnamed-chunk-9

That’s all there is to it.

3 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Dear Liam,

    Do you think it is possible to combine this technique with plot.phylo.to.map()?
    I've managed to associate a contMap tree with plot.phylo.to.map() but not a contMap tree that takes into account non-applicable states.

    thank you very much for your work,

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.