Tuesday, April 18, 2023

Coloring the edges of a plotted tree by the marginal reconstruction with the highest likelihood

An R phylogenetics user asked the following:

“I would like to use the results of make.simmap to plot the edges of a tree by their most likely state. (Because my tree is really large and plotting pie charts at the nodes gets really polluted.)”

They had run into lots of problems, but this is not too difficult to do, so I thought I’d show a couple of solutions here.

One dilemma that we immediately face is that every node in the tree has a reconstructed state (rather than every edge), so which node should we use?

For simplicity, I thought it would make the most sense to simply graph the color of the most likely state in the descendant node or tip.

To illustrate it, I’ll use a simple example of diel activity pattern in primates with a phylogeny & dataset from Kirk & Kay (2004). Users wanting to follow along precisely will need to update phytools from GitHub.

library(phytools)
packageVersion("phytools")
## [1] '1.7.6'
data(primate.tree)
data(primate.data)
activity<-setNames(primate.data$Activity_pattern,
  rownames(primate.data))

I’m going to start by doing simple marginal ancestral state reconstruction (rather than stochastic mapping, as in the query) using phytools::ancr as follows.

ard_model<-fitMk(primate.tree,activity,model="ARD",pi="fitzjohn",
  rand_start=TRUE,smart_start=TRUE)
ard_model
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral  -0.050530  0.050530  0.000000
## Diurnal      0.002306 -0.004367  0.002061
## Nocturnal    0.002586  0.003565 -0.006151
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.003676   0.021704   0.974620 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -29.714931 
## 
## Optimization method used was "nlminb"
activity_anc<-ancr(ard_model)
activity_anc
## Marginal ancestral state estimates:
##    Cathemeral  Diurnal Nocturnal
## 91   0.000014 0.000496  0.999490
## 92   0.005491 0.015299  0.979210
## 93   0.149800 0.782695  0.067505
## 94   0.032725 0.965831  0.001444
## 95   0.006652 0.993313  0.000036
## 96   0.001721 0.998274  0.000005
## ...
## 
## Log-likelihood = -29.714931
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
  sort(unique(activity)))
plotTree(primate.tree,ftype="i",fsize=0.6,lwd=1,offset=0.5)
tiplabels(pie=ard_model$data[primate.tree$tip.label,],
  piecol=cols,cex=0.3)
nodelabels(pie=activity_anc$ace,piecol=cols,
  cex=0.5)

plot of chunk unnamed-chunk-35

Now, plotting each edge in the color with the highest marginal likelihood is easier than you probably think.

Just so we can be sure we got it right, I’m going to also overlay the marginal reconstruction pie charts on my nodes & tips.

Alright, here goes nothing!

allStates<-rbind(ard_model$data[primate.tree$tip.label,],
  activity_anc$ace)
edge.cols<-cols[apply(allStates,1,function(x) 
  which(x==max(x)))][primate.tree$edge[,2]]
plot(primate.tree,edge.color=edge.cols,cex=0.6,no.margin=TRUE,
  label.offset=2,edge.width=2,direction="upwards",
  y.lim=c(0,95))
tiplabels(pie=ard_model$data[primate.tree$tip.label,],
  piecol=cols,cex=0.3)
nodelabels(pie=activity_anc$ace,piecol=cols,
  cex=0.5)

plot of chunk unnamed-chunk-36

What?

In a subsequent post, I’ll show how to do this with stochastic character mapping, and perhaps taking into account the reconstruction at both ends of each edge (instead of just tip-wise), but the idea will be generally the same.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.