Wednesday, August 4, 2021

Constraining the root state during stochastic character mapping in phytools

A phytools user asks:

“Question – can make.simmap constrain the root state of a phylogeny to a particular state? As in, if we want to assume the ancestor of primates was a frugivore, can this be done using the current setup given both frugivorous and folivorous species at the tips?”

Answer – yes!

This is done using the argument pi which sets prior probabilities for the global root node of the tree. (Fixing values for other nodes in the tree is also possible, if a tiny bit more complicated.)

Here's an example using piscivory vs. non-piscivory in sunfishes. In this case, the phylogeny is originally from Near et al. (2005).

First, let's load phytools plus our tree & data:

library(phytools)
data(sunfish.tree)
sunfish.tree<-as.phylo(sunfish.tree)
sunfish.tree
## 
## Phylogenetic tree with 28 tips and 27 internal nodes.
## 
## Tip labels:
##   Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
## 
## Rooted; includes branch lengths.
data(sunfish.data)
head(sunfish.data)
##                      feeding.mode gape.width buccal.length
## Acantharchus_pomotis         pisc      0.114        -0.009
## Lepomis_gibbosus              non     -0.133        -0.009
## Lepomis_microlophus           non     -0.151         0.012
## Lepomis_punctatus             non     -0.103        -0.019
## Lepomis_miniatus              non     -0.134         0.001
## Lepomis_auritus               non     -0.222        -0.039

Let's get our feeding mode trait out:

fmode<-setNames(sunfish.data$feeding.mode,rownames(sunfish.data))

Now let's do our stochastic mapping using the default, flat prior at the root:

flat.prior<-make.simmap(sunfish.tree,fmode,model="ARD",nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##            non      pisc
## non  -6.087789  6.087789
## pisc  3.048905 -3.048905
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  non pisc 
##  0.5  0.5
## Done.
flat.prior
## 100 phylogenetic trees with mapped discrete characters

We can plot the posterior probabilities at all the nodes of the tree using the summary method for the object class:

flat.summary<-summary(flat.prior)
flat.summary
## 100 trees with a mapped discrete character with states:
##  non, pisc 
## 
## trees have 7.32 changes between states on average
## 
## changes are of the following types:
##      non,pisc pisc,non
## x->y     4.45     2.87
## 
## mean total time spent in each state is:
##            non      pisc    total
## raw  0.7443479 0.9507313 1.695079
## prop 0.4391228 0.5608772 1.000000
cols<-setNames(palette()[c(4,2)],levels(fmode))
plot(flat.summary,ftype="i",fsize=0.9,colors=cols)
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=21,pt.cex=2,pt.bg=cols)

plot of chunk unnamed-chunk-4

Now, let's say we want to fix the root state to non-piscivory – for instance, because there's other evidence external to that from our comparative data indicating that ancestral sunfish were non-piscivorous rather than piscivorous. We can fix that using the argument pi as follows:

constrained<-make.simmap(sunfish.tree,fmode,model="ARD",pi=c(1,0),nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##            non     pisc
## non  -6.011052 6.011052
## pisc  0.000000 0.000000
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## [1] 1 0
## Done.
constrained
## 100 phylogenetic trees with mapped discrete characters

One thing that the reader might notice is that this new constraint changes our inferred substitution model. (In this case, it is a large change – often the effect is much more subtle.)

constrained.summary<-summary(constrained)
plot(constrained.summary,ftype="i",fsize=0.9,colors=cols)
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=21,pt.cex=2,pt.bg=cols)

plot of chunk unnamed-chunk-6

Neat.

We can also visualize our result using densityMap:

densityMap(constrained,lwd=6,outline=TRUE,fsize=0.9,res=200)
## sorry - this might take a while; please be patient

plot of chunk unnamed-chunk-7

This is very different from when we left the root node unconstrained!

densityMap(flat.prior,lwd=6,outline=TRUE,fsize=0.9,res=200)
## sorry - this might take a while; please be patient

plot of chunk unnamed-chunk-8

That's it.

1 comment:

  1. Hi Liam, this is not so related to the post but I'm currently working on some phylogenetic analysis using continuous data (length in mm). I have a tree made based on available positioning of species, and I want to use the my length data to adjust branch length. However I am having issues as I have 5 length measurements (some negative some positive) for each of my 90 taxa. How can I use those measurements to return only a single value for each of my specimens, I've tried doing MLE but it only allows me to use a single column not all 5. Any help would be appreciated

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.