A phytools user asks:
“Question – can make.simmap
constrain the root state of a phylogeny to a particular state? As in, if we
want to assume the ancestor of primates was a frugivore, can this be done using the current setup given both
frugivorous and folivorous species at the tips?”
Answer – yes!
This is done using the argument pi
which sets prior probabilities for the global root node of the tree.
(Fixing values for other nodes in the tree is also possible, if a tiny bit more complicated.)
Here's an example using piscivory vs. non-piscivory in sunfishes. In this case, the phylogeny is originally from Near et al. (2005).
First, let's load phytools plus our tree & data:
library(phytools)
data(sunfish.tree)
sunfish.tree<-as.phylo(sunfish.tree)
sunfish.tree
##
## Phylogenetic tree with 28 tips and 27 internal nodes.
##
## Tip labels:
## Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
##
## Rooted; includes branch lengths.
data(sunfish.data)
head(sunfish.data)
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
Let's get our feeding mode trait out:
fmode<-setNames(sunfish.data$feeding.mode,rownames(sunfish.data))
Now let's do our stochastic mapping using the default, flat prior at the root:
flat.prior<-make.simmap(sunfish.tree,fmode,model="ARD",nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
##
## Q =
## non pisc
## non -6.087789 6.087789
## pisc 3.048905 -3.048905
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## non pisc
## 0.5 0.5
## Done.
flat.prior
## 100 phylogenetic trees with mapped discrete characters
We can plot the posterior probabilities at all the nodes of the tree using the summary
method for the
object class:
flat.summary<-summary(flat.prior)
flat.summary
## 100 trees with a mapped discrete character with states:
## non, pisc
##
## trees have 7.32 changes between states on average
##
## changes are of the following types:
## non,pisc pisc,non
## x->y 4.45 2.87
##
## mean total time spent in each state is:
## non pisc total
## raw 0.7443479 0.9507313 1.695079
## prop 0.4391228 0.5608772 1.000000
cols<-setNames(palette()[c(4,2)],levels(fmode))
plot(flat.summary,ftype="i",fsize=0.9,colors=cols)
legend("topleft",c("non-piscivorous","piscivorous"),
pch=21,pt.cex=2,pt.bg=cols)
Now, let's say we want to fix the root state to non-piscivory – for instance, because there's other
evidence external to that from our comparative data indicating that ancestral sunfish were
non-piscivorous rather than piscivorous. We can fix that using the argument pi
as follows:
constrained<-make.simmap(sunfish.tree,fmode,model="ARD",pi=c(1,0),nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
##
## Q =
## non pisc
## non -6.011052 6.011052
## pisc 0.000000 0.000000
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## [1] 1 0
## Done.
constrained
## 100 phylogenetic trees with mapped discrete characters
One thing that the reader might notice is that this new constraint changes our inferred substitution model. (In this case, it is a large change – often the effect is much more subtle.)
constrained.summary<-summary(constrained)
plot(constrained.summary,ftype="i",fsize=0.9,colors=cols)
legend("topleft",c("non-piscivorous","piscivorous"),
pch=21,pt.cex=2,pt.bg=cols)
Neat.
We can also visualize our result using densityMap
:
densityMap(constrained,lwd=6,outline=TRUE,fsize=0.9,res=200)
## sorry - this might take a while; please be patient
This is very different from when we left the root node unconstrained!
densityMap(flat.prior,lwd=6,outline=TRUE,fsize=0.9,res=200)
## sorry - this might take a while; please be patient
That's it.
Hi Liam, this is not so related to the post but I'm currently working on some phylogenetic analysis using continuous data (length in mm). I have a tree made based on available positioning of species, and I want to use the my length data to adjust branch length. However I am having issues as I have 5 length measurements (some negative some positive) for each of my 90 taxa. How can I use those measurements to return only a single value for each of my specimens, I've tried doing MLE but it only allows me to use a single column not all 5. Any help would be appreciated
ReplyDelete