Monday, March 21, 2011

Order of traits in brownie.lite() and evol.vcv()

A couple of users (e.g., here) have identified an annoying property of both brownie.lite() and evol.vcv() [both available on my R-phylogenetics page] - that is, the functions order the estimated evolutionary rates or matrices by the order in which the corresponding mapped states appear in the tree (not, for instance, in numeric or alphabetical order). For instance, consider the following code (which also gives a method to simulate Brownian rate variation on the tree):

> # a Newick style SIMMAP tree
> text<-"((10:{te,0.4},4:{te,0.4}):{aq,1.00:te,0.28},((1:{aq,0.98},(((8:{aq,0.05},9:{aq,0.05}):{aq,0.03},7:{aq,0.08}):{te,0.06:aq,0.25},(5:{te,0.08},6:{te,0.08}):{te,0.3}):{te,0.59}):{te,0.04},(2:{aq,0.52},3:{aq,0.52}):{te,0.25:aq,0.25}):{aq,0.06:te,0.6});"
> tr1<-read.simmap(text=text) # read in tree
> simtree<-list(edge=tr1$edge,Nnode=tr1$Nnode, edge.length=tr1$mapped.edge[,"aq"]+20*tr1$mapped.edge[,"te"], tip.label=tr1$tip.label) # make tree for simulation
> attr(simtree,"class")<-"phylo"
> x<-fastBM(simtree) # simulate using fastBM() or sim.char()
> brownie.lite(tr1,x) # fit model
$sig2.single
[1] 8.57495
$var.single
[1] 14.70595
$logL1
[1] -22.13206
$k1
[1] 2
$sig2.multiple
aq te
0.9701962 22.1362945
$vcv.multiple
aq te
aq 0.6387571 -0.9954994
te -0.9954994 160.1397276
$logL.multiple
[1] -20.14824
$k2
[1] 3
$P.chisq
[1] 0.04638306
$convergence
[1] "Optimization has converged."


Note that is is very easy to get the mapped state at the root as follows:

> names(tr1$maps[[1]])[1]
[1] "aq"


because the first edge in the tree will always arise from the root node. The order of $sig2.multiple matches the order of states on the tree (i.e., in this case starting with "aq").

Now, let's flip the state at the root node by artificially rearranging the order of the mapped edges in our tree (altered portions in bold italic):

> text<-"((10:{te,0.4},4:{te,0.4}):{te,0.28:aq,1.00},((1:{aq,0.98},(((8:{aq,0.05},9:{aq,0.05}):{aq,0.03},7:{aq,0.08}):{te,0.06:aq,0.25},(5:{te,0.08},6:{te,0.08}):{te,0.3}):{te,0.59}):{te,0.04},(2:{aq,0.52},3:{aq,0.52}):{te,0.25:aq,0.25}):{te,0.6:aq,0.06});"
> tr2<-read.simmap(text=text)
> names(tr2$maps[[1]])[1]
[1] "te"
> brownie.lite(tr2,x)
...
$sig2.multiple
te aq
22.1362944 0.9701962
$vcv.multiple
te aq
te 160.1399035 -0.9954995
aq -0.9954995 0.6387571
$logL.multiple
[1] -20.14824
...


We can see that the estimated rates are the same (as we would expect), but their order in $sig2.multiple and $vcv.multiple has flipped.

This property (ordering the rates by their temporal appearance, rather than alphanumerically) was by design, not chance; however I can see how it might be annoying for someone analyzing a dataset containing many SIMMAP trees.

Fortunately, there is an easy fix. Just rearrange the columns of $mapped.edge in the order that you want the mapped states to be ordered. For instance:

> tr2$mapped.edge<-tr2$mapped.edge[,c("aq","te")]
> brownie.lite(tr2,x)
...
$sig2.multiple
aq te
0.9701962 22.1362945
$vcv.multiple
aq te
aq 0.6387571 -0.9954994
te -0.9954994 160.1397276
$logL.multiple
[1] -20.14824
...


and the problem is solved.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.