## Wednesday, February 8, 2012

### Issue in phylosig() revealed for very short trees

Matt Johnson, a graduate student at Duke, just identified an interesting problem with the phytools function phylosig. Basically, he was finding that phylosig(...,method="lambda",se=se) was giving discrepant results when he rescaled the branch lengths of the tree to be very short (say, scaling all edge lengths by 1/1000). This was a problem unique to the optimization of λ with sampling error in the estimation of species means. This immediately pointed towards the R base package multidimensional optimizer, optim, because no other methods in phylosig use optim aside from λ with sampling error.

To reproduce the error that he found, Matt suggested something along the lines of the following code:

> set.seed(2)
> require(phytools); require(geiger)
> tree<-pbtree(n=200,scale=1)
> x<-fastBM(lambdaTree(tree,0.7))
> se<-sqrt(rchisq(n=200,df=2)); names(se)<-tree\$tip.label
> e<-rnorm(n=200,sd=se)
> xe<-x+e
> res1<-phylosig(tree,xe,method="lambda",se=se)
> res1
\$lambda
[1] 0.7298913
\$sig2
[1] 1.037691
\$logL
[1] -360.4223

Ok, so far so good. Now let's try and rescale the tree to have really short branches. In theory this should not affect our estimate of phylogenetic signal.

> tree1000<-tree; tree1000\$edge.length<-tree\$edge.length/1000
> res1000<-phylosig(tree1000,xe,method="lambda",se=se)
> res1000
\$lambda
[1] 0.9986187
\$sig2
[1] 51435.36
\$logL
[1] -471.9413

Obviously, there's some kind of problem here. What I've been able to figure out is that it is a convergence issue - for some reason, convergence is more difficult for the larger value of σ2 that results from rescaling the branch lengths to have very short lengths. [Note that for any change to the branch lengths of the tree by factor 1/k, the fitted evolutionary rate should increase by factor k.] I've made some headway towards resolving this issue and will post more, along with an updated version of phylosig shortly.