A phytools user recently contacted me with the following query:
“In short, is it possible to find confidence intervals for ancestral states
reconstructed with anc.ML
? We know fastAnc
gives us
the variance and confidence intervals, but we have been using
anc.ML
for our continuous variables because we usually have data
missing from our tip states. We'd like to be able to use confidence intervals
for the reconstructed nodes (model$ace
) and tips
(model$missing.x
) in our analysis.”
Theoretically, it should not be too hard to obtain variances (& thus confidence intervals) by first computing the Hessian matrix of 2nd-order partial derivatives of the likelihood surface at the MLE. The negative inverse of this matrix has the asymptotic property of being an estimator of the covariance matrix for our MLEs.
I have now added this to phytools.
Here is a quick demo with missing data:
library(phytools)
packageVersion("phytools")
## [1] '0.5.47'
tree
##
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
x.sampled
## X H P J M T
## -1.2380228 1.0293725 -0.8456503 -1.9916634 -2.8730554 0.3547209
## G O D I W Z
## 1.4444012 0.8178155 2.1005946 -2.0430705 0.6952082 -0.3092633
## U V Y S F K
## 1.6612110 1.0018500 -1.8732164 0.2649342 2.8013645 -0.8186988
## B N
## 0.3439659 -2.7633222
fit<-anc.ML(tree,x.sampled,CI=TRUE)
fit
## Ancestral character estimates using anc.ML under a BM model:
## 27 28 29 30 31 32 33
## 0.311053 0.391244 0.523655 0.999723 1.110276 0.848005 2.006952
## 34 35 36 37 38 39 40
## 1.485751 2.172200 1.266575 0.235682 -0.495487 -1.696335 -1.534325
## 41 42 43 44 45 46 47
## -2.001946 -1.499331 -2.736365 -0.177092 0.732131 0.539801 0.539860
## 48 49 50 51
## 0.330409 0.996167 -0.868859 -1.424747
##
## Lower & upper 95% CIs:
## lower upper
## 27 -0.669159 1.291264
## 28 -0.515972 1.298459
## 29 -0.272219 1.319530
## 30 0.216615 1.782832
## 31 0.313083 1.907468
## 32 0.005303 1.690707
## 33 1.593210 2.420694
## 34 0.700476 2.271026
## 35 1.377463 2.966936
## 36 0.826782 1.706368
## 37 -0.542467 1.013830
## 38 -1.216217 0.225242
## 39 -2.212909 -1.179761
## 40 -1.949853 -1.118797
## 41 -2.117523 -1.886370
## 42 -1.919639 -1.079023
## 43 -2.959644 -2.513085
## 44 -0.789921 0.435737
## 45 0.064848 1.399414
## 46 -0.147178 1.226780
## 47 -0.509807 1.589527
## 48 0.076263 0.584555
## 49 0.390736 1.601598
## 50 -1.447851 -0.289866
## 51 -1.762894 -1.086600
##
## Fitted model parameters & likelihood:
## sig2 log-likelihood
## 0.43949 -18.88231
##
## R thinks it has found the ML solution.
The print method does not indicate whether some tips were missing, and if those tip values have been retained; however these can be obtained from the fitted fitect:
str(fit)
## List of 10
## $ sig2 : Named num 0.439
## ..- attr(*, "names")= chr ""
## $ ace : Named num [1:25] 0.311 0.391 0.524 1 1.11 ...
## ..- attr(*, "names")= chr [1:25] "27" "28" "29" "30" ...
## $ logLik : num -18.9
## $ counts : Named int [1:2] 98 98
## ..- attr(*, "names")= chr [1:2] "function" "gradient"
## $ convergence : int 0
## $ message : chr "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
## $ model : chr "BM"
## $ CI95 : num [1:25, 1:2] -0.669 -0.516 -0.272 0.217 0.313 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:25] "27" "28" "29" "30" ...
## .. ..$ : NULL
## $ missing.x : Named num [1:6] 0.848 2.007 2.172 -1.499 0.54 ...
## ..- attr(*, "names")= chr [1:6] "A" "C" "E" "L" ...
## $ missing.CI95: num [1:6, 1:2] -0.56 1.412 0.918 -2.271 -0.594 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:6] "A" "C" "E" "L" ...
## .. ..$ : NULL
## - attr(*, "class")= chr "anc.ML"
fit$missing.x
## A C E L Q R
## 0.8478992 2.0069709 2.1721374 -1.4993467 0.5398655 0.5398655
fit$missing.CI95
## [,1] [,2]
## A -0.5598708 2.2556692
## C 1.4120210 2.6019208
## E 0.9184274 3.4258474
## L -2.2708686 -0.7278247
## Q -0.5937081 1.6734391
## R -0.5937081 1.6734391
That's really all there is to it.
Here are these values overlain on a contMap
style plot:
obj<-contMap(tree,x=c(x.sampled,fit$missing.x),method="user",
anc.states=fit$ace,plot=FALSE)
plot(obj)
nulo<-sapply(names(fit$missing.x),add.arrow,tree=obj,
arrl=0.1,hedl=0.05,col="blue")
You might be able to see that the missing tips (marked in blue) have reconstructed states at the tips that are equivalent to the internal parent node value preceding them. This is exactly what we hope for under BM where the expected change is zero along any edge.
Note that the reason analytic solutions are used by fastAnc
instead of the Hessian matrix is because (in general) for small trees these
CIs will be too small. Theoretically, this problem should decrease
for larger datasets.
The tree and character data for this example were simulated as follows:
library(phytools)
## simulate tree & data
x<-fastBM(tree<-pbtree(n=26,tip.label=LETTERS))
## simulate missing data
x.sampled<-sample(x,20)
x.sampled
Finally, the newest development version of phytools can always be installed from GitHub using devtools as follows:
library(devtools)
install_github("liamrevell/phytools")
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.