Friday, November 10, 2017

New methods for rerootingMethod (and comparing ancestral state reconstruction under the Mk and threshold models)

I just added print and plot methods for the phytools ancestral state reconstruction function, rerootingMethod. I also merged Klaus's pull request for ancThresh, which makes it still faster without breaking anything, and added a minor feature to the new plot method for "ancThresh" objects in that it now (invisibly) returns a matrix with the posterior probabilities at tips & nodes of the tree.

rerootingMethod implements the re-rooting method of Yang et al. (1995) for computing the marginal ancestral states (empirical Bayesian posterior probabilities) at the nodes of the tree under an Mk model.

This function is to some degree redundant with ape::ace; except that rerootingMethod, like make.simmap, allows the user to supply tip states as probabilities instead of fixed values. It then also proceeds to compute (empirical Bayesian) posterior probabilities at the tips as well as the nodes. One limitation of the method is that it can only be used for a symmetric transition model - that is, one in which qi,j = qj,i for all i and j.

Here is a demo of the function, along with its associated S3 methods, using the dataset I analyzed the the threshold model the other day.

library(phytools)
packageVersion("phytools")
## [1] '0.6.41'
X<-read.csv("Centrarchidae.data.csv",header=TRUE,row.names=1)
X
##                          not mod high
## Acantharchus_pomotis     0.0 1.0  0.0
## Lepomis_gibbosus         1.0 0.0  0.0
## Lepomis_microlophus      1.0 0.0  0.0
## Lepomis_punctatus        1.0 0.0  0.0
## Lepomis_miniatus         1.0 0.0  0.0
## Lepomis_auritus          1.0 0.0  0.0
## Lepomis_marginatus       1.0 0.0  0.0
## Lepomis_megalotis        1.0 0.0  0.0
## Lepomis_humilis          1.0 0.0  0.0
## Lepomis_macrochirus      1.0 0.0  0.0
## Lepomis_gulosus          0.0 1.0  0.0
## Lepomis_symmetricus      1.0 0.0  0.0
## Lepomis_cyanellus        0.0 1.0  0.0
## Micropterus_cataractae   0.0 0.0  1.0
## Micropterus_coosae       0.0 1.0  0.0
## Micropterus_notius       0.0 1.0  0.0
## Micropterus_treculi      0.0 0.5  0.5
## Micropterus_salmoides    0.0 0.0  1.0
## Micropterus_floridanus   0.0 0.0  1.0
## Micropterus_punctulatus  0.0 0.0  1.0
## Micropterus_dolomieu     0.0 0.0  1.0
## Centrarchus_macropterus  0.5 0.5  0.0
## Enneacanthus_chaetodon   1.0 0.0  0.0
## Enneacanthus_gloriosus   1.0 0.0  0.0
## Enneacanthus_obesus      1.0 0.0  0.0
## Pomoxis_annularis        0.0 1.0  0.0
## Pomoxis_nigromaculatus   0.0 1.0  0.0
## Archoplites_interruptus  0.0 1.0  0.0
## Ambloplites_ariommus     0.0 0.5  0.5
## Ambloplites_rupestris    0.0 1.0  0.0
## Ambloplites_cavifrons    0.0 1.0  0.0
## Ambloplites_constellatus 0.5 0.5  0.0
tree<-read.tree("Centrarchidae.tre")
tree
## 
## Phylogenetic tree with 32 tips and 31 internal nodes.
## 
## Tip labels:
##  Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
## 
## Rooted; includes branch lengths.
ordered<-matrix(c(0,1,0,1,0,2,0,2,0),3,3)
colnames(ordered)<-rownames(ordered)<-colnames(X)
ordered ## this is a symmetric model
##      not mod high
## not    0   1    0
## mod    1   0    2
## high   0   2    0
fit<-rerootingMethod(tree,as.matrix(X),model=ordered)
fit
## Ancestral character estimates using re-rooting method
## of Yang et al. (1995):
##                               not      mod     high
## Acantharchus_pomotis     0.000000 1.000000 0.000000
## Lepomis_gibbosus         1.000000 0.000000 0.000000
## Lepomis_microlophus      1.000000 0.000000 0.000000
## Lepomis_punctatus        1.000000 0.000000 0.000000
## Lepomis_miniatus         1.000000 0.000000 0.000000
## Lepomis_auritus          1.000000 0.000000 0.000000
## Lepomis_marginatus       1.000000 0.000000 0.000000
## Lepomis_megalotis        1.000000 0.000000 0.000000
## Lepomis_humilis          1.000000 0.000000 0.000000
## Lepomis_macrochirus      1.000000 0.000000 0.000000
## Lepomis_gulosus          0.000000 1.000000 0.000000
## Lepomis_symmetricus      1.000000 0.000000 0.000000
## Lepomis_cyanellus        0.000000 1.000000 0.000000
## Micropterus_cataractae   0.000000 0.000000 1.000000
## Micropterus_coosae       0.000000 1.000000 0.000000
## Micropterus_notius       0.000000 1.000000 0.000000
## Micropterus_treculi      0.000000 0.162248 0.837752
## Micropterus_salmoides    0.000000 0.000000 1.000000
## Micropterus_floridanus   0.000000 0.000000 1.000000
## Micropterus_punctulatus  0.000000 0.000000 1.000000
## Micropterus_dolomieu     0.000000 0.000000 1.000000
## Centrarchus_macropterus  0.488447 0.511553 0.000000
## Enneacanthus_chaetodon   1.000000 0.000000 0.000000
## Enneacanthus_gloriosus   1.000000 0.000000 0.000000
## Enneacanthus_obesus      1.000000 0.000000 0.000000
## Pomoxis_annularis        0.000000 1.000000 0.000000
## Pomoxis_nigromaculatus   0.000000 1.000000 0.000000
## Archoplites_interruptus  0.000000 1.000000 0.000000
## Ambloplites_ariommus     0.000000 0.929589 0.070411
## Ambloplites_rupestris    0.000000 1.000000 0.000000
## Ambloplites_cavifrons    0.000000 1.000000 0.000000
## Ambloplites_constellatus 0.184185 0.815815 0.000000
## 33                       0.371487 0.556580 0.071933
## 34                       0.369454 0.594949 0.035596
## 35                       0.380361 0.568397 0.051241
## 36                       0.870598 0.129096 0.000306
## 37                       0.990482 0.009515 0.000003
## 38                       0.996452 0.003547 0.000001
## 39                       0.999496 0.000504 0.000000
## 40                       0.999909 0.000091 0.000000
## 41                       0.999225 0.000775 0.000000
## 42                       0.999763 0.000237 0.000000
## 43                       0.863569 0.136268 0.000163
## 44                       0.988738 0.011256 0.000007
## 45                       0.492484 0.506872 0.000644
## 46                       0.501701 0.498216 0.000083
## 47                       0.001172 0.153662 0.845165
## 48                       0.000041 0.152108 0.847852
## 49                       0.000017 0.179852 0.820132
## 50                       0.000012 0.178320 0.821668
## 51                       0.000005 0.100775 0.899220
## 52                       0.000001 0.011369 0.988631
## 53                       0.000000 0.001387 0.998613
## 54                       0.356588 0.637814 0.005598
## 55                       0.344129 0.653839 0.002032
## 56                       0.907201 0.092532 0.000267
## 57                       0.994351 0.005647 0.000002
## 58                       0.172986 0.824630 0.002384
## 59                       0.082450 0.905130 0.012419
## 60                       0.095587 0.898916 0.005497
## 61                       0.027088 0.970582 0.002329
## 62                       0.001848 0.988018 0.010133
## 63                       0.031222 0.966869 0.001908
## 
## Estimated transition matrix,
## Q =
##            not       mod      high
## not  -2.685915  2.685915  0.000000
## mod   2.685915 -5.016560  2.330645
## high  0.000000  2.330645 -2.330645
## 
## **Note that if Q is not symmetric the marginal
## reconstructions may be invalid.
## 
## Log-likelihood = -24.373974
plot(fit)

plot of chunk unnamed-chunk-1

Just for fun, let's re-analyze these data with the threshold model:

mcmc<-ancThresh(tree,X,ngen=1000000,control=list(print=FALSE))
## **** NOTE: no sequence provided, column order in x
## MCMC starting....
mcmc
## 
## Object containing the results from an MCMC analysis
## of the threshold model using ancThresh.
## 
## List with the following components:
## ace: matrix with posterior probabilities assuming 2e+05 
##  burn-in generations.
## mcmc:    posterior sample of liabilities at tips & internal
##  nodes (a matrix with 1001 rows & 31 columns).
## par: posterior sample of the relative positions of the
##  thresholds, the log-likelihoods, and any other
##      model variables (a matrix with 1001 rows).
## 
## The MCMC was run under the following conditions:
##  seq = not <-> mod <-> high 
##  model = BM 
##  number of generations = 1e+06 
##  sample interval= 1000 
##  burn-in = 2e+05
pp<-plot(mcmc)

plot of chunk unnamed-chunk-2

and compare the results:

plot(fit$marginal.anc,pp,pch=21,cex=1.5,bg="grey",
    xlab="marginal states from rerootingMethod",
    ylab="posterior probabilities from ancThresh")
lines(c(-0.1,1.1),c(-0.1,1.1),lwd=4,
    col=make.transparent("blue",0.25))

plot of chunk unnamed-chunk-3

They are correlated, but different - which is good, because these are very different models for how the data arose on the tree. Which one is right? Well, that's a question for another day!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.