Yesterday in these pages I wrote about “global” vs. “local” marginal ancestral state reconstruction of discrete traits. If you haven’t seen that post, I recommend you check it out (here) before continuing.
Nonetheless, in brief, global marginal ancestral state estimation involves finding a single value of Q (usually the value that makes the probability of our data highest – that is, the maximum likelihood estimate), fixing to that value of Q, and then computing the marginal probabilities of our data at each node conditioning on each value of the trait and also our fixed value of Q. Global ancestral state estimation is the predominant method of ancestral character reconstruction of phenotypic traits, and is the method implemented in phytools, corHMM, ape, and other software.
Local ancestral state estimation is similar, except that rather than conditioning on a single value of Q, for each node and character condition we maximize the likelihood of Q. This approach was described by Pagel (1999), but has never been widely adopted – in large part, I suspect, due to the fact that it is very computational intensive to optimize Q for each node and character level combination.
I have now implemented local estimation as an option in phytools::ancr
. My implementation is not very fast. Perhaps if there is interest I will work out how to speed it up. (One option would be to use the very fast R package castor by Stilianos Louca, but I have not investigated that yet.)
Just to see how (and that) it works, let’s just reiterate our analysis of the sunfish data for feeding mode that we examined yesterday:
## load phytools
library(phytools)
## check package version
packageVersion("phytools")
## [1] '2.4.2'
## load tree and data
data(sunfish.tree)
sunfish.tree<-as.phylo(sunfish.tree)
data(sunfish.data)
feeding<-setNames(sunfish.data$feeding.mode,
rownames(sunfish.data))
feeding
## Acantharchus_pomotis Lepomis_gibbosus Lepomis_microlophus
## pisc non non
## Lepomis_punctatus Lepomis_miniatus Lepomis_auritus
## non non non
## Lepomis_marginatus Lepomis_megalotis Lepomis_humilis
## non non non
## Lepomis_macrochirus Lepomis_gulosus Lepomis_symmetricus
## non pisc non
## Lepomis_cyanellus Micropterus_coosae Micropterus_notius
## pisc pisc pisc
## Micropterus_treculi Micropterus_salmoides Micropterus_floridanus
## pisc pisc pisc
## Micropterus_punctulatus Micropterus_dolomieu Centrarchus_macropterus
## pisc pisc non
## Enneacantus_obesus Pomoxis_annularis Pomoxis_nigromaculatus
## non pisc pisc
## Archolites_interruptus Ambloplites_ariommus Ambloplites_rupestris
## pisc pisc pisc
## Ambloplites_cavifrons
## pisc
## Levels: non pisc
## fit ARD model
sunfish_ard<-fitMk(sunfish.tree,feeding,model="ARD")
sunfish_ard
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non pisc
## non -6.087789 6.087789
## pisc 3.048905 -3.048905
##
## Fitted (or set) value of pi:
## non pisc
## 0.5 0.5
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -12.864937
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
## perform global reconstruction
sunfish_global<-ancr(sunfish_ard)
sunfish_global
## Marginal ancestral state estimates:
## non pisc
## 29 0.554292 0.445708
## 30 0.568076 0.431924
## 31 0.589648 0.410352
## 32 0.931603 0.068397
## 33 0.995898 0.004102
## 34 0.998398 0.001602
## ...
##
## Log-likelihood = -12.864937
## perform local reconstruction
sunfish_local<-ancr(sunfish_ard,local=TRUE)
sunfish_local
## Marginal ancestral state estimates
## (local method):
## non pisc
## 29 0.610190 0.389810
## 30 0.611506 0.388494
## 31 0.627541 0.372459
## 32 0.781553 0.218447
## 33 0.984348 0.015652
## 34 0.990881 0.009119
## ...
##
## Log-likelihood = -12.494214
## plot our results
par(mfrow=c(1,2))
plot(sunfish_global,
args.plotTree=list(mar=c(1.1,1.1,2.1,1.1),
fsize=0.8),
args.nodelabels=list(piecol=viridis::plasma(n=2),
cex=1),
args.tiplabels=list(cex=0.7))
mtext("a) \"global\" ASR",line=0,adj=0)
plot(sunfish_local,
args.plotTree=list(mar=c(1.1,1.1,2.1,1.1),
fsize=0.8),
args.nodelabels=list(piecol=viridis::plasma(n=2),
cex=1),
args.tiplabels=list(cex=0.7))
mtext("b) \"local\" ASR",line=0,adj=0)
One speed-up that I incorporated when I added this to phytools was allowing the model optimization to be distributed in parallel across computer cores. If you have a lot of cores (my laptop has 16), then this will speed up the calculations a lot.
Here’s an example using parity mode in Liolaemidae with data from Esquerre et al. (2019).
## load tree and data
data(liolaemid.tree)
data(liolaemid.data)
parity_mode<-setNames(liolaemid.data$parity_mode,
rownames(liolaemid.data))
levels(parity_mode)<-c("oviparous","viviparous")
head(parity_mode)
## Ctenoblepharys_adspersa Liolaemus_abaucan Liolaemus_albiceps
## oviparous oviparous viviparous
## Liolaemus_andinus Liolaemus_annectens Liolaemus_anomalus
## viviparous viviparous oviparous
## Levels: oviparous viviparous
## fit model
liolaemid_ard<-fitMk(liolaemid.tree,parity_mode,
model="ARD")
liolaemid_ard
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## oviparous viviparous
## oviparous -0.036703 0.036703
## viviparous 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous viviparous
## 0.5 0.5
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -64.963604
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
## global ASR
liolaemid_global<-ancr(liolaemid_ard)
liolaemid_global
## Marginal ancestral state estimates:
## oviparous viviparous
## 258 1 0
## 259 1 0
## 260 1 0
## 261 1 0
## 262 1 0
## 263 1 0
## ...
##
## Log-likelihood = -64.963604
## local ASR
liolaemid_local<-ancr(liolaemid_ard,local=TRUE,
parallel=TRUE)
liolaemid_local
## Marginal ancestral state estimates
## (local method):
## oviparous viviparous
## 258 0.881896 0.118104
## 259 0.898255 0.101745
## 260 0.903452 0.096548
## 261 0.897139 0.102861
## 262 0.924405 0.075595
## 263 0.998674 0.001326
## ...
##
## Log-likelihood = -64.837923
piecex<-apply(liolaemid_local$ace,1,
function(x) if(any(x>0.95)) 0.2 else 0.4)
plot(liolaemid_local,
args.plotTree=list(type="arc",arc_height=0.25,
fsize=0.2),
args.nodelabels=list(piecol=c("grey",palette()[2]),
cex=piecex),
args.tiplabels=list(cex=0.12),
legend=FALSE)
h<-max(nodeHeights(liolaemid.tree))
legend(x=0.2*h,y=0.35*h,levels(parity_mode),
pch=16,col=c("grey",palette()[2]),pt.cex=1.5,
cex=0.9,bty="n")
Cool!
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.