Monday, December 30, 2024

Local marginal ancestral state reconstruction of discrete phenotypic traits now in phytools

Yesterday in these pages I wrote about “global” vs. “local” marginal ancestral state reconstruction of discrete traits. If you haven’t seen that post, I recommend you check it out (here) before continuing.

Nonetheless, in brief, global marginal ancestral state estimation involves finding a single value of Q (usually the value that makes the probability of our data highest – that is, the maximum likelihood estimate), fixing to that value of Q, and then computing the marginal probabilities of our data at each node conditioning on each value of the trait and also our fixed value of Q. Global ancestral state estimation is the predominant method of ancestral character reconstruction of phenotypic traits, and is the method implemented in phytools, corHMM, ape, and other software.

Local ancestral state estimation is similar, except that rather than conditioning on a single value of Q, for each node and character condition we maximize the likelihood of Q. This approach was described by Pagel (1999), but has never been widely adopted – in large part, I suspect, due to the fact that it is very computational intensive to optimize Q for each node and character level combination.

I have now implemented local estimation as an option in phytools::ancr. My implementation is not very fast. Perhaps if there is interest I will work out how to speed it up. (One option would be to use the very fast R package castor by Stilianos Louca, but I have not investigated that yet.)

Just to see how (and that) it works, let’s just reiterate our analysis of the sunfish data for feeding mode that we examined yesterday:

## load phytools
library(phytools)
## check package version
packageVersion("phytools")
## [1] '2.4.2'
## load tree and data
data(sunfish.tree)
sunfish.tree<-as.phylo(sunfish.tree)
data(sunfish.data)
feeding<-setNames(sunfish.data$feeding.mode,
  rownames(sunfish.data))
feeding
##    Acantharchus_pomotis        Lepomis_gibbosus     Lepomis_microlophus 
##                    pisc                     non                     non 
##       Lepomis_punctatus        Lepomis_miniatus         Lepomis_auritus 
##                     non                     non                     non 
##      Lepomis_marginatus       Lepomis_megalotis         Lepomis_humilis 
##                     non                     non                     non 
##     Lepomis_macrochirus         Lepomis_gulosus     Lepomis_symmetricus 
##                     non                    pisc                     non 
##       Lepomis_cyanellus      Micropterus_coosae      Micropterus_notius 
##                    pisc                    pisc                    pisc 
##     Micropterus_treculi   Micropterus_salmoides  Micropterus_floridanus 
##                    pisc                    pisc                    pisc 
## Micropterus_punctulatus    Micropterus_dolomieu Centrarchus_macropterus 
##                    pisc                    pisc                     non 
##      Enneacantus_obesus       Pomoxis_annularis  Pomoxis_nigromaculatus 
##                     non                    pisc                    pisc 
##  Archolites_interruptus    Ambloplites_ariommus   Ambloplites_rupestris 
##                    pisc                    pisc                    pisc 
##   Ambloplites_cavifrons 
##                    pisc 
## Levels: non pisc
## fit ARD model
sunfish_ard<-fitMk(sunfish.tree,feeding,model="ARD")
sunfish_ard
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            non      pisc
## non  -6.087789  6.087789
## pisc  3.048905 -3.048905
## 
## Fitted (or set) value of pi:
##  non pisc 
##  0.5  0.5 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -12.864937 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
## perform global reconstruction
sunfish_global<-ancr(sunfish_ard)
sunfish_global
## Marginal ancestral state estimates:
##         non     pisc
## 29 0.554292 0.445708
## 30 0.568076 0.431924
## 31 0.589648 0.410352
## 32 0.931603 0.068397
## 33 0.995898 0.004102
## 34 0.998398 0.001602
## ...
## 
## Log-likelihood = -12.864937
## perform local reconstruction
sunfish_local<-ancr(sunfish_ard,local=TRUE)
sunfish_local
## Marginal ancestral state estimates
##   (local method):
##         non     pisc
## 29 0.610190 0.389810
## 30 0.611506 0.388494
## 31 0.627541 0.372459
## 32 0.781553 0.218447
## 33 0.984348 0.015652
## 34 0.990881 0.009119
## ...
## 
## Log-likelihood = -12.494214
## plot our results
par(mfrow=c(1,2))
plot(sunfish_global,
  args.plotTree=list(mar=c(1.1,1.1,2.1,1.1),
    fsize=0.8),
  args.nodelabels=list(piecol=viridis::plasma(n=2),
    cex=1),
  args.tiplabels=list(cex=0.7))
mtext("a) \"global\" ASR",line=0,adj=0)
plot(sunfish_local,
  args.plotTree=list(mar=c(1.1,1.1,2.1,1.1),
    fsize=0.8),
  args.nodelabels=list(piecol=viridis::plasma(n=2),
    cex=1),
  args.tiplabels=list(cex=0.7))
mtext("b) \"local\" ASR",line=0,adj=0)

plot of chunk unnamed-chunk-41

One speed-up that I incorporated when I added this to phytools was allowing the model optimization to be distributed in parallel across computer cores. If you have a lot of cores (my laptop has 16), then this will speed up the calculations a lot.

Here’s an example using parity mode in Liolaemidae with data from Esquerre et al. (2019).

## load tree and data
data(liolaemid.tree)
data(liolaemid.data)
parity_mode<-setNames(liolaemid.data$parity_mode,
  rownames(liolaemid.data))
levels(parity_mode)<-c("oviparous","viviparous")
head(parity_mode)
## Ctenoblepharys_adspersa       Liolaemus_abaucan      Liolaemus_albiceps 
##               oviparous               oviparous              viviparous 
##       Liolaemus_andinus     Liolaemus_annectens      Liolaemus_anomalus 
##              viviparous              viviparous               oviparous 
## Levels: oviparous viviparous
## fit model
liolaemid_ard<-fitMk(liolaemid.tree,parity_mode,
  model="ARD")
liolaemid_ard
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            oviparous viviparous
## oviparous  -0.036703   0.036703
## viviparous  0.000000   0.000000
## 
## Fitted (or set) value of pi:
##  oviparous viviparous 
##        0.5        0.5 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -64.963604 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
## global ASR
liolaemid_global<-ancr(liolaemid_ard)
liolaemid_global
## Marginal ancestral state estimates:
##     oviparous viviparous
## 258         1          0
## 259         1          0
## 260         1          0
## 261         1          0
## 262         1          0
## 263         1          0
## ...
## 
## Log-likelihood = -64.963604
## local ASR
liolaemid_local<-ancr(liolaemid_ard,local=TRUE,
  parallel=TRUE)
liolaemid_local
## Marginal ancestral state estimates
##   (local method):
##     oviparous viviparous
## 258  0.881896   0.118104
## 259  0.898255   0.101745
## 260  0.903452   0.096548
## 261  0.897139   0.102861
## 262  0.924405   0.075595
## 263  0.998674   0.001326
## ...
## 
## Log-likelihood = -64.837923
piecex<-apply(liolaemid_local$ace,1,
  function(x) if(any(x>0.95)) 0.2 else 0.4)
plot(liolaemid_local,
  args.plotTree=list(type="arc",arc_height=0.25,
    fsize=0.2),
  args.nodelabels=list(piecol=c("grey",palette()[2]),
    cex=piecex),
  args.tiplabels=list(cex=0.12),
  legend=FALSE)
h<-max(nodeHeights(liolaemid.tree))
legend(x=0.2*h,y=0.35*h,levels(parity_mode),
  pch=16,col=c("grey",palette()[2]),pt.cex=1.5,
  cex=0.9,bty="n")

plot of chunk unnamed-chunk-47

Cool!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.