Monday, March 15, 2021

Fitting a multi-rate, multi-correlation model to quantitative character data using phytools

As part of a project with colleagues, I just added some features to the phytools function evol.vcv and did a major, major overhaul of the related function evolvcv.lite.

evol.vcv, for those that don't know it, fits, the variable evolutionary correlation model that I developed with David Collar and originally published in 2009.

The idea of this method is pretty straightforward – it takes a data matrix containing two or more quantitative traits plus a phylogeny with mapped 'regimes', and then it fits two models: one in which the Brownian motion evolutionary rates and the evolutionary correlations between traits are constant across all the edges of the tree; and a second in which the rates and correlations can differ according to our different mapped regimes.

The idea here might be, for instance, that our regimes represent different constraints on multivariate evolution between our two or more traits, as reflected by a tendency for our traits to coevolve (or not).

The major feature updates that I added to evol.vcv yesterday is that it can now take as input not only the trait data, but sampling variances and (crucially) covariances for each set of species means for each taxon in our phylogeny. (Sampling variances are just the square of the standard error of the mean for each species. Sampling covariances are computed in the same way.) I'll try to get into the reasons this could be important in another post.

We can supply this through the 'hidden' function argument error_vcv which should be a list of m × m matrices for each of our N species in the tree.

In addition to this I also performed a major overhaul of the closely related function evolvcv.lite.

evolvcv.lite fits the same two models as evol.vcv, but also fit two other intermediate models – a model in which the rates differ between our regimes, but not the evolutionary correlation; and a second in which the evolutionary correlation differs, but not the rates. The main limitation of evolvcv.lite is that it can be used for only trait data consisting of two traits (though it works for an arbitrary number of different regimes).

In addition to adding sampling error to evolvcv.lite I also added a set of other intermediate models between the two extremes of identiy and no common structure. These are as follows: model 2b. different rates for trait 1 only, common correlation; model 2c. different rates for trait 2 only, common correlation; model 3b. different rates for trait 1 only, different correlations; and model 3c. different rates for trait 2 only, different correlation.

Whereas before the user was required to fit all four of the base models, now evolvcv.lite allows user control of which of these 8 models to fit.

Let's see an example using the sunfish example of our original study. (*Note that for multiple reasons these are slightly different data, and a slightly different phylogeny, then in our publication!)

library(phytools)
packageVersion("phytools")
## [1] '0.7.71'
data(sunfish.tree)
sunfish.tree
## 
## Phylogenetic tree with 28 tips and 27 internal nodes.
## 
## Tip labels:
##  Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
## 
## The tree includes a mapped, 2-state discrete character
## with states:
##  non, pisc
## 
## Rooted; includes branch lengths.
data(sunfish.data)
sunfish.data
##                         feeding.mode gape.width buccal.length
## Acantharchus_pomotis            pisc      0.114        -0.009
## Lepomis_gibbosus                 non     -0.133        -0.009
## Lepomis_microlophus              non     -0.151         0.012
## Lepomis_punctatus                non     -0.103        -0.019
## Lepomis_miniatus                 non     -0.134         0.001
## Lepomis_auritus                  non     -0.222        -0.039
## Lepomis_marginatus               non     -0.187        -0.075
## Lepomis_megalotis                non     -0.073        -0.049
## Lepomis_humilis                  non      0.024        -0.027
## Lepomis_macrochirus              non     -0.191         0.002
## Lepomis_gulosus                 pisc      0.131         0.122
## Lepomis_symmetricus              non      0.013        -0.025
## Lepomis_cyanellus               pisc     -0.002        -0.009
## Micropterus_coosae              pisc      0.045        -0.009
## Micropterus_notius              pisc      0.097        -0.009
## Micropterus_treculi             pisc      0.056         0.001
## Micropterus_salmoides           pisc      0.056        -0.059
## Micropterus_floridanus          pisc      0.096         0.051
## Micropterus_punctulatus         pisc      0.092         0.002
## Micropterus_dolomieu            pisc      0.035        -0.069
## Centrarchus_macropterus          non     -0.007        -0.055
## Enneacantus_obesus               non      0.016        -0.005
## Pomoxis_annularis               pisc     -0.004        -0.019
## Pomoxis_nigromaculatus          pisc      0.105         0.041
## Archolites_interruptus          pisc     -0.024         0.051
## Ambloplites_ariommus            pisc      0.135         0.123
## Ambloplites_rupestris           pisc      0.086         0.041
## Ambloplites_cavifrons           pisc      0.130         0.040

First, we can plot the tree:

cols<-setNames(palette()[c(4,2)],c("non","pisc"))
plot(sunfish.tree,cols,ftype="i",lwd=3)
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=15,pt.cex=2,col=cols)

plot of chunk unnamed-chunk-3

or project our trait data into a phylomorphospace:

phylomorphospace(sunfish.tree,sunfish.data[,2:3],colors=cols,
    ftype="off",bty="n",lwd=3,node.size=c(0,1.5),
    node.by.map=TRUE,xlab="relative gape width",
    ylab="relative buccal length")
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=21,pt.cex=2,pt.bg=cols,bty="n")
title(main="Phylomorphospace plot of Centrarchidae",font.main=3)

plot of chunk unnamed-chunk-4

Neat.

OK, now we're ready to fit our models.

We can start with the four models of the original evolvcv.lite function. This is a relatively small dataset, so they should run pretty fast.

fits<-evolvcv.lite(sunfish.tree,sunfish.data[,2:3])
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 4: no common structure...
fits
## Model 1: common rates, common correlation 
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## fitted   0.1139  0.033   0.0556  5   72.1893 -134.3786   
## 
## (R thinks it has found the ML solution for model 1.)
## 
## Model 2: different rates, common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1831  0.0272  0.02    7   77.9869 -141.9738   
## pisc 0.0489  0.0297  0.0889  
## 
## (R thinks it has found the ML solution for model 2.)
## 
## Model 3: common rates, different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.0991  0.0129  0.0635  6   73.6144 -135.2287   
## pisc 0.0991  0.0539  0.0635  
## 
## (R thinks it has found the ML solution for model 3.)
## 
## Model 4: no common structure
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1388  -0.0022 0.012   8   81.2392 -146.4785   
## pisc 0.0757  0.0795  0.129   
## 
## (R thinks it has found the ML solution for model 4.)

We can see from this result that all four models seem to have converged, and the best supported model (taking into account complexity) is model 4: the no common structure model.

But let's try all 8 possible models, and see if the result changes. I hope it does!

fits.all<-evolvcv.lite(sunfish.tree,sunfish.data[,2:3],
    models="all models")
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 2b: different rates (trait 1), common correlation...
## Fitting model 2c: different rates (trait 2), common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 3b: different rates (trait 1), different correlation...
## Fitting model 3c: different rates (trait 2), different correlation...
## Fitting model 4: no common structure...
fits.all
## Model 1: common rates, common correlation 
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## fitted   0.114   0.033   0.0556  5   72.1893 -134.3786   
## 
## (R thinks it has found the ML solution for model 1.)
## 
## Model 2: different rates, common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1831  0.0272  0.02    7   77.9869 -141.9738   
## pisc 0.0489  0.0296  0.0889  
## 
## (R thinks it has found the ML solution for model 2.)
## 
## Model 2b: different rates (trait 1), common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.2006  0.0578  0.0553  6   75.9661 -139.9321   
## pisc 0.044   0.0271  0.0553  
## 
## (R thinks it has found the ML solution for model 2b.)
## 
## Model 2c: different rates (trait 2), common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1142  0.0145  0.0166  6   75.3309 -138.6619   
## pisc 0.1142  0.0357  0.1011  
## 
## (R thinks it has found the ML solution for model 2c.)
## 
## Model 3: common rates, different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.0991  0.0129  0.0635  6   73.6144 -135.2287   
## pisc 0.0991  0.0539  0.0635  
## 
## (R thinks it has found the ML solution for model 3.)
## 
## Model 3b: different rates (trait 1), different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1677  0.0343  0.0556  7   76.4875 -138.975    
## pisc 0.0454  0.0324  0.0556  
## 
## (R thinks it has found the ML solution for model 3b.)
## 
## Model 3c: different rates (trait 2), different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1127  0   0.0119  7   80.6933 -147.3866   
## pisc 0.1127  0.1131  0.1564  
## 
## (R thinks it has found the ML solution for model 3c.)
## 
## Model 4: no common structure
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  0.1387  -0.0021 0.0121  8   81.2388 -146.4775   
## pisc 0.0763  0.0803  0.1298  
## 
## (R thinks it has found the ML solution for model 4.)

Neat. Now our result changes a tiny bit. Instead of the best supported model being the no common structure model, our model that is best supported by the data, taking into account model complexity, is model 3c – the 'different rates (trait 2 only), different correlation model.' (Although, to be fair, the ΔAIC between the two best-fitting models is pretty small. Still, since our preferred model is the simpler model also, I think the principle of parsimony would tell us to go with it!)

I've always found correlations a bit easier to interpret than covariances. Let's pull out our evolutionary covariance matrices (sometimes called Brownian 'rate matrices') and compute evolutionary correlations between the two traits for each one.

This can be done using the R base function cov2cor as follows:

cov2cor(fits.all$model3c$R$non)
##              [,1]         [,2]
## [1,] 1.0000000000 0.0000000001
## [2,] 0.0000000001 1.0000000000
cov2cor(fits.all$model3c$R$pisc)
##         [,1]    [,2]
## [1,] 1.00000 0.85224
## [2,] 0.85224 1.00000

Neat. In the best fitting model we go from basically an effective correlation of 0 on the blue branches, to 0.85 on the red branches of the phylogeny.

Just for fun, let's use simulation to generate data in which the sign of the correlation flips completely depending on the regime, the rate of trait evolution for trait 2 changes, but the rate for trait 1 stays the same – and then see what we get.

For simulation, I will use the sunfish tree, and the following matrices:

vcv<-setNames(list(
    matrix(c(2,1.13,1.13,1),2,2),
    matrix(c(2,-1.5,-1.5,2),2,2)),
    c("non","pisc"))
vcv
## $non
##      [,1] [,2]
## [1,] 2.00 1.13
## [2,] 1.13 1.00
## 
## $pisc
##      [,1] [,2]
## [1,]  2.0 -1.5
## [2,] -1.5  2.0
sim.data<-sim.corrs(sunfish.tree,vcv)
sim.data
##                                 [,1]        [,2]
## Acantharchus_pomotis    -0.371684795  0.40841302
## Lepomis_gibbosus        -0.042544962 -0.40828366
## Lepomis_microlophus      0.240558729 -0.21920005
## Lepomis_punctatus        0.034024263 -0.46007965
## Lepomis_miniatus         0.254541485 -0.40712468
## Lepomis_auritus         -0.156165764 -0.75086767
## Lepomis_marginatus      -0.007931413 -0.65704540
## Lepomis_megalotis       -0.074483229 -0.51354826
## Lepomis_humilis          0.254935589 -0.37641573
## Lepomis_macrochirus      0.447610067 -0.29662989
## Lepomis_gulosus         -0.014953560  0.02903732
## Lepomis_symmetricus      0.210644771 -0.35758651
## Lepomis_cyanellus        0.262097771 -0.21522591
## Micropterus_coosae       0.364080538 -0.37286095
## Micropterus_notius       0.430283637 -0.23973592
## Micropterus_treculi      0.299777050 -0.41508006
## Micropterus_salmoides    0.165997665 -0.28206826
## Micropterus_floridanus   0.269297465 -0.29304472
## Micropterus_punctulatus  0.618828045 -0.76414165
## Micropterus_dolomieu     0.734179458 -0.69705338
## Centrarchus_macropterus -0.700929526 -0.49840916
## Enneacantus_obesus       0.779775864  0.27662381
## Pomoxis_annularis       -0.131491324  0.28093372
## Pomoxis_nigromaculatus   0.576939076 -0.43978368
## Archolites_interruptus   0.554942656 -0.06412124
## Ambloplites_ariommus     0.774795549 -0.77056186
## Ambloplites_rupestris    0.895776231 -1.11081569
## Ambloplites_cavifrons    0.413002387 -0.58559226
phylomorphospace(sunfish.tree,sim.data,colors=cols,
    ftype="off",bty="n",lwd=3,node.size=c(0,1.5),
    node.by.map=TRUE,xlab="simulated trait 1",
    ylab="simulated trait 2")
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=21,pt.cex=2,pt.bg=cols,bty="n")
title(main="Phylomorphospace plot with simulated data",
    font.main=3)

plot of chunk unnamed-chunk-9

OK, now let's fit our models – first with the original four models:

fits.4models<-evolvcv.lite(sunfish.tree,sim.data)
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 4: no common structure...
fits.4models
## Model 1: common rates, common correlation 
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## fitted   1.158   -0.2459 1.1344  5   -4.4665 18.9331 
## 
## (R thinks it has found the ML solution for model 1.)
## 
## Model 2: different rates, common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.3855  -0.1973 0.8032  7   -3.7345 21.469  
## pisc 0.9442  -0.2177 1.4344  
## 
## (R thinks it has found the ML solution for model 2.)
## 
## Model 3: common rates, different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.1514  0.7523  1.1368  6   2.022   7.956   
## pisc 1.1514  -0.7947 1.1368  
## 
## (R thinks it has found the ML solution for model 3.)
## 
## Model 4: no common structure
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.0575  0.4917  0.5516  8   4.8213  6.3574  
## pisc 1.2807  -1.2334 1.9461  
## 
## (R thinks it has found the ML solution for model 4.)

This tells us “Model 4: no common structure” is the one best supported by our data. Let's try all eight models, including the new four, instead:

fits.allmodels<-evolvcv.lite(sunfish.tree,sim.data,
    models="all models")
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 2b: different rates (trait 1), common correlation...
## Fitting model 2c: different rates (trait 2), common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 3b: different rates (trait 1), different correlation...
## Fitting model 3c: different rates (trait 2), different correlation...
## Error in solve.default(V) : 
##   system is computationally singular: reciprocal condition number = 2.31326e-21
## Fitting model 4: no common structure...

One of the neat things about this optimization now is that it survived what could have been a fatal error (system is computationally singular...) and continued on.

Let's look at the result.

fits.allmodels
## Model 1: common rates, common correlation 
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## fitted   1.158   -0.2459 1.1344  5   -4.4665 18.9331 
## 
## (R thinks it has found the ML solution for model 1.)
## 
## Model 2: different rates, common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.3855  -0.1973 0.8032  7   -3.7345 21.469  
## pisc 0.9442  -0.2177 1.4344  
## 
## (R thinks it has found the ML solution for model 2.)
## 
## Model 2b: different rates (trait 1), common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.2387  0   1.1344  6   -5.0916 22.1832 
## pisc 1.0653  0   1.1344  
## 
## (R thinks it has found the ML solution for model 2b.)
## 
## Model 2c: different rates (trait 2), common correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.158   0   0.6823  6   -4.051  20.1021 
## pisc 1.158   0   1.6604  
## 
## (R thinks it has found the ML solution for model 2c.)
## 
## Model 3: common rates, different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.1514  0.7523  1.1368  6   2.022   7.956   
## pisc 1.1514  -0.7947 1.1368  
## 
## (R thinks it has found the ML solution for model 3.)
## 
## Model 3b: different rates (trait 1), different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.3762  0.8837  1.1436  7   2.1995  9.6011  
## pisc 1.0059  -0.7425 1.1436  
## 
## (R thinks it has found the ML solution for model 3b.)
## 
## Model 3c: different rates (trait 2), different correlation
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.153   0.5326  0.5717  7   4.7579  4.4842  
## pisc 1.153   -1.1203 1.8564  
## 
## (R thinks it has found the ML solution for model 3c.)
## 
## Model 4: no common structure
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)  AIC
## non  1.0574  0.4917  0.5516  8   4.8213  6.3574  
## pisc 1.2807  -1.2333 1.946   
## 
## (R thinks it has found the ML solution for model 4.)

Now, compared to before, the picture changes. Model 3c, “different rates (trait 2), different correlation” is the best fitting. This is precisely the model we simulated under. Great!

1 comment:

  1. Thanks for this, have you written any post on plotting admixture results on pie charts on a map?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.