As part of a project with colleagues, I just added some features to the phytools function evol.vcv
and did
a major, major overhaul of the related function evolvcv.lite
.
evol.vcv
, for those that don't know it, fits, the variable evolutionary correlation model that I developed with
David Collar and originally published in
2009.
The idea of this method is pretty straightforward – it takes a data matrix containing two or more quantitative traits plus a phylogeny with mapped 'regimes', and then it fits two models: one in which the Brownian motion evolutionary rates and the evolutionary correlations between traits are constant across all the edges of the tree; and a second in which the rates and correlations can differ according to our different mapped regimes.
The idea here might be, for instance, that our regimes represent different constraints on multivariate evolution between our two or more traits, as reflected by a tendency for our traits to coevolve (or not).
The major feature updates
that I added to evol.vcv
yesterday is that it can now take as input not only the trait data, but sampling variances and
(crucially) covariances for each set of species means for each taxon in our phylogeny. (Sampling variances are just the
square of the standard error of the mean for each species. Sampling covariances are computed in the same way.) I'll try
to get into the reasons this could be important in another post.
We can supply this through the 'hidden' function argument error_vcv
which should be a list of m × m matrices
for each of our N species in the tree.
In addition to this I also performed a major overhaul of the closely related function evolvcv.lite
.
evolvcv.lite
fits the same two models as evol.vcv
, but also fit two other intermediate models – a model in which the
rates differ between our regimes, but not the evolutionary correlation; and a second in which the evolutionary correlation
differs, but not the rates. The main limitation of evolvcv.lite
is that it can be used for only trait data consisting
of two traits (though it works for an arbitrary number of different regimes).
In addition to adding sampling error to evolvcv.lite
I also added a set of other intermediate models between the two
extremes of identiy and no common structure. These are as follows: model 2b. different rates for trait 1 only, common
correlation; model 2c. different rates for trait 2 only, common correlation; model 3b. different rates for trait 1 only,
different correlations; and model 3c. different rates for trait 2 only, different correlation.
Whereas before the user was required to fit all four of the base models, now evolvcv.lite
allows user control of
which of these 8 models to fit.
Let's see an example using the sunfish example of our original study. (*Note that for multiple reasons these are slightly different data, and a slightly different phylogeny, then in our publication!)
library(phytools)
packageVersion("phytools")
## [1] '0.7.71'
data(sunfish.tree)
sunfish.tree
##
## Phylogenetic tree with 28 tips and 27 internal nodes.
##
## Tip labels:
## Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
##
## The tree includes a mapped, 2-state discrete character
## with states:
## non, pisc
##
## Rooted; includes branch lengths.
data(sunfish.data)
sunfish.data
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
## Lepomis_marginatus non -0.187 -0.075
## Lepomis_megalotis non -0.073 -0.049
## Lepomis_humilis non 0.024 -0.027
## Lepomis_macrochirus non -0.191 0.002
## Lepomis_gulosus pisc 0.131 0.122
## Lepomis_symmetricus non 0.013 -0.025
## Lepomis_cyanellus pisc -0.002 -0.009
## Micropterus_coosae pisc 0.045 -0.009
## Micropterus_notius pisc 0.097 -0.009
## Micropterus_treculi pisc 0.056 0.001
## Micropterus_salmoides pisc 0.056 -0.059
## Micropterus_floridanus pisc 0.096 0.051
## Micropterus_punctulatus pisc 0.092 0.002
## Micropterus_dolomieu pisc 0.035 -0.069
## Centrarchus_macropterus non -0.007 -0.055
## Enneacantus_obesus non 0.016 -0.005
## Pomoxis_annularis pisc -0.004 -0.019
## Pomoxis_nigromaculatus pisc 0.105 0.041
## Archolites_interruptus pisc -0.024 0.051
## Ambloplites_ariommus pisc 0.135 0.123
## Ambloplites_rupestris pisc 0.086 0.041
## Ambloplites_cavifrons pisc 0.130 0.040
First, we can plot the tree:
cols<-setNames(palette()[c(4,2)],c("non","pisc"))
plot(sunfish.tree,cols,ftype="i",lwd=3)
legend("topleft",c("non-piscivorous","piscivorous"),
pch=15,pt.cex=2,col=cols)
or project our trait data into a phylomorphospace:
phylomorphospace(sunfish.tree,sunfish.data[,2:3],colors=cols,
ftype="off",bty="n",lwd=3,node.size=c(0,1.5),
node.by.map=TRUE,xlab="relative gape width",
ylab="relative buccal length")
legend("topleft",c("non-piscivorous","piscivorous"),
pch=21,pt.cex=2,pt.bg=cols,bty="n")
title(main="Phylomorphospace plot of Centrarchidae",font.main=3)
Neat.
OK, now we're ready to fit our models.
We can start with the four models of the original evolvcv.lite
function. This is a relatively small dataset, so they
should run pretty fast.
fits<-evolvcv.lite(sunfish.tree,sunfish.data[,2:3])
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 4: no common structure...
fits
## Model 1: common rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## fitted 0.1139 0.033 0.0556 5 72.1893 -134.3786
##
## (R thinks it has found the ML solution for model 1.)
##
## Model 2: different rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1831 0.0272 0.02 7 77.9869 -141.9738
## pisc 0.0489 0.0297 0.0889
##
## (R thinks it has found the ML solution for model 2.)
##
## Model 3: common rates, different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.0991 0.0129 0.0635 6 73.6144 -135.2287
## pisc 0.0991 0.0539 0.0635
##
## (R thinks it has found the ML solution for model 3.)
##
## Model 4: no common structure
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1388 -0.0022 0.012 8 81.2392 -146.4785
## pisc 0.0757 0.0795 0.129
##
## (R thinks it has found the ML solution for model 4.)
We can see from this result that all four models seem to have converged, and the best supported model (taking into account complexity) is model 4: the no common structure model.
But let's try all 8 possible models, and see if the result changes. I hope it does!
fits.all<-evolvcv.lite(sunfish.tree,sunfish.data[,2:3],
models="all models")
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 2b: different rates (trait 1), common correlation...
## Fitting model 2c: different rates (trait 2), common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 3b: different rates (trait 1), different correlation...
## Fitting model 3c: different rates (trait 2), different correlation...
## Fitting model 4: no common structure...
fits.all
## Model 1: common rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## fitted 0.114 0.033 0.0556 5 72.1893 -134.3786
##
## (R thinks it has found the ML solution for model 1.)
##
## Model 2: different rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1831 0.0272 0.02 7 77.9869 -141.9738
## pisc 0.0489 0.0296 0.0889
##
## (R thinks it has found the ML solution for model 2.)
##
## Model 2b: different rates (trait 1), common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.2006 0.0578 0.0553 6 75.9661 -139.9321
## pisc 0.044 0.0271 0.0553
##
## (R thinks it has found the ML solution for model 2b.)
##
## Model 2c: different rates (trait 2), common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1142 0.0145 0.0166 6 75.3309 -138.6619
## pisc 0.1142 0.0357 0.1011
##
## (R thinks it has found the ML solution for model 2c.)
##
## Model 3: common rates, different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.0991 0.0129 0.0635 6 73.6144 -135.2287
## pisc 0.0991 0.0539 0.0635
##
## (R thinks it has found the ML solution for model 3.)
##
## Model 3b: different rates (trait 1), different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1677 0.0343 0.0556 7 76.4875 -138.975
## pisc 0.0454 0.0324 0.0556
##
## (R thinks it has found the ML solution for model 3b.)
##
## Model 3c: different rates (trait 2), different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1127 0 0.0119 7 80.6933 -147.3866
## pisc 0.1127 0.1131 0.1564
##
## (R thinks it has found the ML solution for model 3c.)
##
## Model 4: no common structure
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 0.1387 -0.0021 0.0121 8 81.2388 -146.4775
## pisc 0.0763 0.0803 0.1298
##
## (R thinks it has found the ML solution for model 4.)
Neat. Now our result changes a tiny bit. Instead of the best supported model being the no common structure model, our model that is best supported by the data, taking into account model complexity, is model 3c – the 'different rates (trait 2 only), different correlation model.' (Although, to be fair, the ΔAIC between the two best-fitting models is pretty small. Still, since our preferred model is the simpler model also, I think the principle of parsimony would tell us to go with it!)
I've always found correlations a bit easier to interpret than covariances. Let's pull out our evolutionary covariance matrices (sometimes called Brownian 'rate matrices') and compute evolutionary correlations between the two traits for each one.
This can be done using the R base function cov2cor
as follows:
cov2cor(fits.all$model3c$R$non)
## [,1] [,2]
## [1,] 1.0000000000 0.0000000001
## [2,] 0.0000000001 1.0000000000
cov2cor(fits.all$model3c$R$pisc)
## [,1] [,2]
## [1,] 1.00000 0.85224
## [2,] 0.85224 1.00000
Neat. In the best fitting model we go from basically an effective correlation of 0 on the blue branches, to 0.85 on the red branches of the phylogeny.
Just for fun, let's use simulation to generate data in which the sign of the correlation flips completely depending on the regime, the rate of trait evolution for trait 2 changes, but the rate for trait 1 stays the same – and then see what we get.
For simulation, I will use the sunfish tree, and the following matrices:
vcv<-setNames(list(
matrix(c(2,1.13,1.13,1),2,2),
matrix(c(2,-1.5,-1.5,2),2,2)),
c("non","pisc"))
vcv
## $non
## [,1] [,2]
## [1,] 2.00 1.13
## [2,] 1.13 1.00
##
## $pisc
## [,1] [,2]
## [1,] 2.0 -1.5
## [2,] -1.5 2.0
sim.data<-sim.corrs(sunfish.tree,vcv)
sim.data
## [,1] [,2]
## Acantharchus_pomotis -0.371684795 0.40841302
## Lepomis_gibbosus -0.042544962 -0.40828366
## Lepomis_microlophus 0.240558729 -0.21920005
## Lepomis_punctatus 0.034024263 -0.46007965
## Lepomis_miniatus 0.254541485 -0.40712468
## Lepomis_auritus -0.156165764 -0.75086767
## Lepomis_marginatus -0.007931413 -0.65704540
## Lepomis_megalotis -0.074483229 -0.51354826
## Lepomis_humilis 0.254935589 -0.37641573
## Lepomis_macrochirus 0.447610067 -0.29662989
## Lepomis_gulosus -0.014953560 0.02903732
## Lepomis_symmetricus 0.210644771 -0.35758651
## Lepomis_cyanellus 0.262097771 -0.21522591
## Micropterus_coosae 0.364080538 -0.37286095
## Micropterus_notius 0.430283637 -0.23973592
## Micropterus_treculi 0.299777050 -0.41508006
## Micropterus_salmoides 0.165997665 -0.28206826
## Micropterus_floridanus 0.269297465 -0.29304472
## Micropterus_punctulatus 0.618828045 -0.76414165
## Micropterus_dolomieu 0.734179458 -0.69705338
## Centrarchus_macropterus -0.700929526 -0.49840916
## Enneacantus_obesus 0.779775864 0.27662381
## Pomoxis_annularis -0.131491324 0.28093372
## Pomoxis_nigromaculatus 0.576939076 -0.43978368
## Archolites_interruptus 0.554942656 -0.06412124
## Ambloplites_ariommus 0.774795549 -0.77056186
## Ambloplites_rupestris 0.895776231 -1.11081569
## Ambloplites_cavifrons 0.413002387 -0.58559226
phylomorphospace(sunfish.tree,sim.data,colors=cols,
ftype="off",bty="n",lwd=3,node.size=c(0,1.5),
node.by.map=TRUE,xlab="simulated trait 1",
ylab="simulated trait 2")
legend("topleft",c("non-piscivorous","piscivorous"),
pch=21,pt.cex=2,pt.bg=cols,bty="n")
title(main="Phylomorphospace plot with simulated data",
font.main=3)
OK, now let's fit our models – first with the original four models:
fits.4models<-evolvcv.lite(sunfish.tree,sim.data)
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 4: no common structure...
fits.4models
## Model 1: common rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## fitted 1.158 -0.2459 1.1344 5 -4.4665 18.9331
##
## (R thinks it has found the ML solution for model 1.)
##
## Model 2: different rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.3855 -0.1973 0.8032 7 -3.7345 21.469
## pisc 0.9442 -0.2177 1.4344
##
## (R thinks it has found the ML solution for model 2.)
##
## Model 3: common rates, different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.1514 0.7523 1.1368 6 2.022 7.956
## pisc 1.1514 -0.7947 1.1368
##
## (R thinks it has found the ML solution for model 3.)
##
## Model 4: no common structure
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.0575 0.4917 0.5516 8 4.8213 6.3574
## pisc 1.2807 -1.2334 1.9461
##
## (R thinks it has found the ML solution for model 4.)
This tells us “Model 4: no common structure” is the one best supported by our data. Let's try all eight models, including the new four, instead:
fits.allmodels<-evolvcv.lite(sunfish.tree,sim.data,
models="all models")
## Fitting model 1: common rates, common correlation...
## Fitting model 2: different rates, common correlation...
## Fitting model 2b: different rates (trait 1), common correlation...
## Fitting model 2c: different rates (trait 2), common correlation...
## Fitting model 3: common rates, different correlation...
## Fitting model 3b: different rates (trait 1), different correlation...
## Fitting model 3c: different rates (trait 2), different correlation...
## Error in solve.default(V) :
## system is computationally singular: reciprocal condition number = 2.31326e-21
## Fitting model 4: no common structure...
One of the neat things about this optimization now is that it survived what could have been a fatal error (system is
computationally singular...
) and continued on.
Let's look at the result.
fits.allmodels
## Model 1: common rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## fitted 1.158 -0.2459 1.1344 5 -4.4665 18.9331
##
## (R thinks it has found the ML solution for model 1.)
##
## Model 2: different rates, common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.3855 -0.1973 0.8032 7 -3.7345 21.469
## pisc 0.9442 -0.2177 1.4344
##
## (R thinks it has found the ML solution for model 2.)
##
## Model 2b: different rates (trait 1), common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.2387 0 1.1344 6 -5.0916 22.1832
## pisc 1.0653 0 1.1344
##
## (R thinks it has found the ML solution for model 2b.)
##
## Model 2c: different rates (trait 2), common correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.158 0 0.6823 6 -4.051 20.1021
## pisc 1.158 0 1.6604
##
## (R thinks it has found the ML solution for model 2c.)
##
## Model 3: common rates, different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.1514 0.7523 1.1368 6 2.022 7.956
## pisc 1.1514 -0.7947 1.1368
##
## (R thinks it has found the ML solution for model 3.)
##
## Model 3b: different rates (trait 1), different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.3762 0.8837 1.1436 7 2.1995 9.6011
## pisc 1.0059 -0.7425 1.1436
##
## (R thinks it has found the ML solution for model 3b.)
##
## Model 3c: different rates (trait 2), different correlation
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.153 0.5326 0.5717 7 4.7579 4.4842
## pisc 1.153 -1.1203 1.8564
##
## (R thinks it has found the ML solution for model 3c.)
##
## Model 4: no common structure
## R[1,1] R[1,2] R[2,2] k log(L) AIC
## non 1.0574 0.4917 0.5516 8 4.8213 6.3574
## pisc 1.2807 -1.2333 1.946
##
## (R thinks it has found the ML solution for model 4.)
Now, compared to before, the picture changes. Model 3c, “different rates (trait 2), different correlation” is the best fitting. This is precisely the model we simulated under. Great!
Thanks for this, have you written any post on plotting admixture results on pie charts on a map?
ReplyDelete