An undergraduate in Daniel Cadena's lab here at Uniandes, whom I have been helping with her senior thesis project, recently pointed me in the direction of the package MuMIn, which includes in its arsenal the ominously named function dredge. This function automates the fitting of all possible models for a set of independent variables. Unfortunately, though dredge works with a variety of different fitted model types, it does not work with the function pgls.SEy (1, 2), for PGLS with uncertainty in y, which I also wrote in support of this project.
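For readers who have not used MuMIn before, here is a minimal sketch - my addition, not part of the original analysis - of how dredge is ordinarily applied to a model type that it does support, such as an lm fit to the data frame X shown below (dredge requires that the global model not silently drop missing data, hence the na.action setting):
library(MuMIn)
options(na.action="na.fail") ## required by dredge
## "global" model containing every candidate predictor
global<-lm(ye~x1+x2+x3+x4,data=X)
## fit & rank all sub-models (by AICc, the default)
dredge(global)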
Please note that this post is not an endorsement of any activity that could be classified as “data dredging” - and I warned the student about the dangers of such behavior. Nonetheless, I assembled a highly simplified version of dredge that she could apply to a model fit using pgls.SEy (or, in fact, any similarly structured model-fitting function in R, such as lm).
The function takes as input arguments y and x, which are not data but rather the name of the dependent variable and a vector of the names of the independent variables of the model, respectively. For instance:
tree ## tree
##
## Phylogenetic tree with 100 tips and 99 internal nodes.
##
## Tip labels:
## t65, t66, t29, t30, t13, t78, ...
##
## Rooted; includes branch lengths.
X ## our data
## ye x1 x2 x3 x4
## t65 -0.90772465 -0.933177258 -0.50289849 -0.833637779 -1.226811847
## t66 -1.17179800 -0.360779073 -0.67343194 -1.232873565 -1.558929304
## t29 4.15357374 -0.004324650 -0.86464732 -1.522271412 0.092652937
## t30 3.55198867 -0.283674892 -0.05950573 -1.158279742 0.783271000
## t13 -1.33541082 -0.260222231 1.69823520 -0.958306299 0.927545322
## t78 -0.56207919 -0.030577008 -0.12238217 -0.581989880 -0.789061617
## t79 -1.70823856 -0.242023433 0.32537283 -0.744757917 -0.830916154
## t5 3.54366591 -0.351473922 -0.86081240 -1.800558381 -0.146559729
## t26 -3.34799011 -0.417929491 -0.87860040 -0.964327081 -1.780002063
## t67 -3.58452023 -0.494481915 -1.04212209 -1.041371912 -1.810841912
## t68 -3.32209074 -0.588127345 -0.71373910 -0.700753022 -1.315598157
## t86 -2.54198007 -0.045597241 -1.28701218 -0.870836065 -1.752840010
## t87 -2.39865344 -0.284624727 -1.19177925 -0.413622945 -1.571490363
## t25 -1.57609619 -0.099326903 -1.26492190 -0.185306209 -1.449849790
## t16 -5.12139002 -0.329391279 -0.69189543 -0.731627112 -2.294395131
## t47 -5.88290498 -0.998675854 -0.60213264 -1.864536626 -2.411244328
## t48 -5.58564307 -1.135659972 -0.90384083 -2.177917526 -2.317161349
## t10 5.93972192 -0.170786099 -0.84404497 0.258286333 0.254189627
## t20 0.11366308 -0.116176393 -1.43439903 -1.008153379 -0.756823697
## t63 3.94588996 -0.271788410 -0.48803237 -0.940323863 0.501302246
## t64 3.21911945 -0.122474607 -0.87832126 -0.254448825 0.274372750
## t53 4.74552509 -0.188851979 -0.48504631 -0.201398930 0.257906423
## t21 0.75183425 -0.666761834 -1.27216994 -0.963997566 -0.511842190
## t23 5.96488228 -0.263860837 -1.16042313 -1.542287147 0.514510167
## t49 4.17655995 -0.143231501 -1.19002853 -1.450748764 0.507057795
## t50 4.80439077 0.258295208 -0.60080140 -1.075878405 0.781512173
## t80 2.32980803 -0.326915114 0.10754248 -0.923439256 0.643239607
## t81 4.08965563 -0.507894799 0.26522058 -0.469615756 1.120269519
## t19 -1.48944997 -0.811681993 -0.26318414 -2.777328413 -0.603814640
## t84 -2.91205159 -0.737986864 -1.02763382 -1.957564180 -1.353937182
## t85 -2.06165298 -0.712068874 -0.77728970 -2.245292884 -1.401371778
## t40 -3.02205037 -0.933452671 -0.83696879 -2.547047370 -1.476694576
## t41 -4.87730189 -0.229120475 -1.49848945 -2.765038158 -2.032890033
## t18 -0.52165070 -0.204012732 -1.64088635 -2.344461535 -1.124866746
## t76 -1.65255423 -0.117917709 -0.77909630 -2.544663932 -1.095495017
## t77 0.07018189 0.287088622 -0.72342580 -2.680871209 -0.986772078
## t27 0.92637845 -0.358620911 -1.71171019 -1.988283823 -0.648392578
## t92 3.12710215 -0.337589308 -1.45006236 -2.639934958 0.017958145
## t93 3.65860534 -0.312388755 -1.60509872 -2.595797518 0.158134389
## t8 -2.36028047 -0.676186538 -0.41457179 -1.775466733 -0.527343477
## t2 5.66245601 -1.382336906 -0.60476399 -2.774459210 0.987429394
## t95 2.16726940 0.071824226 -0.04136211 -0.978944992 0.702904379
## t96 5.23800963 -0.000369423 0.04677993 -1.153812002 0.845994491
## t6 -3.08927960 -1.228130839 1.24440860 -1.666626516 -0.183465598
## t44 -0.32675842 -0.778840190 -0.10432506 -0.930466508 0.132912688
## t51 0.71141037 -1.529691824 -0.63523463 -1.351234148 -0.203513384
## t52 -2.97932839 -0.671563524 0.25110920 -0.940436026 -0.144505347
## t72 -0.56710045 -0.208175485 0.21888791 -0.683621798 -0.439724856
## t73 3.06125726 -0.280504466 -0.05425123 -0.977418875 -0.399349006
## t35 0.88640748 -0.432133697 -0.60337432 -1.217613428 -0.161792920
## t38 2.22799237 -1.599070451 -1.12646600 -0.825638529 0.019129341
## t39 0.80698662 -1.019397388 -0.14879313 -0.727519051 0.100882723
## t99 -0.56690640 0.819938089 0.10457809 0.343316393 -0.571194773
## t100 0.31251389 0.797765277 0.09518138 0.337884832 -0.567458375
## t24 -5.01149282 0.132350015 1.25951182 -0.438835754 -0.889606534
## t11 -1.46762699 -0.338010923 0.63895729 -0.717547221 0.274369423
## t36 -0.79179712 -0.109222940 -0.84517257 -0.001648664 -1.305286979
## t37 0.67623873 -0.783798640 0.25056370 -0.099499147 0.026504819
## t28 -3.14350681 -0.624356825 -0.18617947 -1.046835231 -1.207608991
## t7 1.79371239 -0.995914763 -0.82088749 0.114602663 0.108922281
## t12 3.06529461 -0.430696448 -0.37673980 -0.904791446 0.664551613
## t33 1.01319467 -0.459191936 0.14792529 -0.647299464 0.429768234
## t34 -1.83383112 -1.369205857 0.15734648 -1.133452917 -0.185282740
## t15 -5.28474388 -0.668597155 -0.19397849 -1.378048152 -1.394404802
## t42 0.91464728 -1.006046065 -0.59102696 -2.544077955 -0.510802841
## t43 0.17074896 -0.657881494 -0.76035426 -1.710494052 -0.432355897
## t22 -0.93089794 -0.549687736 -0.49300322 -2.316250811 -0.798517510
## t1 1.37181061 -0.735847227 -0.10188643 -1.482318015 -0.069082305
## t3 5.83721322 -1.255254006 -1.36243767 -0.910301622 0.679504522
## t4 2.77524322 0.036571658 0.12260330 -1.806011253 0.233933022
## t74 2.32812021 1.178218512 -0.76133067 -0.509128250 -0.997411930
## t75 2.74291448 1.064090251 -0.95706527 -0.457299846 -0.867581954
## t31 -2.28596276 -0.435236571 0.54331428 -0.959954296 -0.098098866
## t59 -4.96441837 -0.567740108 0.91285849 -0.720905297 -0.427382552
## t60 -5.65604410 -0.516435916 0.64042685 -1.284172323 -0.730787277
## t9 7.14934917 1.375046920 -1.24425619 0.495073740 0.695899208
## t90 7.20415861 0.878034624 -2.01772447 0.450435820 0.664090248
## t91 5.67348238 0.935005458 -2.09172115 0.296478696 0.295762185
## t56 0.67333579 -0.517446554 0.30596671 -0.270272009 0.447548743
## t57 -1.04853187 -0.116159889 0.05991413 0.112933000 0.284765732
## t14 -3.77856865 -0.906819465 0.46607917 -0.328972208 -0.508998125
## t61 -3.93022454 -1.312009985 1.32878564 -0.562407187 0.007788355
## t62 -5.05415778 -1.612880369 1.08221968 -0.620768376 -0.315525473
## t69 -2.95408172 -0.242584724 0.76102366 -0.318980789 -0.494282321
## t70 -4.90426049 -0.396631201 1.07417172 -0.504832758 -0.631509904
## t88 -0.79912057 0.237257899 0.41246936 -0.726583288 -0.117175979
## t89 1.19199363 0.434875538 0.30737619 -0.326529423 -0.204683678
## t54 -2.81800403 -0.569749931 1.07419352 0.071561418 -0.288446193
## t55 -1.19879402 -0.152194258 1.08008287 -0.016827783 0.387440401
## t32 -1.75551359 0.602680886 0.75156851 -0.389637248 -0.359414984
## t45 -1.11541051 -1.438964529 2.01395563 0.804516332 0.786960566
## t46 -0.24158566 -1.404376125 1.86329858 0.837682607 1.315479613
## t17 3.18356839 -0.217652443 0.32900077 1.144934996 1.249666779
## t58 -5.15046378 -1.498640558 1.17233451 0.384922560 -0.425217811
## t82 -3.68968208 -1.348265012 1.08151593 0.523997757 0.152053682
## t83 -4.32950798 -1.381394718 1.22167672 0.207916476 -0.146115788
## t71 -4.28308869 -1.170708866 1.02083131 0.157810886 -0.164189066
## t94 -4.87114380 -1.200085947 1.03456095 0.172964350 -0.358510350
## t97 -5.38291471 -1.213089703 1.18276260 0.103353295 -0.596856961
## t98 -5.25643062 -1.402582088 1.24727944 0.018381114 -0.525440524
SE ## our standard errors in y
## t65 t66 t29 t30 t13 t78 t79
## 0.2838481 1.1192735 0.5848418 1.5342279 1.4452484 1.9503550 0.8550472
## t5 t26 t67 t68 t86 t87 t25
## 0.4786313 0.2227377 0.4710501 0.4939850 0.5502124 0.8233990 0.3534902
## t16 t47 t48 t10 t20 t63 t64
## 0.4161823 0.7945718 0.7589094 1.7952263 1.0058047 1.1459235 0.3266516
## t53 t21 t23 t49 t50 t80 t81
## 1.2841140 0.6995356 1.2588015 0.4612205 0.1606824 1.6941166 1.0015120
## t19 t84 t85 t40 t41 t18 t76
## 1.0403561 0.3647199 1.2730340 0.9148257 1.3012779 0.2001514 0.7520138
## t77 t27 t92 t93 t8 t2 t95
## 1.2245456 0.1813785 0.5115684 1.1474370 0.9674241 1.6606047 0.8268985
## t96 t6 t44 t51 t52 t72 t73
## 1.8817409 1.4248163 0.8072912 0.1955722 1.8251670 0.4031782 1.3043864
## t35 t38 t39 t99 t100 t24 t11
## 1.2561020 0.8432288 0.7994611 0.8215323 1.5471417 0.9000843 0.5142108
## t36 t37 t28 t7 t12 t33 t34
## 0.7591903 0.5203935 0.1893714 1.0115815 0.9492312 0.5408957 0.5958648
## t15 t42 t43 t22 t1 t3 t4
## 1.2035645 0.9897115 0.8888153 2.5470790 0.3322280 0.8961755 0.5924993
## t74 t75 t31 t59 t60 t9 t90
## 0.7070748 0.5629469 0.5976459 0.5577206 1.4663451 1.4102296 0.9487273
## t91 t56 t57 t14 t61 t62 t69
## 0.6127336 0.9353615 1.1832985 0.9536276 0.5271702 1.1272450 1.2397424
## t70 t88 t89 t54 t55 t32 t45
## 0.9483944 0.4326344 0.9381514 0.7614915 0.8734860 0.3894159 0.6516323
## t46 t17 t58 t82 t83 t71 t94
## 0.7583656 1.5420088 0.2704494 1.0767461 0.1762237 1.0317629 1.1824717
## t97 t98
## 1.4317323 0.9749933
## our variable names
y<-"ye"
x<-c("x1","x2","x3","x4")
## the function
DREDGE<-function(y,x,fn=pgls.SEy,...){
	n<-length(x)
	## enumerate every non-empty subset of the predictor names
	ii<-list()
	for(i in 1:n) ii<-c(ii,combn(x,i,simplify=FALSE))
	fits<-list()
	## fit the intercept-only model first
	fits[[1]]<-fn(as.formula(paste(y,"~1")),...)
	## then fit one model for each predictor subset
	for(i in 1:length(ii)){
		model<-as.formula(paste(y,"~",paste(ii[[i]],collapse="+")))
		fits[[i+1]]<-fn(model,...)
	}
	fits
}
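Since DREDGE fits one model for every non-empty subset of the predictors, plus the intercept-only model, it returns 2^n fitted models for n predictors - here 2^4 = 16 per error structure. A quick sanity check (my addition):
sum(choose(length(x),1:length(x)))+1 ## all non-empty subsets + intercept-only = 16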
Now, let's run it. I will fit both a Brownian motion and a Pagel's λ model for the structure of the residual error, and then combine the two lists of fitted models:
## Brownian error
fits.bm<-DREDGE(y,x,data=X,se=SE,tree=tree,method="ML")
## the first three of these (there are 16)
fits.bm[1:3]
## [[1]]
## Generalized least squares fit by maximum likelihood
## Model: model
## Data: cbind(data, vf)
## Log-likelihood: -223.3477
##
## Coefficients:
## (Intercept)
## 0.1609445
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vf
## Degrees of freedom: 100 total; 99 residual
## Residual standard error: 0.9168844
##
## [[2]]
## Generalized least squares fit by maximum likelihood
## Model: model
## Data: cbind(data, vf)
## Log-likelihood: -220.0688
##
## Coefficients:
## (Intercept) x1
## 0.5595618 1.1912291
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vf
## Degrees of freedom: 100 total; 98 residual
## Residual standard error: 0.9339633
##
## [[3]]
## Generalized least squares fit by maximum likelihood
## Model: model
## Data: cbind(data, vf)
## Log-likelihood: -213.2678
##
## Coefficients:
## (Intercept) x2
## -0.07352865 -1.77071022
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vf
## Degrees of freedom: 100 total; 98 residual
## Residual standard error: 0.9507248
## lambda error
fits.lambda<-DREDGE(y,x,data=X,corClass=corPagel,tree=tree,method="ML")
fits<-c(fits.bm,fits.lambda)
## AIC values for each model
aic<-sapply(fits,AIC)
aic
## [1] 450.6955 446.1376 432.5356 450.2586 394.5708 430.4351 447.0244
## [8] 379.7736 428.2995 314.1018 396.3431 428.0877 291.3430 378.9618
## [15] 315.3299 293.1996 460.7735 456.1583 443.1340 460.3964 410.5457
## [22] 441.2863 457.0599 395.3290 439.3130 335.7291 412.4450 439.2609
## [29] 316.4621 395.0639 336.6753 318.3983
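These raw AIC scores are a little hard to scan, so (as an addition to the original post) we might also tabulate ΔAIC and Akaike weights across all 32 fits:
dAIC<-aic-min(aic) ## difference from the best (lowest) AIC
w<-exp(-0.5*dAIC)/sum(exp(-0.5*dAIC)) ## Akaike weights
round(cbind(AIC=aic,dAIC=dAIC,weight=w),4)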
Finally, here is our best-fitting model based on AIC:
fits[aic==min(aic)][[1]]
## Generalized least squares fit by maximum likelihood
## Model: model
## Data: cbind(data, vf)
## Log-likelihood: -140.6715
##
## Coefficients:
## (Intercept) x1 x2 x4
## 0.6896117 0.9263428 -2.0679544 3.2012811
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vf
## Degrees of freedom: 100 total; 96 residual
## Residual standard error: 0.8682554
Not bad - the best-fitting model closely approximates the conditions under which the data were simulated:
## simulate tree & data
set.seed(999)
tree<-pbtree(n=100,scale=1)
## four independent variables evolving by Brownian motion
X<-fastBM(tree,nsim=4)
colnames(X)<-c("x1","x2","x3","x4")
## generating coefficients for the intercept & x1-x4 (x3 has no effect)
beta<-c(1,1,-2,0,3)
y<-cbind(rep(1,Ntip(tree)),X)%*%beta+fastBM(tree)
## sampling variances & observed (with error) values of y
v<-setNames(rexp(n=Ntip(tree)),tree$tip.label)
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
	rownames(y))
X<-as.data.frame(cbind(ye,X))
SE<-sqrt(v)
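In other words, the generating model included x1, x2, and x4 - but not x3, whose coefficient was set to zero - which is exactly the set of predictors retained in the best AIC model above. If we like, we can put the estimated and generating coefficients side by side (my addition, not in the original post):
coef(fits[aic==min(aic)][[1]]) ## estimates from the best model
beta[c(1,2,3,5)] ## generating values for the intercept, x1, x2, & x4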