Saturday, March 18, 2017

Simplified version of MuMIn::dredge for pgls.SEy - but not an endorsement of data dredging!

An undergraduate in Daniel Cadena's lab here at Uniandes, whom I have been helping with her senior thesis project, recently pointed me in the direction of the package MuMIn, which includes in its arsenal the ominously named function dredge. This function automates the fitting of all possible models for a set of independent (predictor) variables. Unfortunately, though dredge works with a variety of different fitted model classes, it does not work with the function pgls.SEy (1, 2), for PGLS with uncertainty in y, which I also wrote in support of this project.

Please note that this post is not an endorsement of any activity that could be classified as “data dredging” - and I warned the student about the dangers of such practices. Nonetheless, I assembled a highly simplified version of dredge that she can apply to a model fit using pgls.SEy (or, in fact, any similarly structured model-fitting function in R, such as lm).

The function takes as input the arguments y and x, which are not data, but rather the name of the dependent variable and a vector of the names of the independent variables of the model, respectively.

For instance:

tree ## tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  t65, t66, t29, t30, t13, t78, ...
## 
## Rooted; includes branch lengths.
X ## our data
##               ye           x1          x2           x3           x4
## t65  -0.90772465 -0.933177258 -0.50289849 -0.833637779 -1.226811847
## t66  -1.17179800 -0.360779073 -0.67343194 -1.232873565 -1.558929304
## t29   4.15357374 -0.004324650 -0.86464732 -1.522271412  0.092652937
## t30   3.55198867 -0.283674892 -0.05950573 -1.158279742  0.783271000
## t13  -1.33541082 -0.260222231  1.69823520 -0.958306299  0.927545322
## t78  -0.56207919 -0.030577008 -0.12238217 -0.581989880 -0.789061617
## t79  -1.70823856 -0.242023433  0.32537283 -0.744757917 -0.830916154
## t5    3.54366591 -0.351473922 -0.86081240 -1.800558381 -0.146559729
## t26  -3.34799011 -0.417929491 -0.87860040 -0.964327081 -1.780002063
## t67  -3.58452023 -0.494481915 -1.04212209 -1.041371912 -1.810841912
## t68  -3.32209074 -0.588127345 -0.71373910 -0.700753022 -1.315598157
## t86  -2.54198007 -0.045597241 -1.28701218 -0.870836065 -1.752840010
## t87  -2.39865344 -0.284624727 -1.19177925 -0.413622945 -1.571490363
## t25  -1.57609619 -0.099326903 -1.26492190 -0.185306209 -1.449849790
## t16  -5.12139002 -0.329391279 -0.69189543 -0.731627112 -2.294395131
## t47  -5.88290498 -0.998675854 -0.60213264 -1.864536626 -2.411244328
## t48  -5.58564307 -1.135659972 -0.90384083 -2.177917526 -2.317161349
## t10   5.93972192 -0.170786099 -0.84404497  0.258286333  0.254189627
## t20   0.11366308 -0.116176393 -1.43439903 -1.008153379 -0.756823697
## t63   3.94588996 -0.271788410 -0.48803237 -0.940323863  0.501302246
## t64   3.21911945 -0.122474607 -0.87832126 -0.254448825  0.274372750
## t53   4.74552509 -0.188851979 -0.48504631 -0.201398930  0.257906423
## t21   0.75183425 -0.666761834 -1.27216994 -0.963997566 -0.511842190
## t23   5.96488228 -0.263860837 -1.16042313 -1.542287147  0.514510167
## t49   4.17655995 -0.143231501 -1.19002853 -1.450748764  0.507057795
## t50   4.80439077  0.258295208 -0.60080140 -1.075878405  0.781512173
## t80   2.32980803 -0.326915114  0.10754248 -0.923439256  0.643239607
## t81   4.08965563 -0.507894799  0.26522058 -0.469615756  1.120269519
## t19  -1.48944997 -0.811681993 -0.26318414 -2.777328413 -0.603814640
## t84  -2.91205159 -0.737986864 -1.02763382 -1.957564180 -1.353937182
## t85  -2.06165298 -0.712068874 -0.77728970 -2.245292884 -1.401371778
## t40  -3.02205037 -0.933452671 -0.83696879 -2.547047370 -1.476694576
## t41  -4.87730189 -0.229120475 -1.49848945 -2.765038158 -2.032890033
## t18  -0.52165070 -0.204012732 -1.64088635 -2.344461535 -1.124866746
## t76  -1.65255423 -0.117917709 -0.77909630 -2.544663932 -1.095495017
## t77   0.07018189  0.287088622 -0.72342580 -2.680871209 -0.986772078
## t27   0.92637845 -0.358620911 -1.71171019 -1.988283823 -0.648392578
## t92   3.12710215 -0.337589308 -1.45006236 -2.639934958  0.017958145
## t93   3.65860534 -0.312388755 -1.60509872 -2.595797518  0.158134389
## t8   -2.36028047 -0.676186538 -0.41457179 -1.775466733 -0.527343477
## t2    5.66245601 -1.382336906 -0.60476399 -2.774459210  0.987429394
## t95   2.16726940  0.071824226 -0.04136211 -0.978944992  0.702904379
## t96   5.23800963 -0.000369423  0.04677993 -1.153812002  0.845994491
## t6   -3.08927960 -1.228130839  1.24440860 -1.666626516 -0.183465598
## t44  -0.32675842 -0.778840190 -0.10432506 -0.930466508  0.132912688
## t51   0.71141037 -1.529691824 -0.63523463 -1.351234148 -0.203513384
## t52  -2.97932839 -0.671563524  0.25110920 -0.940436026 -0.144505347
## t72  -0.56710045 -0.208175485  0.21888791 -0.683621798 -0.439724856
## t73   3.06125726 -0.280504466 -0.05425123 -0.977418875 -0.399349006
## t35   0.88640748 -0.432133697 -0.60337432 -1.217613428 -0.161792920
## t38   2.22799237 -1.599070451 -1.12646600 -0.825638529  0.019129341
## t39   0.80698662 -1.019397388 -0.14879313 -0.727519051  0.100882723
## t99  -0.56690640  0.819938089  0.10457809  0.343316393 -0.571194773
## t100  0.31251389  0.797765277  0.09518138  0.337884832 -0.567458375
## t24  -5.01149282  0.132350015  1.25951182 -0.438835754 -0.889606534
## t11  -1.46762699 -0.338010923  0.63895729 -0.717547221  0.274369423
## t36  -0.79179712 -0.109222940 -0.84517257 -0.001648664 -1.305286979
## t37   0.67623873 -0.783798640  0.25056370 -0.099499147  0.026504819
## t28  -3.14350681 -0.624356825 -0.18617947 -1.046835231 -1.207608991
## t7    1.79371239 -0.995914763 -0.82088749  0.114602663  0.108922281
## t12   3.06529461 -0.430696448 -0.37673980 -0.904791446  0.664551613
## t33   1.01319467 -0.459191936  0.14792529 -0.647299464  0.429768234
## t34  -1.83383112 -1.369205857  0.15734648 -1.133452917 -0.185282740
## t15  -5.28474388 -0.668597155 -0.19397849 -1.378048152 -1.394404802
## t42   0.91464728 -1.006046065 -0.59102696 -2.544077955 -0.510802841
## t43   0.17074896 -0.657881494 -0.76035426 -1.710494052 -0.432355897
## t22  -0.93089794 -0.549687736 -0.49300322 -2.316250811 -0.798517510
## t1    1.37181061 -0.735847227 -0.10188643 -1.482318015 -0.069082305
## t3    5.83721322 -1.255254006 -1.36243767 -0.910301622  0.679504522
## t4    2.77524322  0.036571658  0.12260330 -1.806011253  0.233933022
## t74   2.32812021  1.178218512 -0.76133067 -0.509128250 -0.997411930
## t75   2.74291448  1.064090251 -0.95706527 -0.457299846 -0.867581954
## t31  -2.28596276 -0.435236571  0.54331428 -0.959954296 -0.098098866
## t59  -4.96441837 -0.567740108  0.91285849 -0.720905297 -0.427382552
## t60  -5.65604410 -0.516435916  0.64042685 -1.284172323 -0.730787277
## t9    7.14934917  1.375046920 -1.24425619  0.495073740  0.695899208
## t90   7.20415861  0.878034624 -2.01772447  0.450435820  0.664090248
## t91   5.67348238  0.935005458 -2.09172115  0.296478696  0.295762185
## t56   0.67333579 -0.517446554  0.30596671 -0.270272009  0.447548743
## t57  -1.04853187 -0.116159889  0.05991413  0.112933000  0.284765732
## t14  -3.77856865 -0.906819465  0.46607917 -0.328972208 -0.508998125
## t61  -3.93022454 -1.312009985  1.32878564 -0.562407187  0.007788355
## t62  -5.05415778 -1.612880369  1.08221968 -0.620768376 -0.315525473
## t69  -2.95408172 -0.242584724  0.76102366 -0.318980789 -0.494282321
## t70  -4.90426049 -0.396631201  1.07417172 -0.504832758 -0.631509904
## t88  -0.79912057  0.237257899  0.41246936 -0.726583288 -0.117175979
## t89   1.19199363  0.434875538  0.30737619 -0.326529423 -0.204683678
## t54  -2.81800403 -0.569749931  1.07419352  0.071561418 -0.288446193
## t55  -1.19879402 -0.152194258  1.08008287 -0.016827783  0.387440401
## t32  -1.75551359  0.602680886  0.75156851 -0.389637248 -0.359414984
## t45  -1.11541051 -1.438964529  2.01395563  0.804516332  0.786960566
## t46  -0.24158566 -1.404376125  1.86329858  0.837682607  1.315479613
## t17   3.18356839 -0.217652443  0.32900077  1.144934996  1.249666779
## t58  -5.15046378 -1.498640558  1.17233451  0.384922560 -0.425217811
## t82  -3.68968208 -1.348265012  1.08151593  0.523997757  0.152053682
## t83  -4.32950798 -1.381394718  1.22167672  0.207916476 -0.146115788
## t71  -4.28308869 -1.170708866  1.02083131  0.157810886 -0.164189066
## t94  -4.87114380 -1.200085947  1.03456095  0.172964350 -0.358510350
## t97  -5.38291471 -1.213089703  1.18276260  0.103353295 -0.596856961
## t98  -5.25643062 -1.402582088  1.24727944  0.018381114 -0.525440524
SE ## our standard errors in y
##       t65       t66       t29       t30       t13       t78       t79 
## 0.2838481 1.1192735 0.5848418 1.5342279 1.4452484 1.9503550 0.8550472 
##        t5       t26       t67       t68       t86       t87       t25 
## 0.4786313 0.2227377 0.4710501 0.4939850 0.5502124 0.8233990 0.3534902 
##       t16       t47       t48       t10       t20       t63       t64 
## 0.4161823 0.7945718 0.7589094 1.7952263 1.0058047 1.1459235 0.3266516 
##       t53       t21       t23       t49       t50       t80       t81 
## 1.2841140 0.6995356 1.2588015 0.4612205 0.1606824 1.6941166 1.0015120 
##       t19       t84       t85       t40       t41       t18       t76 
## 1.0403561 0.3647199 1.2730340 0.9148257 1.3012779 0.2001514 0.7520138 
##       t77       t27       t92       t93        t8        t2       t95 
## 1.2245456 0.1813785 0.5115684 1.1474370 0.9674241 1.6606047 0.8268985 
##       t96        t6       t44       t51       t52       t72       t73 
## 1.8817409 1.4248163 0.8072912 0.1955722 1.8251670 0.4031782 1.3043864 
##       t35       t38       t39       t99      t100       t24       t11 
## 1.2561020 0.8432288 0.7994611 0.8215323 1.5471417 0.9000843 0.5142108 
##       t36       t37       t28        t7       t12       t33       t34 
## 0.7591903 0.5203935 0.1893714 1.0115815 0.9492312 0.5408957 0.5958648 
##       t15       t42       t43       t22        t1        t3        t4 
## 1.2035645 0.9897115 0.8888153 2.5470790 0.3322280 0.8961755 0.5924993 
##       t74       t75       t31       t59       t60        t9       t90 
## 0.7070748 0.5629469 0.5976459 0.5577206 1.4663451 1.4102296 0.9487273 
##       t91       t56       t57       t14       t61       t62       t69 
## 0.6127336 0.9353615 1.1832985 0.9536276 0.5271702 1.1272450 1.2397424 
##       t70       t88       t89       t54       t55       t32       t45 
## 0.9483944 0.4326344 0.9381514 0.7614915 0.8734860 0.3894159 0.6516323 
##       t46       t17       t58       t82       t83       t71       t94 
## 0.7583656 1.5420088 0.2704494 1.0767461 0.1762237 1.0317629 1.1824717 
##       t97       t98 
## 1.4317323 0.9749933
## our variable names
y<-"ye"
x<-c("x1","x2","x3","x4")
## the function
DREDGE<-function(y,x,fn=pgls.SEy,...){
    n<-length(x)
    ## enumerate every non-empty subset of the predictor names
    ii<-list()
    for(i in 1:n) ii<-c(ii,combn(x,i,simplify=FALSE))
    ## fit the intercept-only model first
    fits<-list()
    fits[[1]]<-fn(as.formula(paste(y,"~1")),...)
    ## then fit one model per subset of predictors
    for(i in 1:length(ii)){
        model<-as.formula(paste(y,"~",paste(ii[[i]],collapse="+")))
        fits[[i+1]]<-fn(model,...)
    }
    fits
}
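As a quick sanity check on how many models DREDGE will fit: with n predictors there are 2^n candidate models (every non-empty subset of the predictors, counted by combn as in the function, plus the intercept-only model). A minimal base-R sketch:

```r
## count the subsets enumerated by combn(), as in DREDGE above,
## then add one for the intercept-only model
n<-4 ## number of predictors, as in our x1..x4 example
n.models<-sum(sapply(1:n,function(i) ncol(combn(n,i))))+1
n.models ## 16
```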

Now, let's run it. I will fit both a Brownian & Pagel's λ model for the structure of the residual error, and then combine both lists of fitted models:

## Brownian error
fits.bm<-DREDGE(y,x,data=X,se=SE,tree=tree,method="ML")
## the first three of these (there are 16)
fits.bm[1:3]
## [[1]]
## Generalized least squares fit by maximum likelihood
##   Model: model 
##   Data: cbind(data, vf) 
##   Log-likelihood: -223.3477
## 
## Coefficients:
## (Intercept) 
##   0.1609445 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vf 
## Degrees of freedom: 100 total; 99 residual
## Residual standard error: 0.9168844 
## 
## [[2]]
## Generalized least squares fit by maximum likelihood
##   Model: model 
##   Data: cbind(data, vf) 
##   Log-likelihood: -220.0688
## 
## Coefficients:
## (Intercept)          x1 
##   0.5595618   1.1912291 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vf 
## Degrees of freedom: 100 total; 98 residual
## Residual standard error: 0.9339633 
## 
## [[3]]
## Generalized least squares fit by maximum likelihood
##   Model: model 
##   Data: cbind(data, vf) 
##   Log-likelihood: -213.2678
## 
## Coefficients:
## (Intercept)          x2 
## -0.07352865 -1.77071022 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vf 
## Degrees of freedom: 100 total; 98 residual
## Residual standard error: 0.9507248
## Pagel's lambda error
fits.lambda<-DREDGE(y,x,data=X,corClass=corPagel,tree=tree,method="ML")
fits<-c(fits.bm,fits.lambda)
## AIC values for each model
aic<-sapply(fits,AIC)
aic
##  [1] 450.6955 446.1376 432.5356 450.2586 394.5708 430.4351 447.0244
##  [8] 379.7736 428.2995 314.1018 396.3431 428.0877 291.3430 378.9618
## [15] 315.3299 293.1996 460.7735 456.1583 443.1340 460.3964 410.5457
## [22] 441.2863 457.0599 395.3290 439.3130 335.7291 412.4450 439.2609
## [29] 316.4621 395.0639 336.6753 318.3983
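Rather than pulling out only the single best model, it can also be useful to compute ΔAIC and Akaike model weights across the whole candidate set. Here is a minimal sketch using a short hypothetical AIC vector for illustration - in practice we would substitute the full aic vector computed above:

```r
## hypothetical AIC values for illustration only
aic.ex<-c(450.70,446.14,291.34,293.20)
delta<-aic.ex-min(aic.ex) ## AIC differences from the best model
w<-exp(-0.5*delta)/sum(exp(-0.5*delta)) ## Akaike weights
round(w,3) ## 0.000 0.000 0.717 0.283
```

The weights sum to one and can be read as the relative support for each model in the set.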

Finally, here is our best-fitting model based on AIC:

fits[aic==min(aic)][[1]]
## Generalized least squares fit by maximum likelihood
##   Model: model 
##   Data: cbind(data, vf) 
##   Log-likelihood: -140.6715
## 
## Coefficients:
## (Intercept)          x1          x2          x4 
##   0.6896117   0.9263428  -2.0679544   3.2012811 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vf 
## Degrees of freedom: 100 total; 96 residual
## Residual standard error: 0.8682554

Not bad - the best-fitting model very closely approximates the conditions of the simulation:

## simulate tree & data
set.seed(999)
tree<-pbtree(n=100,scale=1)
X<-fastBM(tree,nsim=4)
colnames(X)<-c("x1","x2","x3","x4")
beta<-c(1,1,-2,0,3)
y<-cbind(rep(1,Ntip(tree)),X)%*%beta+fastBM(tree)
v<-setNames(rexp(n=Ntip(tree)),tree$tip.label)
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
    rownames(y))
X<-as.data.frame(cbind(ye,X))
SE<-sqrt(v)
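For reference, we can line the coefficients of the best-fitting model up against the generating values in beta (x3 was simulated with a coefficient of zero, which is why the best model excludes it):

```r
## generating coefficients for x1, x2, & x4 (from beta above; x3 is 0)
beta.true<-c(1,-2,3)
## estimates from the best-fitting model printed earlier
beta.hat<-c(0.9263428,-2.0679544,3.2012811)
round(beta.hat-beta.true,3) ## -0.074 -0.068  0.201
```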
