Friday, August 19, 2016

Simple function to compute a 'pooled variance'

One must exist, but for some reason I was not able to find a function to compute the pooled variance, which is merely the weighted variance in which the weights are proportional to Ni - 1 for each ith population sample size.

Here is a simple function to do this in R, in which y is a numerical vector & x is a factor indicating population membership:

vars<-function(x) mean(x^2)-mean(x)^2

pooled.var<-function(y,x){
    k<-length(levels(x))
    N<-summary(x)
    sum(sapply(levels(x),function(l,y,x) vars(y[x==l]),y=y,x=x)*
        (N-1))/(sum(N)-k)
}

For instance:

y
##   [1]  1.66643347  3.58241307  3.40930188 -0.96244911  3.01537895
##   [6]  3.22039153  2.04945935  1.79276033  0.98006731  0.98435105
##  [11]  2.21611933  0.99441260  4.76126946  3.67653758  2.05488395
##  [16]  3.52969225  1.39656560  2.59958819  2.58375755  3.71407970
##  [21]  1.78405433  2.14309964  3.13001545  2.08055789  2.24084935
##  [26]  2.58348915  1.64146883  1.02864576  1.92799065  2.43943986
##  [31]  1.15720010  2.13305540  0.01381969  0.74266785  1.81005551
##  [36]  2.11757020  2.89246446  1.43198968  0.22427963  2.06503921
##  [41]  2.25585299  1.40974141 -0.15913367  2.71146586  2.61244346
##  [46]  4.75615711  2.00224151  1.76408595  3.84962963  0.76250972
##  [51]  2.10770717 -0.74763840  2.68343365  2.91244893  2.77712141
##  [56]  2.22598145  2.39142779  3.97542995  4.92703017  2.59852609
##  [61]  4.46669762  2.68039345  0.50455457  3.51400883  1.19135850
##  [66]  0.91997801  4.12391415  2.12656807  1.29092722  1.89407708
##  [71]  3.10823065  0.73155236  1.72465298 -1.13770643  3.27578802
##  [76]  0.14202787  2.15587529  3.09957313  2.14737089  2.21819412
##  [81]  1.01937433  0.61840772  3.06101451  3.09378180  0.67329345
##  [86]  1.14656171  3.59971781  3.27936076  2.53420574  2.78287851
##  [91]  2.00604970  3.13107129  1.31622894  4.38502427  0.54058999
##  [96]  2.79645069  2.61484882  4.33965617  3.42823455  1.88759749
x
##   [1] a c c a a b b c b b b b c c c c c c c b a b c c a a b a a b b b b a c
##  [36] b a a a a b a a c c c b b b a b a c b c c c c c c c b b c b a c b a b
##  [71] c b c a b b b b c c a a b c a c c b b c b b b c a b b c b c
## Levels: a b c
pooled.var(y,x)
## [1] 1.011426

That's it.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.