[R] What BIC is calculated by 'regsubsets'?

From: Paul Murtaugh <murtaugh_at_stat.oregonstate.edu>
Date: Fri, 19 Dec 2008 11:39:17 -0800

The function 'regsubsets' appears to calculate a BIC value that is different from that returned by the function 'BIC'. The latter is explained in the documentation, but I can't find an expression for the statistic returned by 'regsubsets'.

Incidentally, both of these differ from the BIC that is given in Ramsey and Schafer's, The Statistical Sleuth. I assume these are all linear transformations of each other, but I'd like to know the 'regsubsets' formula (so that I can develop a way to do all-subsets selection based on the AIC rather than the BIC).

The following code defines a function that illustrates the issue. Thanks

script.ic <- function() {

print(names(airquality)) # Ozone Solar.R Wind Temp Month Day

# Fit a model with two predictors

 mod1 <- lm(Ozone ~ Wind + Temp, data=airquality)

 npar <- length(mod1$coef)+1         # no. parameters in fitted model,
                                     #   including s2, is 4
 nobs <- length(mod1$fitted)         # no. of observations = 116
 s2 <- summary(mod1)$sigma2         # MSE = 477.6371
 logL <- as.vector(logLik(mod1))     # log likelihood = -520.8705

# Use the R function BIC, defined as: -2*log-likelihood + npar*log(nobs)
tmp1 <- BIC(mod1) # 1060.755 tmp2 <- -2 * (-520.8705) + 4 * log(116) # 1060.755, agrees
 cat(paste("\nFrom R's BIC:",signif(tmp1,5),"(",signif(tmp2,5),

    "obtained 'by hand')\n\n"))

# Now see how 'regsubsets' calculates the BIC
 tmp3 <- regsubsets(Ozone ~ Solar.R + Wind + Temp, data=airquality)  tmp3.s <- summary(tmp3)

# 'mod1' is the second model in 'tmp3'; what is the formula for this BIC?
 cat("\nThe corresponding model from 'regsubsets':\n")  print(tmp3.s$which[2,])

 tmp4 <- tmp3.s$bic[2]                       # -82.52875
 cat(paste("\nBIC =",signif(tmp4,5),"\n"))

# Incidentally, the 'rsq' and 'rss' components of tmp3.s do not agree
# with the values in the 'mod1' lm output object.

# Just for kicks, try the formula for Schwarz's BIC from Ramsey & Schafer,
# Statistical Sleuth: nobs*log(MSE) + npar*log(nobs))
 tmp5 <- 116 * log(477.6371) + 4*log(116) # 734.6011  cat(paste("\nStat. Sleuth's BIC =",signif(tmp5,5),"\n"))  cat("\nUff da!\n")



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 20 Dec 2008 - 11:54:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 20 Dec 2008 - 12:30:35 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive