From: Paul Murtaugh <murtaugh_at_stat.oregonstate.edu>

Date: Fri, 19 Dec 2008 11:39:17 -0800

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 20 Dec 2008 - 11:54:21 GMT

Date: Fri, 19 Dec 2008 11:39:17 -0800

The function 'regsubsets' appears to calculate a BIC value that is
different from that returned by the function 'BIC'. The latter is
explained in the documentation, but I can't find an expression for the
statistic returned by 'regsubsets'.

Incidentally, both of these differ from the BIC that is given in Ramsey and Schafer's, The Statistical Sleuth. I assume these are all linear transformations of each other, but I'd like to know the 'regsubsets' formula (so that I can develop a way to do all-subsets selection based on the AIC rather than the BIC).

The following code defines a function that illustrates the issue.
Thanks

-Paul

script.ic <- function() {

library(datasets)

print(names(airquality)) # Ozone Solar.R Wind Temp Month Day

# Fit a model with two predictors

mod1 <- lm(Ozone ~ Wind + Temp, data=airquality)

npar <- length(mod1$coef)+1 # no. parameters in fitted model, # including s2, is 4 nobs <- length(mod1$fitted) # no. of observations = 116 s2 <- summary(mod1)$sigma2 # MSE = 477.6371 logL <- as.vector(logLik(mod1)) # log likelihood = -520.8705cat(paste("\nFrom R's BIC:",signif(tmp1,5),"(",signif(tmp2,5),

# Use the R function BIC, defined as: -2*log-likelihood + npar*log(nobs)

tmp1 <- BIC(mod1) # 1060.755 tmp2 <- -2 * (-520.8705) + 4 * log(116) # 1060.755, agrees

"obtained 'by hand')\n\n"))

# Now see how 'regsubsets' calculates the BIC

tmp3 <- regsubsets(Ozone ~ Solar.R + Wind + Temp, data=airquality)
tmp3.s <- summary(tmp3)

# 'mod1' is the second model in 'tmp3'; what is the formula for this BIC?

cat("\nThe corresponding model from 'regsubsets':\n")
print(tmp3.s$which[2,])

tmp4 <- tmp3.s$bic[2] # -82.52875cat(paste("\nBIC =",signif(tmp4,5),"\n"))

# Incidentally, the 'rsq' and 'rss' components of tmp3.s do not agree

# with the values in the 'mod1' lm output object.

# Just for kicks, try the formula for Schwarz's BIC from Ramsey & Schafer,

# Statistical Sleuth: nobs*log(MSE) + npar*log(nobs))

tmp5 <- 116 * log(477.6371) + 4*log(116) # 734.6011
cat(paste("\nStat. Sleuth's BIC =",signif(tmp5,5),"\n"))
cat("\nUff da!\n")

browser()

}

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 20 Dec 2008 - 11:54:21 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sat 20 Dec 2008 - 12:30:35 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*