# Re: [R] OLS standard errors

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 26 Feb 2008 08:07:15 +0000 (GMT)

Please check your statistical methods lecture notes. var(e) has divisor n-1, and that is not an unbiased estimator of the residual variance when 'e' are residuals. From summary.lm (and you are allowed to read the code)

```     rdf <- n - p
if (is.na(z\$df.residual) || rdf != z\$df.residual)
warning("residual degrees of freedom in object suggest this is not
an \"lm\" fit")
p1 <- 1:p
r <- z\$residuals
f <- z\$fitted.values
w <- z\$weights
if (is.null(w)) {
mss <- if (attr(z\$terms, "intercept"))
sum((f - mean(f))^2)
else sum(f^2)
}
else {
mss <- if (attr(z\$terms, "intercept")) {
m <- sum(w * f/sum(w))
sum(w * (f - m)^2)
}
else sum(w * f^2)
rss <- sum(w * r^2)
r <- sqrt(w) * r
}

```

the correct divisor is n-p. Since p=3 in your example, that explains a 2% difference in variances and hence a 1% difference in SEs.

On Tue, 26 Feb 2008, Daniel Malter wrote:

> Hi,
>
> the standard errors of the coefficients in two regressions that I computed
> by hand and using lm() differ by about 1%. Can somebody help me to identify
> the source of this difference? The coefficient estimates are the same, but
> the standard errors differ.
>
> ####Simulate data
>
> happiness=0
> income=0
> gender=(rep(c(0,1,1,0),25))
> for(i in 1:100){
> happiness[i]=1000+i+rnorm(1,0,40)
> income[i]=2*i+rnorm(1,0,40)
> }
>
> ####Run lm()
>
> reg=lm(happiness~income+factor(gender))
> summary(reg)
>
> ####Compute coefficient estimates "by hand"
>
> x=cbind(income,gender)
> y=happiness
>
> z=as.matrix(cbind(rep(1,100),x))
> beta=solve(t(z)%*%z)%*%t(z)%*%y
>
> ####Compare estimates
>
> cbind(reg\$coef,beta) ##fine so far, they both look the same
>
> reg\$coef[1]-beta[1]
> reg\$coef[2]-beta[2]
> reg\$coef[3]-beta[3] ##differences are too small to cause a 1%
> difference
>
> ####Check predicted values
>
> estimates=c(beta[2],beta[3])
>
> predicted=estimates%*%t(x)
> predicted=as.vector(t(as.double(predicted+beta[1])))
>
> cbind(reg\$fitted,predicted) ##inspect fitted values
> as.vector(reg\$fitted-predicted) ##differences are marginal
>
> #### Compute errors
>
> e=NA
> e2=NA
> for(i in 1:length(happiness)){
> e[i]=y[i]-predicted[i] ##for "hand-computed" regression
> e2[i]=y[i]-reg\$fitted[i] ##for lm() regression
> }
>
> #### Compute standard error of the coefficients
>
> sqrt(abs(var(e)*solve(t(z)%*%z))) ##for "hand-computed" regression
> sqrt(abs(var(e2)*solve(t(z)%*%z))) ##for lm() regression using e2 from
> above
>
> ##Both are the same
>
> ####Compare to lm() standard errors of the coefficients again
>
> summary(reg)
>
>
> The diagonal elements of the variance/covariance matrices should be the
> standard errors of the coefficients. Both are identical when computed by
> hand. However, they differ from the standard errors reported in
> summary(reg). The difference of 1% seems nonmarginal. Should I have
> multiplied var(e)*solve(t(z)%*%z) by n and divided by n-1? But even if I do
> this, I still observe a difference. Can anybody help me out what the source
> of this difference is?
>
> Cheers,
> Daniel
>
>
> -------------------------
> cuncta stricte discussurus
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

```--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
```
Received on Tue 26 Feb 2008 - 08:09:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 26 Feb 2008 - 09:30:16 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.