# [R] OLS standard errors

From: Daniel Malter <daniel_at_umd.edu>
Date: Tue, 26 Feb 2008 02:25:05 -0500

Hi,

####Simulate data

```	happiness=0
income=0
gender=(rep(c(0,1,1,0),25))
for(i in 1:100){
happiness[i]=1000+i+rnorm(1,0,40)
income[i]=2*i+rnorm(1,0,40)
}

```

####Run lm()

```	reg=lm(happiness~income+factor(gender))
summary(reg)

```

####Compute coefficient estimates "by hand"

```	x=cbind(income,gender)
y=happiness

z=as.matrix(cbind(rep(1,100),x))
beta=solve(t(z)%*%z)%*%t(z)%*%y

```

####Compare estimates

cbind(reg\$coef,beta) ##fine so far, they both look the same

```	reg\$coef-beta
reg\$coef-beta
reg\$coef-beta	##differences are too small to cause a 1%
```
difference

####Check predicted values

estimates=c(beta,beta)

```	predicted=estimates%*%t(x)
predicted=as.vector(t(as.double(predicted+beta)))

cbind(reg\$fitted,predicted)		##inspect fitted values
as.vector(reg\$fitted-predicted)	##differences are marginal

```

#### Compute errors

```	e=NA
e2=NA
for(i in 1:length(happiness)){
e[i]=y[i]-predicted[i]   ##for "hand-computed" regression
e2[i]=y[i]-reg\$fitted[i] ##for lm() regression
}

```

#### Compute standard error of the coefficients

```  sqrt(abs(var(e)*solve(t(z)%*%z)))	##for "hand-computed" regression
sqrt(abs(var(e2)*solve(t(z)%*%z)))	##for lm() regression using e2 from
```
above

##Both are the same

summary(reg)

The diagonal elements of the variance/covariance matrices should be the standard errors of the coefficients. Both are identical when computed by hand. However, they differ from the standard errors reported in summary(reg). The difference of 1% seems nonmarginal. Should I have multiplied var(e)*solve(t(z)%*%z) by n and divided by n-1? But even if I do this, I still observe a difference. Can anybody help me out what the source of this difference is?

Cheers,
Daniel

cuncta stricte discussurus

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 26 Feb 2008 - 07:31:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 26 Feb 2008 - 09:30:16 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.