From: Daniel Malter <daniel_at_umd.edu>

Date: Tue, 26 Feb 2008 02:25:05 -0500

cuncta stricte discussurus

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 26 Feb 2008 - 07:31:48 GMT

Date: Tue, 26 Feb 2008 02:25:05 -0500

Hi,

happiness=0 income=0 gender=(rep(c(0,1,1,0),25)) for(i in 1:100){ happiness[i]=1000+i+rnorm(1,0,40) income[i]=2*i+rnorm(1,0,40) }

####Run lm()

reg=lm(happiness~income+factor(gender)) summary(reg)

####Compute coefficient estimates "by hand"

x=cbind(income,gender) y=happiness z=as.matrix(cbind(rep(1,100),x)) beta=solve(t(z)%*%z)%*%t(z)%*%y

####Compare estimates

cbind(reg$coef,beta) ##fine so far, they both look the same

reg$coef[1]-beta[1] reg$coef[2]-beta[2] reg$coef[3]-beta[3] ##differences are too small to cause a 1%difference

####Check predicted values

estimates=c(beta[2],beta[3])

predicted=estimates%*%t(x) predicted=as.vector(t(as.double(predicted+beta[1]))) cbind(reg$fitted,predicted) ##inspect fitted values as.vector(reg$fitted-predicted) ##differences are marginal

#### Compute errors

e=NA e2=NA for(i in 1:length(happiness)){ e[i]=y[i]-predicted[i] ##for "hand-computed" regression e2[i]=y[i]-reg$fitted[i] ##for lm() regression }

#### Compute standard error of the coefficients

sqrt(abs(var(e)*solve(t(z)%*%z))) ##for "hand-computed" regression sqrt(abs(var(e2)*solve(t(z)%*%z))) ##for lm() regression using e2 fromabove

##Both are the same

####Compare to lm() standard errors of the coefficients again

summary(reg)

The diagonal elements of the variance/covariance matrices should be the standard errors of the coefficients. Both are identical when computed by hand. However, they differ from the standard errors reported in summary(reg). The difference of 1% seems nonmarginal. Should I have multiplied var(e)*solve(t(z)%*%z) by n and divided by n-1? But even if I do this, I still observe a difference. Can anybody help me out what the source of this difference is?

Cheers,

Daniel

cuncta stricte discussurus

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 26 Feb 2008 - 07:31:48 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Tue 26 Feb 2008 - 09:30:16 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*