From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>

Date: Tue, 26 Feb 2008 08:57:54 +0100

Date: Tue, 26 Feb 2008 08:57:54 +0100

Daniel Malter wrote:

*> Hi,
**>
*

> the standard errors of the coefficients in two regressions that I computed

*> by hand and using lm() differ by about 1%. Can somebody help me to identify
**> the source of this difference? The coefficient estimates are the same, but
**> the standard errors differ.
**>
**> ####Simulate data
**>
**> happiness=0
**> income=0
**> gender=(rep(c(0,1,1,0),25))
**> for(i in 1:100){
**> happiness[i]=1000+i+rnorm(1,0,40)
**> income[i]=2*i+rnorm(1,0,40)
**> }
**>
**> ####Run lm()
**>
**> reg=lm(happiness~income+factor(gender))
**> summary(reg)
**>
**> ####Compute coefficient estimates "by hand"
**>
**> x=cbind(income,gender)
**> y=happiness
**>
**> z=as.matrix(cbind(rep(1,100),x))
**> beta=solve(t(z)%*%z)%*%t(z)%*%y
**>
**> ####Compare estimates
**>
**> cbind(reg$coef,beta) ##fine so far, they both look the same
**>
**> reg$coef[1]-beta[1]
**> reg$coef[2]-beta[2]
**> reg$coef[3]-beta[3] ##differences are too small to cause a 1%
**> difference
**>
**> ####Check predicted values
**>
**> estimates=c(beta[2],beta[3])
**>
**> predicted=estimates%*%t(x)
**> predicted=as.vector(t(as.double(predicted+beta[1])))
**>
**> cbind(reg$fitted,predicted) ##inspect fitted values
**> as.vector(reg$fitted-predicted) ##differences are marginal
**>
**> #### Compute errors
**>
**> e=NA
**> e2=NA
**> for(i in 1:length(happiness)){
**> e[i]=y[i]-predicted[i] ##for "hand-computed" regression
**> e2[i]=y[i]-reg$fitted[i] ##for lm() regression
**> }
**>
**> #### Compute standard error of the coefficients
**>
**> sqrt(abs(var(e)*solve(t(z)%*%z))) ##for "hand-computed" regression
**> sqrt(abs(var(e2)*solve(t(z)%*%z))) ##for lm() regression using e2 from
**> above
**>
**> ##Both are the same
**>
**> ####Compare to lm() standard errors of the coefficients again
**>
**> summary(reg)
**>
**>
**> The diagonal elements of the variance/covariance matrices should be the
**> standard errors of the coefficients. Both are identical when computed by
**> hand. However, they differ from the standard errors reported in
**> summary(reg). The difference of 1% seems nonmarginal. Should I have
**> multiplied var(e)*solve(t(z)%*%z) by n and divided by n-1? But even if I do
**> this, I still observe a difference. Can anybody help me out what the source
**> of this difference is?
**>
**>
*

The degrees of freedom in a regression analysis is n minus the number of
parameters, three in your case. I.e. the variance var(e) does not know
about this and divides by n-1 where it should have been n-3, so.....

*> Cheers,
*

> Daniel

*>
**>
**> -------------------------
**> cuncta stricte discussurus
**>
**> ______________________________________________
**> R-help_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
**> and provide commented, minimal, self-contained, reproducible code.
**>
*

-- O__ ---- Peter Dalgaard ุster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907 ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Tue 26 Feb 2008 - 08:07:48 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Tue 26 Feb 2008 - 08:30:16 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*