From: roger koenker <roger_at_ysidro.econ.uiuc.edu>

Date: Sat, 05 May 2007 10:21:08 -0500

}

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 05 May 2007 - 15:28:16 GMT

Date: Sat, 05 May 2007 10:21:08 -0500

Ivo,

I don't know whether you ever got a proper answer to this question. Here is a kludgy one -- someone else can probably provide a more elegant version of my diffid function.

What you want to do is sweep out the mean deviations from both y and x based on the factor fe and then estimate the simple y on x linear model.

I have an old function that was originally designed to do panel data models that looks like this:

"diffid" <- function(h, id)

{

if(is.vector(h)) h <- matrix(h, ncol = 1) Ph <- unique(id) Ph <- cbind(Ph, table(id)) for(i in 1:ncol(h)) Ph <- cbind(Ph, tapply(h[, i], id, mean)) is <- tapply(id, id) Ph <- Ph[is, - (1:2)] h - Ph

}

With this you can do:

set.seed(1);

fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm
(100);

summary(lm(diffid(y,fe) ~ diffid(x,fe)))

**HTH,
**
Roger

On May 4, 2007, at 3:08 PM, ivo welch wrote:

> hi doug: yikes. could I have done better? Oh dear. I tried to make

*> my example clearer half-way through, but made it worse. I meant
**>
**> set.seed(1);
**> fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm
**> (100);
**> print(summary(lm( y ~ x + fe)))
**> <deleted>
**> Coefficients:
**> Estimate Std. Error t value Pr(>|t|)
**> (Intercept) 0.1128 0.3680 0.31 0.76
**> x 0.0232 0.0960 0.24 0.81
**> fe1 -0.6628 0.5467 -1.21 0.23
**> <deleted more fe's>
**> Residual standard error: 0.949 on 89 degrees of freedom
**> Multiple R-Squared: 0.0838, Adjusted R-squared: -0.0192
**> F-statistic: 0.814 on 10 and 89 DF, p-value: 0.616
**>
**> I really am interested only in this linear specification, the
**> coefficient on x (0.0232) and the R^2 of 8.38% (adjusted -1.92%). If
**> I did not have so much data in my real application, I would never have
**> to look at nlme or nlme4. I really only want to be able to run this
**> specification through lm with far more observations (100,000) and
**> groups (10,000), and be done with my problem.
**>
**> now, with a few IQ points more, I would have looked at the lme
**> function instead of the nlme function in library(nlme). [then
**> again, I could understand stats a lot better with a few more IQ
**> points.] I am reading the lme description now, but I still don't
**> understand how to specify that I want to have dummies in my
**> specification, plus the x variable, and that's it. I think I am not
**> understanding the integration of fixed and random effects in the same
**> R functions.
**>
**> thanks for pointing me at your lme4 library. on linux, version
**> 2.5.0, I did
**> R CMD INSTALL matrix*.tar.gz
**> R CMD INSTALL lme4*.tar.gz
**> and it installed painlessly. (I guess R install packages don't have
**> knowledge of what they rely on; lme4 requires matrix, which the docs
**> state, but having gotten this wrong, I didn't get an error. no big
**> deal. I guess I am too used to automatic resolution of dependencies
**> from linux installers these days that I did not expect this.)
**>
**> I now tried your specification:
**>
**>> library(lme4)
**> Loading required package: Matrix
**> Loading required package: lattice
**>> lmer(y~x+(1|fe))
**> Linear mixed-effects model fit by REML
**> Formula: y ~ x + (1 | fe)
**> AIC BIC logLik MLdeviance REMLdeviance
**> 282 290 -138 270 276
**> Random effects:
**> Groups Name Variance Std.Dev.
**> fe (Intercept) 0.000000000445 0.0000211
**> Residual 0.889548532468 0.9431588
**> number of obs: 100, groups: fe, 10
**>
**> Fixed effects:
**> Estimate Std. Error t value
**> (Intercept) -0.0188 0.0943 -0.199
**> x 0.0528 0.0904 0.585
**>
**> Correlation of Fixed Effects:
**> (Intr)
**> x -0.022
**> Warning messages:
**> 1: Estimated variance for factor 'fe' is effectively zero
**> in: `LMEoptimize<-`(`*tmp*`, value = list(maxIter = 200L, tolerance =
**> 0.0000000149011611938477,
**> 2: $ operator not defined for this S4 class, returning NULL in: x
**> $symbolic.cor
**>
**> Without being a statistician, I can still determine that this is not
**> the model I would like to work with. The coefficient is 0.0528, not
**> 0.0232. (I am also not sure why I am getting these warning messages
**> on my system, either, but I don't think it matters.)
**>
**> is there a simple way to get the equivalent specification for my smple
**> model, using lmer or lme, which does not choke on huge data sets?
**>
**> regards,
**>
**> /ivo
**>
**> ______________________________________________
**> R-help_at_stat.math.ethz.ch mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide http://www.R-project.org/posting-
**> guide.html
**> and provide commented, minimal, self-contained, reproducible code.
*

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 05 May 2007 - 15:28:16 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sat 05 May 2007 - 15:31:37 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*