Re: [R] Large number of dummy variables

From: Doran, Harold <>
Date: Mon, 21 Jul 2008 18:16:17 -0400

Well, at the risk of entering a debate I really don't have time for (I'm doing it anyway), why not consider a random coefficient model? If your response is anything like "well, the random effects are correlated with the fixed effects, so the estimates are biased, but OLS is consistent and unbiased via an appeal to Gauss-Markov," then I will probably make time for this discussion :)

I have run into this problem myself, though. In what you're doing, you first create the model matrix and then do the demeaning, correct? I recall that Doug Bates was, at one point, working on code in which the fixed-effects model matrix for OLS models is created directly as a sparse matrix. I think working with the sparse matrix is a better analytical approach than time-demeaning, since it does not require sweeping out one factor at a time. I don't remember where that work is, though.
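For the two-factor case in the original question, one commonly used workaround (a substitution on my part; it is not Doug's sparse-matrix code and not something lm() does) is to demean alternately by importer-year and by exporter-year until the variables stop changing. The limit is the residual from regressing on both sets of dummies, so by the Frisch-Waugh-Lovell theorem the slope from regressing the demeaned y on the demeaned x matches the slope from the full dummy regression. A minimal pure-Python sketch, assuming a single regressor x and integer group codes; the function names and toy data are hypothetical:

```python
def demean_by(values, groups):
    """Subtract each group's mean from its members (one-way within transformation)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def demean_two_way(values, g1, g2, tol=1e-10, max_iter=10000):
    """Alternate one-way demeaning by g1 and by g2 until the result stops changing.

    The limit is the residual from regressing `values` on both sets of dummies.
    """
    x = list(values)
    for _ in range(max_iter):
        new = demean_by(demean_by(x, g1), g2)
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("alternating demeaning did not converge")


def two_way_fe_slope(y, x, g1, g2):
    """Slope on x after absorbing both sets of dummies (Frisch-Waugh-Lovell)."""
    y_t = demean_two_way(y, g1, g2)
    x_t = demean_two_way(x, g1, g2)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(b * b for b in x_t)


# Made-up toy data: 12 observations, 3 "importer-year" and 4 "exporter-year" groups.
g1 = [i % 3 for i in range(12)]
g2 = [i % 4 for i in range(12)]
x = [i * i / 10 for i in range(12)]
a = [1.0, -3.0, 5.0]        # importer-year effects
b = [0.5, 2.0, -1.0, 4.0]   # exporter-year effects
y = [2.0 * xi + a[i] + b[j] for xi, i, j in zip(x, g1, g2)]

print(two_way_fe_slope(y, x, g1, g2))  # approximately 2.0
```

Only group sums are ever stored, so memory grows with the number of observations rather than with observations times dummy columns; the trade-off is that the dummy coefficients themselves are not estimated, and convergence can be slow when the two factors are only weakly connected.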

There is a package called SparseM that has functions for fitting OLS models with sparse matrices. I don't know its current status, but I vaguely recall its author noting at one point that Bates and Maechler's Matrix package would be the go-to package for work with large, sparse model matrices.
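To put rough numbers on the original question: a dense model matrix with 100,000 rows and 1,378 + 1,390 = 2,768 dummy columns holds 276.8 million doubles, about 2.2 GB at 8 bytes each, whereas a sparse representation stores only the two non-zero dummy entries per row (about 200,000 values plus their column indices). The sketch below is not SparseM or Matrix code; it is a pure-Python stand-in, assuming the design is stored row by row as (column, value) pairs, that solves least squares by conjugate gradients on the normal equations (CGLS) and so only ever touches the non-zeros. All names and the toy data are hypothetical:

```python
def x_times(rows, v):
    """Compute X @ v for X stored as a list of rows of (column, value) pairs."""
    return [sum(val * v[j] for j, val in row) for row in rows]


def xt_times(rows, r, p):
    """Compute X.T @ r without ever forming X.T."""
    out = [0.0] * p
    for row, ri in zip(rows, r):
        for j, val in row:
            out[j] += val * ri
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgls(rows, y, p, rtol=1e-10, max_iter=1000):
    """Minimise ||y - X b|| by conjugate gradients on X'X b = X'y.

    Starting from b = 0 it converges to the minimum-norm solution, so the
    collinearity between two full sets of dummies is tolerated.
    """
    b = [0.0] * p
    r = list(y)                   # residual y - X b
    s = xt_times(rows, r, p)      # X'r, the negative gradient
    d = list(s)
    gamma = gamma0 = dot(s, s)
    for _ in range(max_iter):
        if gamma <= (rtol ** 2) * gamma0:
            break
        q = x_times(rows, d)
        alpha = gamma / dot(q, q)
        b = [bi + alpha * di for bi, di in zip(b, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = xt_times(rows, r, p)
        gamma_new = dot(s, s)
        d = [si + (gamma_new / gamma) * di for si, di in zip(s, d)]
        gamma = gamma_new
    return b


# Made-up toy data: column 0 is x, then 3 importer-year and 4 exporter-year dummies.
n1, n2 = 3, 4
g1 = [i % n1 for i in range(12)]
g2 = [i % n2 for i in range(12)]
x = [i * i / 10 for i in range(12)]
y = [2.0 * xi + [1.0, -3.0, 5.0][i] + [0.5, 2.0, -1.0, 4.0][j]
     for xi, i, j in zip(x, g1, g2)]
rows = [[(0, xi), (1 + i, 1.0), (1 + n1 + j, 1.0)] for xi, i, j in zip(x, g1, g2)]

b = cgls(rows, y, p=1 + n1 + n2)
print(b[0])  # slope on x, approximately 2.0
```

Unlike alternating demeaning, this also returns the dummy coefficients (up to the usual normalisation), which is one reason to prefer working with the sparse model matrix directly.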

> -----Original Message-----
> From:
> [] On Behalf Of Alan Spearot
> Sent: Monday, July 21, 2008 5:59 PM
> To:
> Subject: [R] Large number of dummy variables
> Hello,
> I'm trying to run a regression predicting trade flows between
> importers and exporters. I wish to include both
> year-importer dummies and year-exporter dummies. The former
> includes 1378 levels, and the latter includes 1390 levels. I
> have roughly 100,000 total observations.
> When I use lm() to run a simple regression, it gives me a
> "cannot allocate ___" error. I've been able to get around this
> by time-demeaning over one large group, but since I have two, it
> doesn't work correctly. Is there a more efficient
> way to handle a model matrix this large in R?
> Thanks for your help.
> Alan Spearot
> --
> Alan Spearot
> Assistant Professor - International Economics, University of
> California - Santa Cruz
> 1156 High Street
> 453 Engineering 2
> Santa Cruz, CA 95064
> Office: (831) 459-1530
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.
Received on Mon 21 Jul 2008 - 22:18:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 21 Jul 2008 - 23:31:56 GMT.

