From: Martin Maechler <maechler_at_stat.math.ethz.ch>

Date: Tue, 22 Jul 2008 16:07:14 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 22 Jul 2008 - 14:42:27 GMT

>>>>> "HaroldD" == Doran, Harold <HDoran_at_air.org>
>>>>>     on Mon, 21 Jul 2008 19:15:37 -0400 writes:

HaroldD> Well, yes and no. In R there really isn't a need to create the model matrix by hand, because R builds it from the factors. But to implement the computational trick Alan is asking about, he must first create the full, dense model matrix and then do the time-demeaning on that matrix.

HaroldD> If lm() could go straight from a factor to a sparse model matrix, time-demeaning would not be necessary.

Well, lm() is in "stats" and would only work with dense matrices
anyway.

But you are right in what you *meant*:
We'd need versions of model.frame() and model.matrix() which
from a formula produce a sparse model matrix (aka "X matrix") or
its transpose.

Doug Bates showed you how to do the latter manually,
equivalently to model.matrix(~ 0 + f1 + f2) when f1 and f2 are
factors.
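As an illustration of that manual construction, here is a minimal sketch, not Doug Bates's original code. It assumes the 'Matrix' package is installed and uses its fac2sparse() function; the factors f1 and f2 are small made-up examples. fac2sparse() returns the indicator matrix with one row per level and one column per observation, i.e. the transposed form. The first level of f2 is dropped so that the result matches the columns of model.matrix(~ 0 + f1 + f2).

```r
library(Matrix)

# Two small illustrative factors (hypothetical data, 12 observations)
f1 <- factor(rep(c("a", "b", "c"), times = 4))
f2 <- factor(rep(c("A", "B", "C", "D"), each = 3))

# Sparse indicator matrices: rows are levels, columns are observations
I1 <- fac2sparse(f1)
I2 <- fac2sparse(f2)

# Stack them, dropping the first level of f2 (the treatment-contrast
# reference level), to get the transposed sparse model matrix
Xt <- rbind(I1, I2[-1, , drop = FALSE])

# Transpose to get the usual observations-by-columns X matrix
X <- t(Xt)

# Dense reference built the standard way, for comparison only
X_dense <- model.matrix(~ 0 + f1 + f2)
```

For data as large as in the original question, only X (or Xt) would be built; the dense comparison is there just to check the small example.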

I'm sure that longer-term we'd want versions of model.matrix() / model.frame() that work with sparse matrices.
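For readers of the archive: versions of 'Matrix' released after this message do, as far as I know, provide such a function, sparse.model.matrix(). The following is a minimal sketch assuming a reasonably recent 'Matrix'; the factors are made-up examples, and nothing here reflects what was available in 2008.

```r
library(Matrix)

# Two small illustrative factors (hypothetical data, 12 observations)
f1 <- factor(rep(c("a", "b", "c"), times = 4))
f2 <- factor(rep(c("A", "B", "C", "D"), each = 3))

# Sparse model matrix straight from the formula, never forming the dense one
X_sp <- sparse.model.matrix(~ 0 + f1 + f2)

# Dense equivalent, built here only to check the small example
X_dn <- model.matrix(~ 0 + f1 + f2)
```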

HaroldD> Doing work as Doug suggests in the other post is what would be best for now, me thinks.

Yes.

BTW, you mentioned SparseM's "OLS with sparse matrices".
The problem there is the same as with 'Matrix': You must somehow
get your sparse X matrix, and the best current tools to do that, AFAIK,
are the ones in 'Matrix' that Doug Bates mentioned (and wrote!).
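Once a sparse X is available, least squares can be done on it directly. The sketch below is only one possible approach, assuming the 'Matrix' package: it solves the normal equations X'X b = X'y with sparse crossproducts. The data (f1, f2, y) are made up, and X is assumed to have full column rank, which is why one level of f2 is dropped.

```r
library(Matrix)

# Hypothetical data: two factors and a numeric response, 12 observations
set.seed(42)
f1 <- factor(rep(c("a", "b", "c"), times = 4))
f2 <- factor(rep(c("A", "B", "C", "D"), each = 3))
y  <- rnorm(12)

# Sparse X: all levels of f1, all but the first level of f2
X <- t(rbind(fac2sparse(f1), fac2sparse(f2)[-1, , drop = FALSE]))

# Normal equations with sparse crossproducts:
# crossprod(X) is X'X (sparse, symmetric), crossprod(X, y) is X'y
XtX <- crossprod(X)
Xty <- crossprod(X, y)
beta_sparse <- as.vector(solve(XtX, Xty))

# Dense reference fit on the same small data, for comparison only
beta_lm <- unname(coef(lm(y ~ 0 + f1 + f2)))
```

On this small example the two coefficient vectors should agree up to numerical tolerance; with thousands of factor levels only the sparse path is practical.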

Martin Maechler

HaroldD> -----Original Message-----
HaroldD> From: Bert Gunter [mailto:gunter.berton_at_gene.com]
HaroldD> Sent: Mon 7/21/2008 6:45 PM
HaroldD> To: Doran, Harold; aspearot_at_ucsc.edu; r-help_at_r-project.org
HaroldD> Subject: RE: [R] Large number of dummy variables

HaroldD> Unless I'm way off base, dummy variables are never needed (nor are they desirable) in R; they should be modelled as factors instead. AN INTRO TO R might, and certainly V&R's MASS and others will, explain this in more detail.

HaroldD> -- Bert Gunter

HaroldD> Genentech, Inc.

HaroldD> -----Original Message-----
HaroldD> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Doran, Harold
HaroldD> Sent: Monday, July 21, 2008 3:16 PM
HaroldD> To: aspearot_at_ucsc.edu; r-help_at_r-project.org
HaroldD> Cc: Douglas Bates
HaroldD> Subject: Re: [R] Large number of dummy variables

HaroldD> Well, at the risk of entering a debate I really don't have time for (I'm doing it anyway) why not consider a random coefficient model? If your response has anything like, "well, random effects and fixed effects are correlated and so the estimates are biased but OLS is consistent and unbiased via an appeal to Gauss-Markov" then I will probably make time for this discussion :)

HaroldD> I have experienced this problem, though. In what you're doing, you are first creating the model matrix and then doing the demeaning, correct? I do recall Doug Bates was, at one point, doing some work where the model matrix for the fixed effects was immediately created as a sparse matrix for OLS models. I think doing the work on the sparse matrix is a better analytical method than time-demeaning. I don't remember where that work is, though.

HaroldD> There is a package called SparseM which had functions for doing OLS with sparse matrices. I don't know its status, but vaguely recall the author of SparseM at one point noting that the work of Bates and Maechler would be the go-to package for work with large, sparse model matrices.

>> -----Original Message-----
>> From: r-help-bounces_at_r-project.org
>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Alan Spearot
>> Sent: Monday, July 21, 2008 5:59 PM
>> To: r-help_at_r-project.org
>> Subject: [R] Large number of dummy variables
>>
>> Hello,
>>
>> I'm trying to run a regression predicting trade flows between
>> importers and exporters. I wish to include both
>> year-importer dummies and year-exporter dummies. The former
>> includes 1378 levels, and the latter includes 1390 levels. I
>> have roughly 100,000 total observations.
>>
>> When I'm using lm() to run a simple regression, it gives me a
>> "cannot allocate ___" error. I've been able to get around this by
>> time-demeaning over one large group, but since I have two, it
>> doesn't work in the correct way. Is there a more efficient
>> way to handle a model matrix this large in R?
>>
>> Thanks for your help.
>>
>> Alan Spearot
>>
>> --
>> Alan Spearot
>> Assistant Professor - International Economics University of
>> California - Santa Cruz
>> 1156 High Street
>> 453 Engineering 2
>> Santa Cruz, CA 95064
>> Office: (831) 459-1530
>> acspearot_at_gmail.com
>> http://people.ucsc.edu/~aspearot


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
