**From:** roger koenker (*rkoenker@uiuc.edu*)

**Date:** Tue 11 May 2004 - 22:42:41 EST

**Next message:** Peter Dalgaard: "Re: [R] R versus SAS: lm performance"

**Previous message:** Christian Hennig: "Re: [R] stability measures for heirarchical clustering"

**In reply to:** Douglas Bates: "Re: [R] R versus SAS: lm performance"

**Next in thread:** Liaw, Andy: "RE: [R] R versus SAS: lm performance"

Message-id: <B1549A4A-A348-11D8-873F-000A95A7E3AA@uiuc.edu>

I would be curious to know how sparse the model.matrix for this problem is...

Unless it is quite dense, or as Brian implies quite singular, I might suggest computing a Cholesky factorization in SparseM.
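A minimal sketch of the idea, under stated assumptions: it measures how sparse a model matrix is, then factors the crossproduct once with a Cholesky decomposition and reuses that factor for many responses. It uses base R's dense `chol()`, not the sparse routines in SparseM, whose calls are not shown here. The factors `f1` and `f2`, the response matrix `Y`, and all sizes are hypothetical stand-ins for the real gene expression data.

```r
# Hypothetical stand-in design: two crossed factors, balanced by construction.
n  <- 200
f1 <- gl(5, 1, n)                 # 5-level factor cycling 1, 2, ..., 5
f2 <- gl(4, 5, n)                 # 4-level factor in runs of 5
X  <- model.matrix(~ f1 + f2)     # 200 x 8 model matrix, built once

# Hypothetical responses: one column per "gene".
set.seed(1)
Y <- matrix(rnorm(n * 10), n, 10)

# Fraction of exact zeros in the model matrix (how sparse is it?).
sparsity <- mean(X == 0)

# Factor the crossproduct once: X'X = R'R with R upper triangular.
R <- chol(crossprod(X))

# Solve the normal equations for all responses with two triangular solves:
# first R'z = X'Y, then R B = z.
XtY <- crossprod(X, Y)
B   <- backsolve(R, forwardsolve(t(R), XtY))
```

Each column of `B` holds the coefficient estimates for the corresponding column of `Y`, so the factorization cost is paid once, not once per model. The sketch assumes a full-rank model matrix; with a singular one, `chol()` will typically fail or give unreliable results, which is the caveat raised above.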

Roger Koenker

Department of Economics, University of Illinois, Champaign, IL 61820

url: www.econ.uiuc.edu/~roger

email: rkoenker@uiuc.edu

vox: 217-333-4558

fax: 217-244-6678

On May 11, 2004, at 7:07 AM, Douglas Bates wrote:

> <Arne.Muller@aventis.com> writes:
>
>> Hello,
>>
>> A colleague of mine has compared the runtime of a linear model + anova
>> in SAS and S+. He got the same results, but SAS took a bit more than
>> a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and
>> it took 15 min. Neither machine ran out of memory, and I assume that
>> all machines have similar hardware, but the S+ and SAS machines are
>> on Windows whereas the R machine is Redhat Linux 7.2.
>>
>> My question is if I'm doing something wrong (technically) calling the
>> lm routine, or (if not), how I can optimize the call to lm or even
>> use an alternative to lm. I'd like to run about 12,000 of these
>> models in R (for a gene expression experiment - one model per gene,
>> which would take far too long).
>>
>> I've run the following code in R (and S+):
>
> ...
>
> As Brian Ripley mentioned, you could save the model matrix and use it
> with each of your responses. Versions 0.8-1 and later of the Matrix
> package have a vignette that provides comparative timings of various
> ways of obtaining the least squares estimates. If you use the classes
> from the Matrix package and create and save the crossproduct of the
> model matrix
>
> mm = as(model.matrix(Va ~ Ba+Ti..., df), "geMatrix")
> cprod = crossprod(mm)
>
> then successive calls to
>
> coef = solve(cprod, crossprod(mm, df$Va))
>
> will produce the coefficient estimates much faster than will calls to
> lm, which each do all the work of generating and decomposing the very
> large model matrix.
>
> Note that this method only produces the coefficient estimates, which
> may be enough for your purposes. Also, this method will not handle
> missing data or rank-deficient model matrices in the elegant way that
> lm does.
>
> If you are doing this 12,000 times it may be worthwhile checking if
> the sparse matrix formulation
>
> mmS = as(mm, "cscMatrix")
> cprodS = crossprod(mmS)
>
> is faster.
>
> The dense matrix formulation (but not the sparse) can benefit from
> installation of optimized BLAS routines such as Atlas or Goto's BLAS.
>
> --
> Douglas Bates bates@stat.wisc.edu
> Statistics Department 608/262-2598
> University of Wisconsin - Madison
> http://www.stat.wisc.edu/~bates/
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

