Re: [R] R versus SAS: lm performance


From: roger koenker (rkoenker@uiuc.edu)
Date: Tue 11 May 2004 - 22:42:41 EST


Message-id: <B1549A4A-A348-11D8-873F-000A95A7E3AA@uiuc.edu>

I would be curious to know how sparse the model.matrix for this problem is...
Unless it is quite dense, or, as Brian implies, quite singular, I might
suggest computing a Cholesky factorization in SparseM.
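[A minimal sketch of the algebra behind this suggestion, using base R's dense chol() and the triangular solvers; SparseM provides sparse analogues of these for designs stored as matrix.csr. The design matrix here is made up for illustration.]

```r
## Solve the normal equations X'X b = X'y via a Cholesky factorization
## X'X = R'R, then two triangular solves.  (Dense base-R stand-in for
## SparseM's sparse chol()/backsolve().)
set.seed(7)
X <- cbind(1, matrix(rbinom(100 * 3, 1, 0.2), 100, 3))  # sparse-ish 0/1 design
y <- X %*% c(2, 1, 0, -1) + rnorm(100, sd = 0.1)
R <- chol(crossprod(X))                         # X'X = R'R, R upper triangular
z <- forwardsolve(t(R), crossprod(X, y))        # solve R'z = X'y
beta <- backsolve(R, z)                         # solve R beta = z
```

The payoff of the sparse version is that when X is mostly zeros, X'X and its Cholesky factor can stay sparse, so the factorization and solves cost far less than the dense equivalents.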

Roger Koenker
Department of Economics, University of Illinois, Champaign, IL 61820
url: www.econ.uiuc.edu/~roger
email: rkoenker@uiuc.edu
vox: 217-333-4558
fax: 217-244-6678

On May 11, 2004, at 7:07 AM, Douglas Bates wrote:

> <Arne.Muller@aventis.com> writes:
>
>> Hello,
>>
>> A colleague of mine has compared the runtime of a linear model + anova
>> in SAS and S+. He got the same results, but SAS took a bit more than
>> a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and
>> it took 15 min. Neither machine ran out of memory, and I assume that
>> all machines have similar hardware, but the S+ and SAS machines are
>> on Windows whereas the R machine is Red Hat Linux 7.2.
>>
>> My question is if I'm doing something wrong (technically) calling the
>> lm routine, or (if not), how I can optimize the call to lm or even
>> using an alternative to lm. I'd like to run about 12,000 of these
>> models in R (for a gene expression experiment - one model per gene,
>> which would take far too long).
>>
>> I've run the following code in R (and S+):
>
> ...
>
> As Brian Ripley mentioned, you could save the model matrix and use it
> with each of your responses. Versions 0.8-1 and later of the Matrix
> package have a vignette that provides comparative timings of various
> ways of obtaining the least squares estimates. If you use the classes
> from the Matrix package and create and save the crossproduct of the
> model matrix
>
> mm = as(model.matrix(Va ~ Ba+Ti..., df), "geMatrix")
> cprod = crossprod(mm)
>
> then successive calls to
>
> coef = solve(cprod, crossprod(mm, df$Va))
>
> will produce the coefficient estimates much faster than will calls to
> lm, which each do all the work of generating and decomposing the very
> large model matrix.
>
> Note that this method only produces the coefficient estimates, which
> may be enough for your purposes. Also, this method will not handle
> missing data or rank-deficient model matrices in the elegant way that
> lm does.
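[The precompute-and-reuse recipe above can be sketched in base R without the Matrix classes; the data, the predictor names Ba and Ti (taken from the formula in the original post), and the use of only 3 responses in place of 12,000 are all illustrative.]

```r
## Build the model matrix and its crossproduct X'X once, then solve the
## normal equations cheaply for each new response vector.
set.seed(42)
n <- 500
df <- data.frame(Ba = rnorm(n), Ti = rnorm(n))           # hypothetical predictors
X <- model.matrix(~ Ba + Ti, df)                         # built once
XtX <- crossprod(X)                                      # X'X, also built once
responses <- replicate(3, X %*% c(1, 2, -1) + rnorm(n))  # stand-in for 12,000 genes
coefs <- apply(responses, 2, function(y) solve(XtX, crossprod(X, y)))
```

Each per-response step is then a small p x p solve plus one matrix-vector product, instead of lm's full model-frame construction and QR decomposition of the n x p matrix.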
>
> If you are doing this 12,000 times it may be worthwhile checking if
> the sparse matrix formulation
>
> mmS = as(mm, "cscMatrix")
> cprodS = crossprod(mmS)
>
> is faster.
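[A sketch of the sparse variant with the Matrix package; note the class names have changed since this 2004 post — "geMatrix"/"cscMatrix" correspond to today's "dgeMatrix"/"dgCMatrix", and Matrix(x, sparse = TRUE) is the usual coercion now. The design matrix is made up for illustration.]

```r
## Same normal-equations approach, but with X held in compressed sparse
## column form so crossprod() and solve() can exploit the zeros.
library(Matrix)
set.seed(1)
X <- cbind(1, matrix(rbinom(200 * 4, 1, 0.1), 200, 4))  # mostly-zero design
y <- X %*% c(1, 2, 0, -1, 3) + rnorm(200, sd = 0.1)
Xs <- Matrix(X, sparse = TRUE)           # sparse storage ("dgCMatrix")
cprodS <- crossprod(Xs)                  # X'X stays sparse
beta <- solve(cprodS, crossprod(Xs, y))  # coefficient estimates
```

Whether this beats the dense route depends on how sparse X really is, which is why timing both on the actual design matrix is worthwhile.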
>
> The dense matrix formulation (but not the sparse) can benefit from
> installation of optimized BLAS routines such as ATLAS or Goto's BLAS.
>
> --
> Douglas Bates bates@stat.wisc.edu
> Statistics Department 608/262-2598
> University of Wisconsin - Madison
> http://www.stat.wisc.edu/~bates/
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html




This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:09 EST