Re: [R] R versus SAS: lm performance

From: roger koenker
Date: Tue 11 May 2004 - 22:42:41 EST

I would be curious to know how sparse the model.matrix for this problem
is. Unless it is quite dense, or as Brian implies quite singular, I might
try computing a Cholesky factorization in SparseM.
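
A rough way to gauge sparsity and try the Cholesky route is sketched
below. It uses only base R (the dense chol(), not SparseM), and the data,
the factor names g1 and g2, and the sizes are made up for illustration,
so treat it as a sketch under those assumptions rather than a recipe for
the original problem.

```r
# Hypothetical data: two factors and a numeric response (not from the thread).
set.seed(1)
n  <- 200
df <- data.frame(g1 = factor(sample(letters[1:8], n, replace = TRUE)),
                 g2 = factor(sample(LETTERS[1:5], n, replace = TRUE)))
df$y <- rnorm(n)

X <- model.matrix(~ g1 + g2, df)

# Fraction of exact zeros in the model matrix: close to 1 means sparse.
sparsity <- mean(X == 0)

# Solve the normal equations X'X b = X'y via a (dense) Cholesky factor.
# chol() returns an upper-triangular R with R'R = X'X and typically stops
# with an error if X'X is not positive definite (X rank deficient).
R <- chol(crossprod(X))
b <- backsolve(R, forwardsolve(t(R), crossprod(X, df$y)))

# Should agree with lm() up to rounding error.
b_lm     <- coef(lm(y ~ g1 + g2, df))
max_diff <- max(abs(drop(b) - b_lm))
```

With dummy-coded factors like these, each row has at most one nonzero
entry per factor plus the intercept, which is why such model matrices
tend to be sparse.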

Roger Koenker
Department of Economics
University of Illinois
Champaign, IL 61820
vox: 217-333-4558
fax: 217-244-6678

On May 11, 2004, at 7:07 AM, Douglas Bates wrote:

> <> writes:
>> Hello,
>> A colleague of mine has compared the runtime of a linear model + anova
>> in SAS and S+. He got the same results, but SAS took a bit more than
>> a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and
>> it took 15 min. None of the machines ran out of memory, and I assume that
>> all machines have similar hardware, but the S+ and SAS machines are
>> on Windows whereas the R machine runs Red Hat Linux 7.2.
>> My question is whether I'm doing something wrong (technically) in calling
>> the lm routine, or (if not), how I can optimize the call to lm or even
>> use an alternative to lm. I'd like to run about 12,000 of these
>> models in R (for a gene expression experiment, one model per gene),
>> which would take far too long at this speed.
>> I've run the following code in R (and S+):
> ...
> As Brian Ripley mentioned, you could save the model matrix and use it
> with each of your responses. Versions 0.8-1 and later of the Matrix
> package have a vignette that provides comparative timings of various
> ways of obtaining the least squares estimates. If you use the classes
> from the Matrix package and create and save the crossproduct of the
> model matrix
>     mm = as(model.matrix(Va ~ Ba+Ti..., df), "geMatrix")
>     cprod = crossprod(mm)
> then successive calls to
>     coef = solve(cprod, crossprod(mm, df$Va))
> will produce the coefficient estimates much faster than will calls to
> lm, which each do all the work of generating and decomposing the very
> large model matrix.
> Note that this method only produces the coefficient estimates, which
> may be enough for your purposes. Also, this method will not handle
> missing data or rank-deficient model matrices in the elegant way that
> lm does.
> If you are doing this 12,000 times it may be worthwhile checking if
> the sparse matrix formulation
>     mmS = as(mm, "cscMatrix")
>     cprodS = crossprod(mmS)
> is faster.
> The dense matrix formulation (but not the sparse) can benefit from
> installation of optimized BLAS routines such as ATLAS or Goto's BLAS.
> --
> Douglas Bates
> Statistics Department 608/262-2598
> University of Wisconsin - Madison
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!
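
To make the suggestion quoted above concrete, here is a minimal base-R
sketch of reusing one model matrix across many responses. It uses plain
matrices rather than the Matrix classes named in the quoted message, and
the design (only the factors Ba and Ti, since the original formula is
elided after them), the sizes, and the response matrix Y are all
simulated stand-ins, not the poster's data.

```r
# Hypothetical stand-in data: the thread's formula is elided after "Ba+Ti",
# so only those two factors are used, and all values are simulated.
set.seed(42)
n <- 300
n_genes <- 50
df <- data.frame(Ba = factor(sample(1:6, n, replace = TRUE)),
                 Ti = factor(sample(1:4, n, replace = TRUE)))
Y <- matrix(rnorm(n * n_genes), n, n_genes)   # one column per gene

# Build the model matrix and its cross-product once.
mm    <- model.matrix(~ Ba + Ti, df)
cprod <- crossprod(mm)                        # t(mm) %*% mm

# One response at a time, as in the quoted message ...
coef1 <- solve(cprod, crossprod(mm, Y[, 1]))

# ... or all responses in a single call: column j holds gene j's estimates.
coef_all <- solve(cprod, crossprod(mm, Y))

# Cross-check one gene against lm(); differences should be rounding error.
fit      <- lm(Y[, 1] ~ Ba + Ti, data = df)
max_diff <- max(abs(coef_all[, 1] - coef(fit)))
```

As the quoted message notes, this gives coefficient estimates only and
does not deal with missing values or rank deficiency; solve() stops with
an error if cprod is singular.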

This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:09 EST