**From:** Prof Brian Ripley (*ripley@stats.ox.ac.uk*)

**Date:** Tue 11 May 2004 - 22:59:17 EST

**Next message:** Liaw, Andy: "RE: [R] calling data frames"

**Previous message:** Roger D. Peng: "Re: [R] How to draw holes generated by gpclib using plot function"

**In reply to:** Peter Dalgaard: "Re: [R] R versus SAS: lm performance"

**Next in thread:** Peter Dalgaard: "Re: [R] R versus SAS: lm performance"

**Reply:** Peter Dalgaard: "Re: [R] R versus SAS: lm performance"

Message-id: <Pine.LNX.4.44.0405111350060.1134-100000@gannet.stats>

On 11 May 2004, Peter Dalgaard wrote:

> "Liaw, Andy" <andy_liaw@merck.com> writes:
>
> > I tried the following on an Opteron 248, R-1.9.0 w/Goto's BLAS:
> >
> > > y <- matrix(rnorm(14000*1344), 1344)
> > > x <- matrix(runif(1344*503),1344)
> > > system.time(fit <- lm(y~x))
> > [1] 106.00 55.60 265.32 0.00 0.00
> >
> > The resulting fit object is over 600MB. (The coefficient component is a 504
> > x 14000 matrix.)
> >
> > If I'm not mistaken, SAS sweeps on the extended cross product matrix to fit
> > regression models. That, I believe, is usually faster than doing QR
> > decomposition on the model matrix itself, but there are trade-offs.

Roughly twice as fast but the price is accuracy.
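The accuracy cost can be illustrated with a small sketch. It assumes a deliberately ill-conditioned polynomial design and uses a plain `solve()` on the normal equations as a stand-in for the cross-product approach; it is not SAS's sweep operator, and the data, coefficients, and variable names are made up for illustration.

```r
# Sketch only: solve() on the normal equations stands in for the
# cross-product approach; it is not SAS's sweep operator.
n  <- 100
tt <- seq(0, 1, length.out = n)
X  <- outer(tt, 0:7, "^")        # columns 1, t, ..., t^7: nearly collinear
beta <- rep(1, ncol(X))          # made-up true coefficients
y  <- drop(X %*% beta)           # noise-free response, so beta is the exact answer

# Cross-product route: forming X'X squares the condition number of X.
b_xp <- drop(solve(crossprod(X), crossprod(X, y)))

# QR route (what lm() uses): factorises X itself.
b_qr <- qr.coef(qr(X), y)

# Largest absolute error in the recovered coefficients for each route.
err_xp <- max(abs(b_xp - beta))
err_qr <- max(abs(b_qr - beta))
print(c(cross_product = err_xp, qr = err_qr))
```

On a design like this the cross-product error should typically come out several orders of magnitude larger than the QR error; on a well-conditioned design the two routes agree closely.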

> > You could try what Prof. Bates suggested.
>
> Hmm. Shouldn't be all that much faster, but it will produce the Type I
> SS as you go along, whereas R probably wants to fit the 15 different
> models.

Nope, R can read off the Type I SSQs from the QR decomposition, so only one fit is done. (Effectively you remove the effect of one column at a time, and you get the change in residual/regression SSQ as a side effect. Take a look at anova.lm, which just aggregates squared effects over terms.)
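A minimal sketch of that aggregation, on made-up data with hypothetical variable names: the `effects` component of an `lm` fit holds the response rotated by the QR decomposition, and summing its squares within each term (via the `assign` component) should reproduce the sequential sums of squares that `anova()` reports from the same single fit.

```r
set.seed(42)
d <- data.frame(
  a = gl(3, 8),                 # 3-level factor
  b = gl(2, 4, length = 24),    # 2-level factor
  x = rnorm(24)                 # covariate
)
d$y <- rnorm(24) + as.numeric(d$a) + 0.5 * d$x

fit <- lm(y ~ a + b + x, data = d)

# The first `rank` effects belong to model-matrix columns (in pivot order);
# fit$assign maps each column to its term, with 0 for the intercept.
p       <- fit$rank
eff     <- fit$effects[seq_len(p)]
term_id <- fit$assign[fit$qr$pivot[seq_len(p)]]

ss       <- tapply(eff^2, term_id, sum)     # squared effects summed per term
ss_terms <- unname(ss[-1])                  # drop the intercept (term 0)
ss_resid <- sum(fit$effects[-seq_len(p)]^2) # remaining effects give the residual SS

print(cbind(from_effects = c(ss_terms, ss_resid),
            from_anova   = anova(fit)[["Sum Sq"]]))
```

The two columns should agree to rounding error, with no model refitted along the way.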

> I'm still surprised that R/S-PLUS manages to use a full 15 minutes on
> a single response variable. It might be due to the singularities --
> the SAS code indicated that there was a nesting issue with the "A"
> factor in the last 4-factor interaction. If so, a reformulation of the
> model might help.

I think we need to understand this better. My guess (but only a guess) is that the model matrix has very many columns and is highly singular. If the singularity is by design, a reformulation will help.
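As an illustration of singularity by design, here is a small sketch on hypothetical data (the factor names `site`, `batch`, and `batch_id` are invented, not the model from this thread). A factor that is nested in another but entered as a crossed term produces many redundant columns, whereas the nested formula gives a full-rank model matrix spanning the same fits.

```r
# Hypothetical layout: two batches within each of two sites, three replicates.
d <- expand.grid(rep = 1:3, batch = factor(1:2), site = factor(c("s1", "s2")))
d$batch_id <- factor(paste(d$site, d$batch, sep = "."))  # unique label per batch

# Crossing site with the uniquely labelled batches creates columns for
# site/batch combinations that never occur.
X_crossed <- model.matrix(~ site * batch_id, data = d)

# Nested formulation: batch effects within each site.
X_nested <- model.matrix(~ site / batch, data = d)

rank_of <- function(X) qr(X)$rank   # numerical rank from the QR decomposition

print(c(cols = ncol(X_crossed), rank = rank_of(X_crossed)))
print(c(cols = ncol(X_nested),  rank = rank_of(X_nested)))
```

Comparing `ncol()` with the QR rank of the model matrix is a quick way to see how singular a design is before fitting; `lm()` would report the redundant columns as `NA` coefficients.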

--

Brian D. Ripley, ripley@stats.ox.ac.uk

Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/

University of Oxford, Tel: +44 1865 272861 (self)

1 South Parks Road, +44 1865 272866 (PA)

Oxford OX1 3TG, UK Fax: +44 1865 272595

