Re: R-alpha: Memory

Bill Venables (wvenable@attunga.stats.adelaide.edu.au)
Wed, 22 May 1996 12:20:38 +0930


Date: Wed, 22 May 1996 12:20:38 +0930
Message-Id: <9605220250.AA06756@attunga.stats.adelaide.edu.au>
From: Bill Venables <wvenable@attunga.stats.adelaide.edu.au>
To: Ross Ihaka <ihaka@stat.auckland.ac.nz>
Subject: Re: R-alpha: Memory
In-Reply-To: <199605220201.OAA28307@stat.auckland.ac.nz>

Ross Ihaka writes:
...
 > 
 > Even with the enormous amount of RAM present in most modern computers
 > having three copies of a big design matrix around does seem rather
 > wasteful, but the only way I can see around this is to move all the
 > interpreted fitting code into hand-coded C.  This would be a LARGE
 > undertaking.

but probably worth the effort, ultimately.  (However see the last
paragraph below.)

 > Because of this, a future Son-of-R (vaguely on the drawing boards)
 > will be implemented using call-by-reference instead of call-by-value
 > (there are other reasons for doing this besides cutting down memory
 > use).  I'm going to be muttering something about this at the upcoming
 > interface meeting.

I think S version 4 (confusingly, to become the basis for S-Plus
version *5*) will probably have something similar.  Certainly
reference counting is to be a feature.

 > It is true that you don't need the entire design matrix to compute
 > regression results, but keeping the QR decomposition around as a basic
 > summary statistic is very useful (try getting the hat matrix out of
 > GLIM).  We (like S) use the Householder form of QR, because it has the
 > most compact representation.
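
As a rough illustration of why the QR factor is such a handy
summary: with a thin factorization X = QR, the hat matrix is
H = QQ', so its diagonal is just the row sums of squares of Q.
The sketch below uses NumPy rather than R or S; the function name
hat_diagonal and the simulated design matrix are hypothetical,
chosen only for the example.

```python
import numpy as np

def hat_diagonal(X):
    """Diagonal of H = X (X'X)^{-1} X', computed from a thin QR of X.

    With X = QR and Q having orthonormal columns, H = Q Q', so
    h_ii is the sum of squares of row i of Q.
    """
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.sum(Q * Q, axis=1)

# Illustrative design: intercept plus two simulated covariates.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
h = hat_diagonal(X)
```

The leverages sum to the number of columns of X (here 3) whenever
X has full column rank, which makes a quick sanity check.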

There is no right answer to this problem.  GLIM was written in
1972.  In those days Genstat (old and crabby uncle of GLIM)
allowed you to read in your regression data a line at a time and
it could accumulate the X'X and X'y matrices as it went along.
You save on storage (which in those far-off card reader days was
vital), but the mere act of calculating X'X essentially squares
your condition number and makes the calculation so much more
error prone.  And of course, no, you can't get the hat matrix, or
even just its diagonal (though as GLIM will now get you the
diagonal of the hat matrix, things must have changed even in
GLIM).
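
The trade-off can be sketched in a few lines.  This is only an
illustration in NumPy, not how Genstat or GLIM did it: the
function name accumulate_normal_equations, the cubic polynomial
design and the coefficients are all assumptions made up for the
example.  It accumulates X'X and X'y one row at a time, then
compares the 2-norm condition numbers of X and X'X.

```python
import numpy as np

def accumulate_normal_equations(rows):
    """One pass over (x, y) pairs, storing only X'X (p x p) and X'y (p)."""
    xtx = None
    xty = None
    for x, y in rows:
        x = np.asarray(x, dtype=float)
        if xtx is None:
            xtx = np.zeros((x.size, x.size))
            xty = np.zeros(x.size)
        xtx += np.outer(x, x)   # add this row's contribution to X'X
        xty += x * y            # add this row's contribution to X'y
    return xtx, xty

# Hypothetical data: a cubic polynomial design on [0, 1], no noise.
t = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(t), t, t**2, t**3])
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true

xtx, xty = accumulate_normal_equations(zip(X, y))

cond_X = np.linalg.cond(X)      # 2-norm condition number of X
cond_XtX = np.linalg.cond(xtx)  # approximately cond_X squared

# Normal-equations solution versus a QR-based solution.
beta_ne = np.linalg.solve(xtx, xty)
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)
```

Storage for the one-pass version is p(p + 1) numbers no matter
how many rows arrive, but cond_XtX comes out as roughly the
square of cond_X, so about twice as many digits are at risk as
when working from the QR of X directly.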

I am also very suspicious when it comes to very large regression
problems.  From past experience they are often based on
accumulations of very heterogeneous data, or the same results can
be achieved using only a small random sample of the data.

Bill

-- 
_________________________________________________________________
William Venables, Department of Statistics,  Tel.: +61 8 303 3026
The University of Adelaide,                  Fax.: +61 8 303 3696
South AUSTRALIA.     5005.   Email: Bill.Venables@adelaide.edu.au