Date: Wed, 22 May 1996 12:20:38 +0930
Message-Id: <9605220250.AA06756@attunga.stats.adelaide.edu.au>
From: Bill Venables <wvenable@attunga.stats.adelaide.edu.au>
To: Ross Ihaka <ihaka@stat.auckland.ac.nz>
Subject: Re: R-alpha: Memory
In-Reply-To: <199605220201.OAA28307@stat.auckland.ac.nz>

Ross Ihaka writes:

...
>
> Even with the enormous amount of RAM present in most modern computers
> having three copies of a big design matrix around does seem rather
> wasteful, but the only way I can see around this is to move all the
> interpreted fitting code into hand-coded C.  This would be a LARGE
> undertaking.

but probably worth the effort, ultimately.  (However see the last
paragraph below.)

> Because of this, a future Son-of-R (vaguely on the drawing boards)
> will be implemented using call-by-reference instead of call-by-value
> (there are other reasons for doing this besides cutting down memory
> use).  I'm going to be muttering something about this at the upcoming
> interface meeting.

I think S version 4 (confusingly, to become the basis for S-Plus
version *5*) will probably have something similar.  Certainly
reference counting is to be a feature.

> It is true that you don't need the entire design matrix to compute
> regression results, but keeping the QR decomposition around as a basic
> summary statistic is very useful (try getting the hat matrix out of
> GLIM).  We (like S) use the Householder form of QR, because it has the
> most compact representation.

There is no right answer to this problem.  Glim was written in 1972.
In those days Genstat (old and crabby uncle of Glim) allowed you to
read in your regression data a line at a time and it could accumulate
the X'X and X'y matrices as it went along.  You save on storage (which
in those far-off card-reader days was vital), but the mere act of
calculating X'X essentially squares your condition number and makes
the calculation so much more error-prone.
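
The trade-off just described can be illustrated with a small sketch.
It is not Genstat's or R's code: the function names
accumulate_normal_equations and householder_lstsq are hypothetical,
and the 3 x 2 test matrix (a standard Lauchli-style example with
eps = 1e-8) is an assumption chosen so that the loss of information
shows up in double precision.  The first function streams the data a
row at a time and keeps only X'X and X'y; the second keeps all of X
and solves by Householder QR.

```python
import math

def accumulate_normal_equations(rows):
    """Stream (x_row, y) pairs, storing only X'X (p x p) and X'y (p)."""
    xtx, xty = None, None
    for x, y in rows:
        p = len(x)
        if xtx is None:
            xtx = [[0.0] * p for _ in range(p)]
            xty = [0.0] * p
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def householder_lstsq(X, y):
    """Least squares via Householder QR applied to the augmented [X | y]."""
    A = [list(row) + [yi] for row, yi in zip(X, y)]
    n, p = len(X), len(X[0])
    for k in range(p):
        norm = math.sqrt(sum(A[i][k] ** 2 for i in range(k, n)))
        alpha = -math.copysign(norm, A[k][k])
        # Householder vector for column k (rows k..n-1).
        v = [A[i][k] for i in range(k, n)]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        if vtv == 0.0:
            continue
        # Reflect the remaining columns, including the y column.
        for j in range(k, p + 1):
            s = 2.0 * sum(v[i] * A[k + i][j] for i in range(n - k)) / vtv
            for i in range(n - k):
                A[k + i][j] -= s * v[i]
    # Back substitution on the upper-triangular factor R.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / A[i][i]
    return beta

eps = 1e-8
X = [[1.0, 1.0], [eps, 0.0], [0.0, eps]]   # full rank, but nearly collinear
y = [3.0, eps, 2.0 * eps]                  # exactly X times (1, 2)

xtx, xty = accumulate_normal_equations(zip(X, y))
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
print("det(X'X) =", det)                   # 0.0: eps**2 is lost next to 1.0
print("QR coefficients =", householder_lstsq(X, y))
```

In this example the accumulated X'X is exactly singular in floating
point, so the normal equations cannot be solved at all, while the QR
route still recovers coefficients close to (1, 2).  The price is that
QR needs the whole of X in memory, which is the storage cost under
discussion.
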
And of course, no, you can't get the hat matrix (or even just the
diagonal, but as glim will now get you the diagonal of the hat matrix,
things must have changed even in glim).

I am also very suspicious when it comes to very large regression
problems.  From past experience they are often based on accumulations
of very heterogeneous data, or the same results can be achieved using
only a small random sample of the data.

Bill
-- 
_________________________________________________________________
William Venables, Department of Statistics,  Tel.: +61 8 303 3026
The University of Adelaide,                  Fax.: +61 8 303 3696
South AUSTRALIA.     5005.   Email: Bill.Venables@adelaide.edu.au

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-testers mailing list -- To (un)subscribe, send
subscribe or unsubscribe (in the "body", not the subject !)
To: r-testers-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-