[Rd] saving objects with embedded environments

From: McGehee, Robert <Robert.McGehee_at_geodecapital.com>
Date: Thu, 28 Jun 2007 18:30:39 -0400


Hello,
I have been running linear regressions on large data sets. Because 'lm' stores a great deal of data that is extraneous for my purposes, including the residuals, fitted.values, model frame, etc., I generally set these components to NULL within the object before saving the model to a file.

In the example below, however, I have found that whether or not I run 'lm' inside another function changes what gets saved: when 'lm' is called within a function, the entire function environment is written to the file as well. So, even though object.size and all.equal report that the two 'lm' objects are equal and small, one saves as a 24 MB file and the other as 646 bytes. This seems to be because, in the first example, the function environment is stored in attr(x1$terms, ".Environment"), and it accounts for essentially all 24 MB.
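The mechanism can be checked directly with a scaled-down, self-contained version of the reproduction below. This is a sketch under a few assumptions: N is reduced to 10,000 so it runs quickly, the function name fit_in_function is hypothetical, and serialize(x, NULL) is used as an approximate, uncompressed proxy for what save() writes. object.size() documents that it does not count the contents of associated environments, which is why it cannot see the captured data.

```r
# Scaled-down sketch of the reproduction; N is reduced so it runs quickly.
fit_in_function <- function(B) {
  x <- lm(y ~ x1 + x2 + x3, data = B, model = FALSE)
  x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
  x
}

set.seed(1)
N <- 10000
B <- data.frame(y = rnorm(N), x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N))

fit_f <- fit_in_function(B)                              # formula created inside a function
fit_g <- lm(y ~ x1 + x2 + x3, data = B, model = FALSE)   # formula created at top level
fit_g$residuals <- fit_g$effects <- fit_g$fitted.values <- fit_g$qr$qr <- NULL

# The terms of fit_f point at fit_in_function's evaluation frame,
# which still holds the argument B and the local variable x.
env_f <- attr(fit_f$terms, ".Environment")
print(ls(env_f))

# object.size() ignores what environments contain, so both fits look small ...
print(c(object.size(fit_f), object.size(fit_g)))
# ... but serialization (which save() relies on) writes env_f, including all of B.
print(c(length(serialize(fit_f, NULL)), length(serialize(fit_g, NULL))))
```

Run at top level, the first serialized length should exceed the raw size of B (4 × N doubles), while the second stays at a few kilobytes.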

Anyway, I think this is a bug, or at the very least very undesirable behavior: an object that object.size reports as about 5 KB takes up 24 MB on disk. There also seems to be some inconsistency in how environments are saved depending on whether or not the environment is the global environment, though I'm not familiar enough with environments to know whether this is intentional. Comments are appreciated.
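The apparent inconsistency comes from how R serializes environments: a few special environments, such as the global environment and package namespaces, are written as short references and not copied, whereas any other environment, such as a function's evaluation frame, is written out in full together with its variables. A minimal sketch with a bare formula (make_formula is a hypothetical name used only for illustration):

```r
# A formula records the environment in which it was created.
make_formula <- function() {
  big <- rnorm(1e6)   # about 8 MB of doubles living in this function's frame
  y ~ x               # this formula's environment is the frame holding 'big'
}

f_local <- make_formula()
f_global <- y ~ x
environment(f_global) <- globalenv()   # explicit; at top level this is already the default

# Same structure, so object.size() cannot tell them apart ...
print(c(object.size(f_local), object.size(f_global)))
# ... but serialization copies the local frame (including 'big'),
# while globalenv() is stored only as a reference.
print(length(serialize(f_local, NULL)))
print(length(serialize(f_global, NULL)))
```

The same rule explains the lm case: the top-level fit's terms refer to the global environment, so nothing beyond a reference is written.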

Thanks,
Robert

##################################################################
testEq <- function(B) {

    x <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
    x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
    x
}

N <- 900000
B <- data.frame(y=rnorm(N)+1:N, x1=rnorm(N)+1:N, x2=rnorm(N)+1:N, x3=rnorm(N)+1:N)
x1 <- testEq(B)
x2 <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
x2$residuals <- x2$effects <- x2$fitted.values <- x2$qr$qr <- NULL

all.equal(x1, x2) ## TRUE
object.size(x1)  ## 5112
object.size(x2)  ## 5112

save(x1, file="x1.RData")
save(x2, file="x2.RData")
file.info("x1.RData")$size ## 24063852 bytes
file.info("x2.RData")$size ## 646 bytes
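One possible workaround, sketched here as an assumption rather than an official fix, is to re-point the terms environment at the global environment before saving, so that save() stores only a reference. The helper names strip_lm and testEq2 are hypothetical and not part of base R or stats.

```r
# Hypothetical helper: drop bulky lm components and detach the fit
# from the frame in which its formula was created.
strip_lm <- function(fit) {
  fit$residuals <- fit$effects <- fit$fitted.values <- fit$qr$qr <- NULL
  fit$model <- NULL   # the model frame (model = TRUE) carries its own terms
  # globalenv() is serialized as a reference, so the caller's frame is not saved
  attr(fit$terms, ".Environment") <- globalenv()
  fit
}

# Variant of testEq that returns a fit no longer tied to its frame.
testEq2 <- function(B) {
  strip_lm(lm(y ~ x1 + x2 + x3, data = B, model = FALSE))
}

N <- 10000
B <- data.frame(y = rnorm(N), x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N))
fit <- testEq2(B)
print(length(serialize(fit, NULL)))   # small again: B is no longer reachable from fit
```

The trade-off is that code which evaluates the stored terms, such as model.frame() as called by predict() with newdata, will look up any variables missing from newdata in the global environment rather than in the original function's frame.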

> R.version

               _                           
platform       i686-pc-linux-gnu           
arch           i686                        
os             linux-gnu                   
system         i686, linux-gnu             
status                                     
major          2                           
minor          5.0                         
year           2007                        
month          04                          
day            23                          
svn rev        41293                       
language       R                           
version.string R version 2.5.0 (2007-04-23)

Robert McGehee, CFA
Quantitative Analyst
Geode Capital Management, LLC

One Post Office Square, 28th Floor | Boston, MA | 02109
Tel: 617/392-8396 | Fax: 617/476-6389
mailto:robert.mcgehee_at_geodecapital.com




R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 29 Jun 2007 - 00:14:20 GMT

