Re: [Rd] saving objects with embedded environments

From: Roger Peng <rdpeng_at_gmail.com>
Date: Fri, 29 Jun 2007 19:44:28 -0400

I believe this is intentional. See ?serialize. When lm() is called in a function, the environment is saved in case the resulting fitted model object needs to be updated, for example, with update().
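
A minimal sketch of the mechanism, not taken from the original example: make_formula() and its local variable 'big' are hypothetical names used only for illustration. A formula created inside a function keeps a reference to that function's environment, and serialize() (which save() relies on) writes that environment out in full, whereas the global environment is stored only as a reference.

```r
# Hypothetical helper: creates a formula inside a function that also
# holds a large local object unrelated to the formula.
make_formula <- function() {
  big <- rnorm(1e5)   # large local variable
  y ~ x               # the formula captures this function's environment
}

f_local <- make_formula()

# Same formula, but tied to the global environment. The assignment is
# explicit so the sketch behaves the same when sourced into a local scope.
f_global <- y ~ x
environment(f_global) <- globalenv()

# 'big' is still reachable through the formula's environment
has_big <- exists("big", envir = environment(f_local), inherits = FALSE)

# Serialized sizes in bytes: the local environment is written out in full,
# the global environment only as a reference
n_local  <- length(serialize(f_local,  NULL))
n_global <- length(serialize(f_global, NULL))
```

Comparing n_local with n_global shows the same kind of gap as the two .RData files in the original report.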

If you don't want the whole linear model object, you might try saving just the relevant components to a separate list rather than trying to delete everything irrelevant from the 'lm' object.
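
Something along these lines might work as a starting point. It is only a sketch: fit_small() and the choice of components to keep are assumptions, so adjust them to whatever you actually need later. The formula is stored as text so that no environment travels with the result.

```r
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Hypothetical helper: fit the model, then return only plain values
fit_small <- function(data) {
  fit <- lm(y ~ x1 + x2, data = data, model = FALSE)
  list(
    coefficients = coef(fit),              # named numeric vector
    sigma        = summary(fit)$sigma,     # residual standard error
    df.residual  = fit$df.residual,
    formula      = deparse(formula(fit))   # character string, not a formula
  )
}

res <- fit_small(d)

# Size in bytes of what save() would have to write for this list
n_res <- length(serialize(res, NULL))
```

The trade-off is that res is no longer an 'lm' object, so predict(), update() and friends will not work on it.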

-roger

On 6/28/07, McGehee, Robert <Robert.McGehee_at_geodecapital.com> wrote:
> Hello,
> I have been running linear regressions on large data sets. As 'lm' saves
> a great deal of extraneous (for me) data including the residuals,
> fitted.values, model frame, etc., I generally set these to NULL within
> the object before saving off the model to a file.
>
> In the example below, however, I have found that, depending on whether
> I run 'lm' within another function, the entire function
> environment is saved off with the file. So, even though object.size and
> all.equal report that both 'lm' objects are equal and of small size, one
> saves as a 24MB file and the other as 646 bytes. This seems to be because
> in the first example the function environment is saved in attr(x1$terms,
> ".Environment") and takes up all 24MB of space.
>
> Anyway, I think this is a bug, or if nothing else very undesirable (that
> an object reported to be about 5kB takes up 24MB). There also seems to be
> some inconsistency in how environments are saved depending on whether it
> is the global environment or not, though I'm not familiar enough with
> environments to know if this was intentional. Comments are appreciated.
>
> Thanks,
> Robert
>
> ##################################################################
> testEq <- function(B) {
> x <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
> x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
> x
> }
>
> N <- 900000
> B <- data.frame(y=rnorm(N)+1:N, x1=rnorm(N)+1:N, x2=rnorm(N)+1:N,
> x3=rnorm(N)+1:N)
> x1 <- testEq(B)
> x2 <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
> x2$residuals <- x2$effects <- x2$fitted.values <- x2$qr$qr <- NULL
>
> all.equal(x1, x2) ## TRUE
> object.size(x1) ## 5112
> object.size(x2) ## 5112
> save(x1, file="x1.RData")
> save(x2, file="x2.RData")
> file.info("x1.RData")$size ## 24063852 bytes
> file.info("x2.RData")$size ## 646 bytes
>
> > R.version
> _
> platform i686-pc-linux-gnu
> arch i686
> os linux-gnu
> system i686, linux-gnu
> status
> major 2
> minor 5.0
> year 2007
> month 04
> day 23
> svn rev 41293
> language R
> version.string R version 2.5.0 (2007-04-23)
>
>
> Robert McGehee, CFA
> Quantitative Analyst
> Geode Capital Management, LLC
> One Post Office Square, 28th Floor | Boston, MA | 02109
> Tel: 617/392-8396 Fax:617/476-6389
> mailto:robert.mcgehee_at_geodecapital.com
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

Received on Sat 30 Jun 2007 - 00:04:03 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.