Re: [Rd] Large discrepancies in the same object being saved to .RData

From: Tony Plate <taplate_at_gmail.com>
Date: Sun, 11 Jul 2010 11:08:44 -0400

Another way of seeing the environments referenced in an object is using str(), e.g.:

 > f1 <- function() {
+ junk <- rnorm(10000000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }

 > v1 <- f1()
 > object.size(f1)
1636 bytes

 > grep("Environment", capture.output(str(v1)), value=TRUE)
[1] " .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
[2] " .. .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
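
The grep above shows that environments are present but not where they hang. A rough sketch of a recursive walker that reports the location of each one follows. Note that find_envs is a made-up helper name, not a base R function, and junk is made smaller than in the thread so the example runs quickly.

```r
# Hypothetical helper (not part of base R): collect every environment that is
# reachable from an object via environment(), its attributes, or list elements.
find_envs <- function(obj, path = "obj") {
  found <- list()
  if (is.environment(obj)) {
    found[[path]] <- obj
    return(found)                      # do not descend into the environment
  }
  if (is.function(obj) && !is.primitive(obj)) {
    found[[paste0("environment(", path, ")")]] <- environment(obj)
  }
  at <- attributes(obj)
  for (nm in names(at)) {
    found <- c(found,
               find_envs(at[[nm]], sprintf("attr(%s, \"%s\")", path, nm)))
  }
  if (is.list(obj)) {
    nms <- names(obj)
    for (i in seq_along(obj)) {
      label <- if (!is.null(nms) && nzchar(nms[i])) nms[i] else as.character(i)
      # .subset2() avoids S3 dispatch on classed lists
      found <- c(found, find_envs(.subset2(obj, i), paste0(path, "$", label)))
    }
  }
  found
}

f1 <- function() {
  junk <- rnorm(1e5)                   # smaller than the 1e7 used in the thread
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()
envs <- find_envs(v1, "v1")
print(names(envs))                     # where the environments are attached
print(lapply(envs, ls))                # what each of them holds
```

For the lm fit this should point at the 'terms' component and at the "terms" attribute of the 'model' component, which would correspond to the two lines grep printed above.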

On 7/10/2010 10:10 PM, Bill.Venables_at_csiro.au wrote:
> Well, I have answered one of my questions below. The hidden
> environment is attached to the 'terms' component of v1.
>
> To see this
>
>
>> lapply(v1, environment)
>>
> $coefficients
> NULL
>
> $residuals
> NULL
>
> $effects
> NULL
>
> $rank
> NULL
>
> $fitted.values
> NULL
>
> $assign
> NULL
>
> $qr
> NULL
>
> $df.residual
> NULL
>
> $xlevels
> NULL
>
> $call
> NULL
>
> $terms
> <environment: 0x021b9e18>
>
> $model
> NULL
>
>
>> rm(junk, envir = with(v1, environment(terms)))
>> usedVcells()
>>
> [1] 96532
>
>>
>>
> This is still a bit of a trap for young (and old!) players...
>
> I think the main point in my mind is why is it that object.size()
> excludes enclosing environments in its reckonings?
>
> Bill Venables.
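
On the object.size() question: one rough way to see what an object drags along is to measure its serialized length instead, since serialization does follow attached environments and so should approximate what save() has to write before compression. This is only a sketch; serialized_bytes is a made-up helper name, and junk is smaller here than in Bill's session.

```r
f0 <- function() {
  junk <- rnorm(1e6)                   # roughly 8 MB of doubles
  y ~ x
}
v0 <- f0()

# object.size() does not follow the environment attached to the formula ...
print(object.size(v0))

# ... whereas serialization does.
# Hypothetical helper: number of bytes in the serialized form of an object.
serialized_bytes <- function(obj) length(serialize(obj, connection = NULL))

before <- serialized_bytes(v0)
rm(junk, envir = environment(v0))      # same remedy as in Bill's example
after <- serialized_bytes(v0)
print(c(before = before, after = after))
```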
>
> -----Original Message-----
> From: Venables, Bill (CMIS, Cleveland)
> Sent: Sunday, 11 July 2010 11:40 AM
> To: 'Duncan Murdoch'; 'Paul Johnson'
> Cc: 'r-devel_at_r-project.org'; Taylor, Julian (CMIS, Waite Campus)
> Subject: RE: [Rd] Large discrepancies in the same object being saved to .RData
>
> I'm still a bit puzzled by the original question. I don't think it
> has much to do with .RData files and their sizes. For me the puzzle
> comes much earlier. Here is an example of what I mean using a little
> session
>
>
>> usedVcells<- function() gc()["Vcells", "used"]
>> usedVcells() ### the base load
>>
> [1] 96345
>
> ### Now look at what happens when a function returns a formula as the
> ### value, with a big item floating around in the function closure:
>
>
>> f0<- function() {
>>
> + junk<- rnorm(10000000)
> + y ~ x
> + }
>
>> v0<- f0()
>> usedVcells() ### much bigger than base, why?
>>
> [1] 10096355
>
>> v0 ### no obvious environment
>>
> y ~ x
>
>> object.size(v0) ### so far, no clue given where the extra Vcells are located.
>>
> 372 bytes
>
> ### Does v0 have an enclosing environment?
>
>
>> environment(v0) ### yep.
>>
> <environment: 0x021cc538>
>
>> ls(envir = environment(v0)) ### as expected, there's the junk
>>
> [1] "junk"
>
>> rm(junk, envir = environment(v0)) ### this does the trick.
>> usedVcells()
>>
> [1] 96355
>
> ### Now consider a second example where the object
> ### is not a formula, but contains one.
>
>
>> f1<- function() {
>>
> + junk<- rnorm(10000000)
> + x<- 1:3
> + y<- rnorm(3)
> + lm(y ~ x)
> + }
>
>
>> v1<- f1()
>> usedVcells() ### as might have been expected.
>>
> [1] 10096455
>
> ### in this case, though, there is no
> ### (obvious) enclosing environment
>
>
>> environment(v1)
>>
> NULL
>
>> object.size(v1) ### so where are the junk Vcells located?
>>
> 7744 bytes
>
>> ls(envir = environment(v1)) ### clearly will not work
>>
> Error in ls(envir = environment(v1)) : invalid 'envir' argument
>
>
>> rm(v1) ### removing the object does clear out the junk.
>> usedVcells()
>>
> [1] 96366
>
>>
> And in this second case, as noted by Julian Taylor, if you save() the
> object the .RData file is also huge. There is an environment attached
> to the object somewhere, but it appears to be occluded and entirely
> inaccessible. (I have poked around the object components trying to
> find the thing but without success.)
>
> Have I missed something?
>
> Bill Venables.
>
> -----Original Message-----
> From: r-devel-bounces_at_r-project.org [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Duncan Murdoch
> Sent: Sunday, 11 July 2010 10:36 AM
> To: Paul Johnson
> Cc: r-devel_at_r-project.org
> Subject: Re: [Rd] Large discrepancies in the same object being saved to .RData
>
> On 10/07/2010 2:33 PM, Paul Johnson wrote:
>
>> On Wed, Jul 7, 2010 at 7:12 AM, Duncan Murdoch<murdoch.duncan_at_gmail.com> wrote:
>>
>>
>>> On 06/07/2010 9:04 PM, Julian.Taylor_at_csiro.au wrote:
>>>
>>>
>>>> Hi developers,
>>>>
>>>>
>>>>
>>>> After some investigation I have found there can be large discrepancies in
>>>> the same object being saved as an external "xx.RData" file. The immediate
>>>> repercussion of this is the possible increased size of your .RData workspace
>>>> for no apparent reason.
>>>>
>>>>
>>>>
>>>>
>>>>
>>> I haven't worked through your example, but in general the way that local
>>> objects get captured is when part of the return value includes an
>>> environment.
>>>
>>>
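
Given that explanation, one possible way to keep a function's locals out of the returned fit is to give the formula a different environment before fitting. The following is a sketch under stated assumptions, not a general recommendation: f2 and fit_clean are made-up names, and pointing the formula at globalenv() assumes every variable in the formula is supplied through the data argument. Later calls such as update() that look variables up in the formula's environment may behave differently.

```r
f2 <- function() {
  junk <- rnorm(1e6)                   # large local that should not be kept
  d <- data.frame(x = 1:3, y = rnorm(3))
  fml <- y ~ x
  # Assumption: all variables in the formula live in `d`, so the formula
  # does not need this function's frame as its environment.
  environment(fml) <- globalenv()
  lm(fml, data = d)
}

fit_clean <- f2()

# The terms now reference the global environment, not the frame of f2().
print(attr(terms(fit_clean), ".Environment"))

# Serialized size stays small because `junk` is no longer reachable.
print(length(serialize(fit_clean, connection = NULL)))
```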
>> Hi, can I ask a follow up question?
>>
>> Is there a tool to browse *.Rdata files without loading them into R?
>>
>>
> I don't know of one. You can load the whole file into an empty
> environment, but then you lose information about "where did it come from"?
>
> Duncan Murdoch
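
Loading into an empty environment, as Duncan suggests, might look like the sketch below. It still reads the whole file into R, so it is not the external browser Paul asked for. The file and the objects a and b are invented so the example is self-contained, and the serialized length is used only as a rough size measure that follows environments.

```r
# Write a small example file so the sketch is self-contained.
rdata_file <- tempfile(fileext = ".RData")
local({
  a <- 1:10
  b <- letters
  save(a, b, file = rdata_file)
})

# Load into a fresh environment instead of the workspace.
contents <- new.env()
loaded <- load(rdata_file, envir = contents)   # names of the restored objects

# Rough per-object size report.
sizes <- vapply(
  loaded,
  function(nm) as.numeric(length(serialize(get(nm, envir = contents), NULL))),
  numeric(1)
)
print(sort(sizes, decreasing = TRUE))
```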
>
>> In HDF5 (a data storage format we use sometimes), there is a CLI
>> program "h5dump" that will spit out line-by-line all the contents of a
>> storage entity. It will literally track through all the metadata, all
>> the vectors of scores, etc. I've found that handy to "see what's
>> really in there" in cases like the one that OP asked about.
>> Sometimes, we find that there are things that are "in there" by
>> mistake, as Duncan describes, and then we can try to figure why they
>> are in there.
>>
>> pj
>>
>>
>>
>>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Received on Sun 11 Jul 2010 - 15:12:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 11 Jul 2010 - 19:00:14 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.