Re: [Rd] modifying large R objects in place

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri, 28 Sep 2007 16:36:40 +0100 (BST)

On Fri, 28 Sep 2007, Luke Tierney wrote:

> On Fri, 28 Sep 2007, Petr Savicky wrote:

[...]

>> This leads me to a question. Some of the tests I did suggest that
>> gc() may not free all the memory, even if I remove all data objects
>> with rm() before calling gc(). Is this possible, or have I missed
>> something?

> Not impossible, but very unlikely given the use gc gets. There are a
> few internal tables that are grown but not shrunk at the moment, but
> that should not usually cause much total growth. If you are looking at
> system memory use, then that is a malloc issue -- there was a thread
> about this a month or so ago.

A likely explanation is lazy-loading: almost all package code is stored externally until used, and 2.6.0 is better at not bringing in unused code. E.g. (2.6.0, 64-bit system):

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 141320  7.6     350000 18.7   350000 18.7
Vcells 130043  1.0     786432  6.0   561893  4.3

> for(s in search()) lsf.str(s)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 424383 22.7     531268 28.4   437511 23.4
Vcells 228005  1.8     786432  6.0   700955  5.4

'if I remove all data objects by rm()' presumably means clearing the user workspace: there are lots of other environments containing objects ('data' or otherwise), many of which are needed to run R.
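
For example (a sketch; the exact counts will vary by session):

    rm(list = ls())       # clears the user workspace (.GlobalEnv) only
    gc()                  # Ncells/Vcells remain well above zero
    search()              # the other attached environments are untouched
    ls("package:stats")   # e.g. objects in an attached package are still there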

Otherwise the footer to every R-help message applies ....

>> A possible solution to the unwanted increase of NAMED due to temporary
>> calculations could be to give the user a way to store the NAMED value
>> of an object before a call to a function and restore it after the
>> call. To use this, the user would have to be confident that no new
>> reference to the object persists after the function completes.
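
For concreteness, a minimal sketch of the behaviour the proposal targets, assuming an R build configured with --enable-memory-profiling so that tracemem() reports copies:

    x <- numeric(1e6)
    tracemem(x)          # prints a message whenever x is duplicated
    x[1] <- 1            # NAMED(x) is 1: modified in place, nothing reported
    f <- function(v) v
    x <- f(x)            # passing x through a closure raises NAMED to 2
    x[2] <- 2            # now tracemem reports a duplication before the write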
>
> This would be too dangerous for general use. A more structured
> approach may be possible. A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success, but I'm moderately hopeful.
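
The effect is easy to demonstrate in the same kind of memory-profiling build (again a sketch; `second<-` is just an illustrative replacement function):

    `second<-` <- function(x, value) { x[2] <- value; x }
    x <- numeric(1e6)
    tracemem(x)
    second(x) <- 1   # i.e. x <- `second<-`(x, 1): inside the closure x has
                     # NAMED 2, so the subscript assignment duplicates it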

I am not persuaded that the distinction between NAMED=1 and NAMED=2 makes much difference in general use of R, and I recall Ross saying that he no longer believed it was a worthwhile optimization. It's not just 'user-defined' replacement functions but also all the system-defined closures (including all methods for the generic replacement functions that are primitive) that are unable to benefit from it.
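
One way to see the scope of that limitation from the prompt:

    is.primitive(`[<-`)             # TRUE: the generic itself can avoid a copy
    is.primitive(`[<-.data.frame`)  # FALSE: the method is an ordinary closure
    is.primitive(`[<-.factor`)      # FALSE: likewise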

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel