Re: [Rd] modifying large R objects in place

From: Prof Brian Ripley <>
Date: Fri, 28 Sep 2007 16:36:40 +0100 (BST)

On Fri, 28 Sep 2007, Luke Tierney wrote:

> On Fri, 28 Sep 2007, Petr Savicky wrote:


>> This leads me to a question. Some of the tests which I did suggest
>> that gc() may not free all the memory, even if I remove all data
>> objects by rm() before calling gc(). Is this possible, or must I have
>> missed something?

> Not impossible but very unlikely given the use gc gets. There are a
> few internal tables that are grown but not shrunk at the moment, but
> that should not usually cause much total growth. If you are looking at
> system memory use then that is a malloc issue -- there was a thread
> about this a month or so ago.

A likely explanation is lazy-loading: almost all package code is stored externally until it is used (2.6.0 is better at not bringing in unused code). E.g. (2.6.0, 64-bit system):

> gc()

          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 141320  7.6     350000 18.7   350000 18.7
Vcells 130043  1.0     786432  6.0   561893  4.3

> for(s in search()) lsf.str(s)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 424383 22.7     531268 28.4   437511 23.4
Vcells 228005  1.8     786432  6.0   700955  5.4

'if I remove all data objects by rm()' presumably means clearing the user workspace: there are lots of other environments containing objects ('data' or otherwise), many of which are needed to run R.
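A minimal sketch of that distinction (what appears on the search path will vary by session):

```r
## Emptying the user workspace removes only objects in globalenv();
## attached packages and other environments still hold objects.
rm(list = ls(all.names = TRUE))   # clear the user workspace

## Count the objects visible in each attached environment:
sapply(search(), function(s) length(ls(s, all.names = TRUE)))
```

The workspace count drops to zero, but the package environments keep their (often many) objects, which is what gc() continues to account for.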

Otherwise the footer to every R-help message applies ....

>> A possible solution to the unwanted increase of NAMED due to temporary
>> calculations could be to give the user the possibility
>> to store the NAMED attribute of an object before a call to a function
>> and restore it after the call. To use this, the user should be
>> confident that no new reference to the object persists after the
>> function is completed.
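For context, the copy-on-modify behaviour that NAMED controls can be observed without touching any internals (a sketch; the names are illustrative):

```r
x <- c(1, 2, 3)

## Inside f, the argument is also reachable as x in the caller,
## so modifying it forces a copy rather than an in-place update.
f <- function(v) {
  v[1] <- 0
  v
}

y <- f(x)
x[1]   # still 1: the caller's vector was not modified in place
y[1]   # 0
```

The proposal above amounts to letting the user assert that such a copy is unnecessary for a particular call.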
> This would be too dangerous for general use. Some more structured
> approach may be possible. A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success but I'm moderately hopeful.

I am not persuaded that the difference between NAMED=1/2 makes much difference in general use of R, and I recall Ross saying that he no longer believed that this was a worthwhile optimization. It's not just 'user-defined' replacement functions, but also all the system-defined closures (including all methods for the generic replacement functions which are primitive) that are unable to benefit from it.
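As a sketch of the closure case being discussed (the replacement function `first<-` is illustrative, not from the thread): a user-defined replacement function receives its argument with NAMED set to 2, so the assignment inside it operates on a copy, even though the call site looks like an in-place update.

```r
## A user-defined replacement function: first(v) <- value
## is rewritten by R as v <- `first<-`(v, value).
`first<-` <- function(x, value) {
  x[1] <- value   # inside a closure this sees NAMED == 2, so it copies
  x
}

v <- c(10, 20, 30)
first(v) <- 99
v   # c(99, 20, 30), produced via a copy, not in place
```

The same applies to system-defined replacement closures and to methods for primitive replacement generics, which is the point being made above.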

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri 28 Sep 2007 - 15:51:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 29 Sep 2007 - 10:42:32 GMT.
