Re: [Rd] allocVector bug ?

From: Luke Tierney <luke_at_stat.uiowa.edu>
Date: Fri 03 Nov 2006 - 04:26:22 GMT

On Wed, 1 Nov 2006, Vladimir Dergachev wrote:

>
> Hi all,
>
> I was looking at the following piece of code in src/main/memory.c, function
> allocVector :
>
>     if (size <= NodeClassSize[1]) {
>         node_class = 1;
>         alloc_size = NodeClassSize[1];
>     }
>     else {
>         node_class = LARGE_NODE_CLASS;
>         alloc_size = size;
>         for (i = 2; i < NUM_SMALL_NODE_CLASSES; i++) {
>             if (size <= NodeClassSize[i]) {
>                 node_class = i;
>                 alloc_size = NodeClassSize[i];
>                 break;
>             }
>         }
>     }
>
>
> It appears that for LARGE_NODE_CLASS the variable alloc_size should not be
> size, but something far less as we are not using vector heap, but rather
> calling malloc directly in the code below (and from discussions I read on
> this mailing list I think that these two are different - please let me know
> if I am wrong).
>
> So when we allocate a large vector the garbage collector goes nuts trying to
> find all that space, which is not going to be needed after all.

This is as intended, not a bug. The garbage collector does not "go nuts": it performs a collection that may release memory in advance of making a large allocation. The size of the current allocation request is part of what decides when to satisfy an allocation directly by malloc (of a single large node or of a page) and when to do a gc first. It is essential to do this for large allocations as well, to keep the memory footprint down and to help reduce fragmentation.
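To make the point concrete, here is a minimal toy model of that decision. It is only a sketch under stated assumptions, not the code in memory.c: the type toy_heap, the functions toy_collect and toy_request, and the garbage parameter are all hypothetical names invented for illustration. The model separates the real size of a request from the alloc_size used in the gc check, so it can mimic the effect of setting alloc_size to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified heap accounting; not R's actual data structures. */
typedef struct {
    size_t in_use;    /* units currently charged against the heap */
    size_t trigger;   /* a request that would push in_use past this forces a gc */
    int    gc_count;  /* number of collections run so far */
} toy_heap;

/* Stand-in for a collector: pretends `garbage` units were unreachable. */
static void toy_collect(toy_heap *h, size_t garbage)
{
    h->in_use -= (garbage < h->in_use) ? garbage : h->in_use;
    h->gc_count++;
}

/* Account for a request of `size` units. `alloc_size` is the amount used in
 * the "collect first?" check. Returns 1 if a collection ran first, else 0. */
static int toy_request(toy_heap *h, size_t size, size_t alloc_size,
                       size_t garbage)
{
    int collected = 0;

    assert(h != NULL);
    if (h->in_use > h->trigger || alloc_size > h->trigger - h->in_use) {
        toy_collect(h, garbage);
        collected = 1;
    }
    h->in_use += size;  /* the memory is used whether or not it was checked */
    return collected;
}
```

In this model, a heap with in_use = 400 and trigger = 1000 that receives a request of 700 with alloc_size = 700 collects first and, if 400 units were garbage, ends at in_use = 700. The same request with alloc_size = 0 skips the collection and ends at in_use = 1100, above the trigger. That is the footprint growth the size-based check is meant to prevent.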

The strategy for deciding when to allocate and when to gc is by necessity heuristic. It tries to keep the overall memory footprint low, but at the same time it tries to adapt to usage so that gc happens less often once a pattern of using larger amounts of memory emerges. The current strategy seems quite robust across a large range of architectures, memory configurations, and applications.
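One way such adaptation can work is to raise the gc trigger when the heap is still mostly full after a collection. The sketch below is an assumption-laden illustration, not the rule R uses: the type adaptive_heap, the function adjust_trigger, the 70% occupancy threshold, and the 50% growth step are all invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap state; the names and constants are illustrative only. */
typedef struct {
    size_t in_use;   /* units still live after the most recent collection */
    size_t trigger;  /* current threshold that forces a collection */
} adaptive_heap;

/* Call after a collection. If more than 70% of the trigger is still live,
 * assume the program really needs more memory and raise the trigger by half,
 * so that later collections happen less often. */
static void adjust_trigger(adaptive_heap *h)
{
    assert(h != NULL);
    if (h->in_use * 10 > h->trigger * 7)
        h->trigger += h->trigger / 2;
}
```

With these invented constants, a heap holding 800 live units against a trigger of 1000 has its trigger raised to 1500, while a heap holding 300 live units keeps its trigger at 1000.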

That said, when I wrote the manager I kept in mind that we might eventually want to try more sophisticated schemes and/or allow some user control over the schemes used. It may be time to revisit this soon.

luke

>
> I made an experiment and replaced the line alloc_size=size with alloc_size=0.
>
> R compiled fine (both 2.4.0 and 2.3.1) and passed make check with no issues
> (it all printed OK).
>
> Furthermore, all allocVector calls completed in no time and my test case ran
> very fast (22 seconds, as opposed to minutes).
>
> In addition, attach() was instantaneous which was wonderful.
>
> Could anyone with deeper knowledge of R internals comment on whether this
> makes any sense ?
>
> thank you very much !
>
> Vladimir Dergachev
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke@stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Received on Fri Nov 03 22:19:30 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 07 Nov 2006 - 05:30:36 GMT.
