Re: [Rd] allocVector bug ?

From: Luke Tierney <luke_at_stat.uiowa.edu>
Date: Tue 14 Nov 2006 - 18:57:16 GMT

I have made the change to the threshold calculation (R_VSize instead of R_NSize for the vector heap) in R-patched and R-devel. Seems to have negligible impact on the standard tests and VR scripts.
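
For readers following the thread, here is a minimal sketch of the growth step before and after the change. It is not the code in memory.c: the function name `grow_step`, the `base` parameter, and the constant values are assumptions chosen for illustration. Only the shape of the calculation comes from the excerpt quoted further down.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative tuning constants. The values are assumptions for this
   sketch, not necessarily the ones R uses. */
static const double GROW_FRAC = 0.70;       /* occupancy above which the heap grows */
static const double GROW_INCR_FRAC = 0.05;  /* fractional part of the increment */
static const size_t GROW_INCR_MIN = 80000;  /* fixed part of the increment */

/* One growth step for the vector heap threshold. `base` is the quantity
   the fractional increment scales with: the node heap size under the old
   rule, the vector heap size under the new one. */
static size_t grow_step(size_t vsize, size_t max_vsize,
                        double occupancy, size_t base)
{
    assert(vsize <= max_vsize);  /* guards the unsigned subtraction below */
    if (occupancy > GROW_FRAC) {
        size_t change = (size_t)(GROW_INCR_MIN + GROW_INCR_FRAC * (double)base);
        if (max_vsize - vsize >= change)
            vsize += change;
    }
    return vsize;
}
```

With a node heap of 1e6 and a vector heap of 1e8, `grow_step(vsize, max, 0.9, nsize)` adds 130000 cells, while `grow_step(vsize, max, 0.9, vsize)` adds 5080000, so the increment now keeps pace with a large vector heap.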

Best,

luke

On Thu, 9 Nov 2006, Vladimir Dergachev wrote:

> On Thursday 09 November 2006 12:21 pm, Luke Tierney wrote:

>> On Wed, 8 Nov 2006, Vladimir Dergachev wrote:
>>> On Wednesday 08 November 2006 12:56 pm, Luke Tierney wrote:
>>>> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
>>>
>>> Hi Luke,
>>>
>>> Yes, I gladly concede the point that for a heuristic algorithm the
>>> notion of what is a "bug" is murky (besides crashes, etc., which are not
>>> what I am talking about).
>>>
>>> Here is why I called this a bug:
>>>
>>> 1. My understanding is that each time gc() needs to increase memory
>>> it performs a full garbage collection run. Right ?
>>
>> The allocation process does not call gc before every call to malloc.
>> It only calls gc if the allocation would cross a threshold level.
>> Those threshold levels are adjusted in an effort to compromise between
>> keeping memory footprint low and not calling gc too often. The code
>> you quote below is part of this adjustment process. If this process
>> is working properly then as memory use grows there will initially be
>> more gc activity and then less as the thresholds adjust.
>
> Well, I was seeing it call gc for every large vector. This probably happens
> only for those larger than R_VGrowIncrFrac * R_NSize. On my system R_NSize
> is never more than 1e6 so this would explain the problems when using 1e6 (and
> larger) vectors.
>
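
The effect described above can be reproduced in a toy model. This is only a sketch under stated assumptions: the `toy_heap` struct, the function names, the starting threshold of 800000 cells, and the constants are all hypothetical, nothing is ever freed, and the real collector also enforces a maximum heap size and a minimum free reserve that are left out here.

```c
#include <stddef.h>

typedef struct {
    size_t vsize;    /* current vector heap threshold, in cells */
    size_t in_use;   /* live cells; this toy never frees anything */
    size_t gc_runs;  /* number of simulated collections */
} toy_heap;

/* Illustrative constants; assumptions, not necessarily R's values. */
static const double GROW_FRAC = 0.70;
static const double GROW_INCR_FRAC = 0.05;
static const size_t GROW_INCR_MIN = 80000;

/* Allocate n cells. A collection runs only when the allocation would
   cross the threshold, and the threshold is then adjusted. */
static void toy_alloc(toy_heap *h, size_t n, size_t nsize, int scale_by_vsize)
{
    if (h->in_use + n > h->vsize) {
        size_t needed = h->in_use + n;
        double occup = (double)needed / (double)h->vsize;
        h->gc_runs++;
        if (occup > 1.0)
            h->vsize = needed;
        if (occup > GROW_FRAC) {
            /* Old rule scales by the node heap, new rule by the vector heap. */
            size_t base = scale_by_vsize ? h->vsize : nsize;
            h->vsize += (size_t)(GROW_INCR_MIN + GROW_INCR_FRAC * (double)base);
        }
    }
    h->in_use += n;
}

/* Count collections while allocating n_vectors vectors of vec_cells each. */
static size_t count_gc_runs(size_t n_vectors, size_t vec_cells,
                            size_t nsize, int scale_by_vsize)
{
    toy_heap h = { 800000, 0, 0 };
    size_t i;
    for (i = 0; i < n_vectors; i++)
        toy_alloc(&h, vec_cells, nsize, scale_by_vsize);
    return h.gc_runs;
}
```

In this model, `count_gc_runs(1000, 1000000, 1000000, 0)` collects on every one of the 1000 allocations, because the fixed increment of 130000 cells is always smaller than the next vector. Passing 1 as the last argument lets the increment grow with the heap, so progressively more allocations fit between collections.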

>>
>>> 2. This is not a problem with small memory sizes as they imply
>>> (presumably) a small number of objects.
>>>
>>> 3. However, if one wants to allocate many objects (say columns in a
>>> data frame or just vectors) this results in a large penalty.
>>>
>>> Example 1: This simulates allocation of a data.frame with some character
>>> columns which are assumed to be factors. On my system the first assignment
>>> is nearly instantaneous, while subsequent assignments take slightly less
>>> than 0.1 seconds each.
>>
>> I'm not sure these are quite doing what you intend. You define Chars
>> but don't use it. Also, system.time by default calls gc() before
>> doing the evaluation. Giving FALSE as the second argument may give you
>> a more realistic picture.
>
> The Chars are defined to create lots of ncells and make gc() run time more
> realistic. It also mimics having a data.frame with a few factor columns.
>
> As for system.time - thank you, I missed that !
> Setting gcFirst=FALSE changes behavior in the first example to be 2 times
> faster and makes all the allocations in the second example faster.
>
> I guess that extra call to gc() caused R_VSize to shrink too fast.
>

>>> I looked more carefully at your code in src/main/memory.c, function
>>> AdjustHeapSize:
>>>
>>> R_VSize = VNeeded;
>>> if (vect_occup > R_VGrowFrac) {
>>>     R_size_t change = R_VGrowIncrMin + R_VGrowIncrFrac * R_NSize;
>>>     if (R_MaxVSize - R_VSize >= change)
>>>         R_VSize += change;
>>> }
>>>
>>> Could it be that R_NSize should be R_VSize ? This would explain why I see
>>> a problem in case R_VSize>>R_NSize.
>>
>> That does indeed look like a bug and that R_NSize should be R_VSize --
>> well spotted, thanks. I will need to experiment with this a bit more
>> to see if it can safely be changed. It will increase the memory
>> footprint a bit. Probably not by enough to matter but if it does we
>> may need to adjust some of the tuning constants.
>>
>
> Would there be something I can help you with ? Is there a script to run
> through common usage patterns ?
>
>                          thank you !
>
>                                  Vladimir Dergachev
>
>

>> Best,
>>
>> luke
>>

>
>
-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke@stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Nov 15 06:00:32 2006
