Re: [Rd] gc()$Vcells < 0 (PR#9345)

From: Vladimir Dergachev <vdergachev_at_rcgardis.com>
Date: Tue 07 Nov 2006 - 15:45:26 GMT

On Tuesday 07 November 2006 6:28 am, Prof Brian Ripley wrote:
> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
> > On Monday 06 November 2006 6:12 pm, dmaszle@mendelbio.com wrote:
> >> version.string Version 2.3.0 (2006-04-24)
> >>
> >>> x<-matrix(nrow=44000,ncol=48000)
> >>> y<-matrix(nrow=44000,ncol=48000)
> >>> z<-matrix(nrow=44000,ncol=48000)
> >>> gc()
> >>
> >>               used    (Mb) gc trigger    (Mb) max used    (Mb)
> >> Ncells      177801     9.5     407500    21.8   350000    18.7
> >> Vcells -1126881981 24170.6         NA 24173.4       NA 24170.6
> >
> > Happens to me with versions 2.4.0 and 2.3.1. The culprit is this line
> > in src/main/memory.c:
> >
> > INTEGER(value)[1] = R_VSize - VHEAP_FREE();
> >
> > Since the amount used is greater than 4G and INTEGER is 32bit long
> > (even on 64 bit machines) this returns (harmless) nonsense.
>
> That's not quite correct. The units here are Vcells (8 bytes), and
> integer() is signed, so this can happen only if more than 16Gb of heap is
> allocated.

I see - thank you for the explanation !
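The wrap-around described above can be checked against the reported numbers. The following is only a sketch of the arithmetic, not code from R's memory.c: the function names are made up for illustration, and it assumes 8-byte Vcells (as stated in the thread) and that the count wrapped exactly once.

```c
#include <stdint.h>

/* Bytes per Vcell, as stated in the thread. */
#define VCELL_BYTES 8

/* Hypothetical helper: the value a signed 32-bit field would show for a
   64-bit count, with the two's-complement wrap written out explicitly
   so that no implementation-defined conversion is involved. */
int64_t wrap_to_int32(int64_t count)
{
    const int64_t two32 = INT64_C(1) << 32;
    int64_t r = count % two32;
    if (r < 0)
        r += two32;          /* bring the remainder into [0, 2^32) */
    if (r >= two32 / 2)
        r -= two32;          /* upper half maps to negative values */
    return r;
}

/* Hypothetical helper: undo a single wrap, i.e. the smallest
   non-negative count that would be displayed as `shown`. */
int64_t unwrap_once(int64_t shown)
{
    return shown < 0 ? shown + (INT64_C(1) << 32) : shown;
}

/* Convert a Vcell count to Mb (2^20 bytes). */
double vcells_to_mb(int64_t vcells)
{
    return (double)vcells * VCELL_BYTES / (1024.0 * 1024.0);
}
```

Under these assumptions, unwrap_once(-1126881981) gives 3168085315 Vcells, and vcells_to_mb() of that count comes to about 24170.6, which agrees with the Mb column printed next to the negative figure.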

> We are aware that we begin to hit problems at 16Gb: it is for example the
> maximum size of an R vector. Those objects are logical and so about 7.8Gb
> each: their length as vectors is 98% of the maximum possible. However,
> the first time we discussed it we thought it would be about 5 years before
> those limits would become important -- I think three of those years have
> since passed.
>
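The "98% of the maximum" and "about 7.8Gb" figures can be reproduced from the matrix dimensions in the original report. This is a rough sketch under two assumptions that are not spelled out in the thread: that the maximum vector length in these R versions is 2^31 - 1 elements, and that a logical element occupies 4 bytes. The function names are illustrative only.

```c
#include <stdint.h>

/* Assumption: maximum vector length is 2^31 - 1 elements. */
#define MAX_VECTOR_LENGTH INT64_C(2147483647)

/* Assumption: each logical element is stored in 4 bytes. */
#define LOGICAL_BYTES 4

/* Number of elements in an nrow x ncol matrix, computed in 64 bits. */
int64_t matrix_length(int64_t nrow, int64_t ncol)
{
    return nrow * ncol;
}

/* Length as a fraction of the assumed maximum vector length. */
double fraction_of_max(int64_t length)
{
    return (double)length / (double)MAX_VECTOR_LENGTH;
}

/* Payload size of a logical vector in Gb (2^30 bytes), headers ignored. */
double logical_size_gb(int64_t length)
{
    return (double)length * LOGICAL_BYTES / (1024.0 * 1024.0 * 1024.0);
}
```

For the 44000 x 48000 matrices in the report, matrix_length() gives 2112000000 elements, which is a little over 98% of the assumed maximum and just under 7.9 Gb of payload each.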

> > The megabyte value nearby is correct and gc trigger and max used fields
> > are marked as NA already.
>
> and now 'used' is also marked as NA in 2.4.0 patched.

Great, thank you !

> This is only a reporting issue. When I first used R it reported only
> numbers, and I added the Mb as a more comprehensible figure (especially
> for Ncells). I think it would be sensible now to only report these
> figures in Mb or Gb (and also the reports for gcinfo(TRUE)).

Why not use KB ? This still preserves information about small allocations and raises the limit to 2 TB (2^31 KB in a signed 32-bit integer) - surely at least 5 years off ! :)

Alternatively, doubles should be able to hold the entire number, but this would require changes to how information is displayed.
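To compare the reporting options, here is a small sketch of how far a signed 32-bit counter reaches for a given unit size, and of the largest count a double can hold without rounding. It assumes IEEE 754 doubles (53-bit significand); the function names are illustrative and not part of R.

```c
#include <stdint.h>

/* Largest byte total a signed 32-bit counter can report when each
   count stands for `unit_bytes` bytes. */
int64_t int32_ceiling_bytes(int64_t unit_bytes)
{
    return (int64_t)INT32_MAX * unit_bytes;
}

/* Every integer up to 2^53 is exactly representable in an IEEE 754
   double, so a count stored as a double is exact up to this value. */
int64_t double_exact_ceiling(void)
{
    return INT64_C(1) << 53;
}

/* Convert bytes to Gb (2^30 bytes). */
double bytes_to_gb(int64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

With 8-byte Vcells, bytes_to_gb(int32_ceiling_bytes(8)) is just under 16, matching the 16Gb limit discussed earlier in the thread. With 1024-byte units the same counter reaches just under 2048 Gb, while a double-valued count stays exact up to 2^53 cells.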

> The model behind the report actually pre-dates the GC change in 1.2.0.
> The 'Vcells' are nowadays the sum of all the allocations from VECSXPs
> (which include their headers), rather than the 'vector heap' (although
> some of the earlier terminology persists).

I see.

           thank you !

                  Vladimir Dergachev

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Wed Nov 08 02:49:41 2006
