Re: [Rd] memory issues with new release (PR#9344)

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Mon 06 Nov 2006 - 23:22:20 GMT

"Derek Stephen Elmerick" <delmeric@gmail.com> writes:

> Peter,
>
> I ran the memory limit function you mention below and both versions provide
> the same result:
>
> >
> > memory.limit(size=4095)
> NULL
> > memory.limit(NA)
> [1] 4293918720
> >
> I do have 4GB ram on my PC. As a more reproducible form of the test, I
> have attached output that uses a randomly generated dataset after fixing the
> seed. Same result as last time: works with 2.3.0 and not 2.4.0. I guess the
> one caveat here is that I just increased the dataset size until I got the
> memory issue with at least one of the R versions. It's okay. No need to
> spend more time on this. I really don't mind using the previous version.
> Like you mentioned, probably just a function of the new version requiring
> more memory.

Hmm, you might want to take a final look at the Windows FAQ 2.9. I am still not quite convinced you're really getting more than the default 1.5 GB.

Also, how much can you increase the problem size on 2.3.0 before it breaks? If you can only go to, say, 39 or 40 variables, then there's probably not much we can do. If it is orders of magnitude, then we may have a real bug (or not: sometimes we fix bugs resulting from things not being duplicated when they should have been, and the fixed code then uses more memory than the unfixed code).
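
As a rough sanity check on the sizes involved, here is a back-of-the-envelope sketch (not output from either R version). It assumes the model matrix is stored as 8-byte doubles with one intercept column; `design_matrix_kb` is just a name made up for the illustration, and the 1.5 GB figure is the default mentioned above.

```r
# Size in Kb of an n-by-p matrix of doubles (8 bytes per element).
# Hypothetical helper, for illustration only.
design_matrix_kb <- function(n_rows, n_cols, bytes_per_double = 8) {
  n_rows * n_cols * bytes_per_double / 1024
}

n <- 650000

# Simulated example: intercept + x1..x37 = 38 columns.
kb_simulated <- design_matrix_kb(n, 38)
# Original example: intercept + x1..x30 + two interaction columns = 33 columns.
kb_original <- design_matrix_kb(n, 33)

floor(kb_simulated)  # 192968, the size in the simulated-data error message
floor(kb_original)   # 167578, the size in the original error message

# Each extra variable adds one more column per copy of the matrix.
kb_per_variable <- design_matrix_kb(n, 1)

# Whole copies of the simulated design matrix that fit in 1.5 GB,
# ignoring everything else R has to keep in memory.
copies_in_default <- floor(1.5 * 1024^2 / kb_simulated)

# The value memory.limit(NA) reported is simply 4095 Mb in bytes.
limit_bytes <- 4095 * 1024^2  # 4293918720
```

So the failing allocation in each case looks like one more copy of the full design matrix, and one extra variable costs only about 5 MB per copy. That is why being able to add just one or two more variables on 2.3.0 would point to both versions running close to the same ceiling.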

> Thanks,
> Derek
>
>
>
> On 06 Nov 2006 21:42:04 +0100, Peter Dalgaard <p.dalgaard@biostat.ku.dk>
> wrote:
> >
> > "Derek Stephen Elmerick" <delmeric@gmail.com> writes:
> >
> > > Thanks for the replies. Point taken regarding submission protocol. I have
> > > included a text file attachment that shows the R output with version 2.3.0
> > > and 2.4.0. A label distinguishing the version is included in the comments.
> > >
> > > A quick background on the attached example. The dataset has 650,000 records
> > > and 32 variables. the response is dichotomous (0/1) and i ran a logistic
> > > model (i previously mentioned multinomial, but decided to start simple for
> > > the example). Covariates in the model may be continuous or categorical, but
> > > all are numeric. You'll notice that the code is the same for both versions;
> > > however, there is a memory error with the 2.4.0 version. i ran this several
> > > times and in different orders to make sure it was not some sort of hardware
> > > issue.
> > >
> > > If there is some sort of additional output that would be helpful, I can
> > > provide as well. Or, if there is nothing I can do, that is fine also.
> >
> > I don't think it was ever possible to request 4GB on XP. The version
> > difference might be caused by different response to invalid input in
> > memory.limit(). What does memory.limit(NA) tell you after the call to
> > memory.limit(4095) in the two versions?
> >
> > If that is not the reason: What is the *real* restriction of memory on
> > your system? Do you actually have 4GB in your system (RAM+swap)?
> >
> > Your design matrix is on the order of 160 MB, so shouldn't be a
> > problem with a GB-sized workspace. However, three copies of it will
> > brush against 512 MB, and it's not unlikely to have that many copies
> > around.
> >
> >
> >
> > > -Derek
> > >
> > >
> > > On 11/6/06, Kasper Daniel Hansen <khansen@stat.berkeley.edu> wrote:
> > > >
> > > > It would be helpful to produce a script that reproduces the error on
> > > > your system. And include details on the size of your data set and
> > > > what you are doing with it. It is unclear what function is actually
> > > > causing the error and such. Really, in order to do something about it
> > > > you need to show how to actually obtain the error.
> > > >
> > > > To my knowledge nothing _major_ has happened with the memory
> > > > consumption, but of course R could use slightly more memory for
> > > > specific purposes.
> > > >
> > > > But chances are that this is not really memory related but more
> > > > related to the functions you are using - perhaps a bug or perhaps a
> > > > user error.
> > > >
> > > > Kasper
> > > >
> > > > On Nov 6, 2006, at 10:20 AM, Derek Stephen Elmerick wrote:
> > > >
> > > > > > thanks for the friendly reply. i think my description was fairly clear: i
> > > > > > import a large dataset and run a model. using the same dataset, the
> > > > > > process worked previously and it doesn't work now. if the new version of R
> > > > > > requires more memory and this compromises some basic data analyses, i would
> > > > > > label this as a bug. if this memory issue was mentioned in the
> > > > > > documentation, then i apologize. this email was clearly not well received,
> > > > > > so if there is a more appropriate place to post these sort of questions,
> > > > > > that would be helpful.
> > > > >
> > > > > -derek
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On 06 Nov 2006 18:20:33 +0100, Peter Dalgaard <p.dalgaard@biostat.ku.dk> wrote:
> > > > >>
> > > > >> delmeric@gmail.com writes:
> > > > >>
> > > > >>> Full_Name: Derek Elmerick
> > > > >>> Version: 2.4.0
> > > > >>> OS: Windows XP
> > > > >>> Submission from: (NULL) ( 38.117.162.243 )
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> hello -
> > > > >>>
> > > > > >>> i have some code that i run regularly using R version 2.3.x . the final step of
> > > > > >>> the code is to build a multinomial logit model. the dataset is large; however, i
> > > > > >>> have not had issues in the past. i just installed the 2.4.0 version of R and now
> > > > > >>> have memory allocation issues. to verify, i ran the code again against the 2.3
> > > > > >>> version and no problems. since i have set the memory limit to the max size, i
> > > > > >>> have no alternative but to downgrade to the 2.3 version. thoughts?
> > > > >>
> > > > > >> And what do you expect the maintainers to do about it? (I.e. why are
> > > > > >> you filing a bug report.)
> > > > > >>
> > > > > >> You give absolutely no handle on what the cause of the problem might
> > > > > >> be, or even to reproduce it. It may be a bug, or maybe just R
> > > > > >> requiring more memory to run than previously.
> > > > >>
> > > > >> --
> > > > > >>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> > > > > >>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> > > > > >>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> > > > > >> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> > > > ######
> > > > ### R 2.4.0
> > > > ######
> > > >
> > > > rm(list=ls(all=TRUE))
> > > > memory.limit(size=4095)
> > > NULL
> > > >
> > > > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
> > > >
> > > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
> > > + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
> > > + x14 + x15 + x16 + x17 + x18 + x19 +x20 +
> > > + x21 + x22 +x23 + x24 + x25 + x26 +x27 +
> > > + x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
> > > + data = clnt)
> > > Error: cannot allocate vector of size 167578 Kb
> > > >
> > > > dim(clnt)
> > > [1] 650000 32
> > > > sum(clnt)
> > > [1] 112671553493
> > > >
> > >
> > > ##################################################
> > > ##################################################
> > >
> > > > ######
> > > > ### R 2.3.0
> > > > ######
> > > >
> > > > rm(list=ls(all=TRUE))
> > > > memory.limit(size=4095)
> > > NULL
> > > >
> > > > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
> > > >
> > > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
> > > + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
> > > + x14 + x15 + x16 + x17 + x18 + x19 +x20 +
> > > + x21 + x22 +x23 + x24 + x25 + x26 +x27 +
> > > + x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
> > > + data = clnt)
> > > >
> > > > dim(clnt)
> > > [1] 650000 32
> > > > sum(clnt)
> > > [1] 112671553493
> > > >
> >
> >
>
> >
> > ######
> > ### R 2.4.0
> > ######
> >
> > rm(list=ls(all=TRUE))
> > memory.limit(size=4095)
> NULL
> > memory.limit(NA)
> [1] 4293918720
> >
> > set.seed(314159)
> > clnt=matrix(runif(650000*38),650000,38)
> > y=round(runif(650000,0,1))
> > clnt=data.frame(y,clnt)
> > attributes(clnt)$names=c("y","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14",
> + "x15","x16","x17","x18","x19","x20","x21","x22","x23","x24","x25","x26","x27",
> + "x28","x29","x30","x31","x32","x33","x34","x35","x36","x37")
> > dim(clnt)
> [1] 650000 39
> > sum(clnt)
> [1] 12674827
> >
> >
> > chk.rsp=glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 +
> + x15 + x16 + x17 + x18 + x19 + x20 + x21 + x22 + x23 + x24 + x25 + x26 + x27 +
> + x28 + x29 + x30 + x31 + x32 + x33 + x34 + x35 + x36 + x37, family = binomial, data = clnt)
> Error: cannot allocate vector of size 192968 Kb
> >
> >
>
> ##############################################################
> ##############################################################
> ##############################################################
>
> >
> > ######
> > ### R 2.3.0
> > ######
> >
> > rm(list=ls(all=TRUE))
> > memory.limit(size=4095)
> NULL
> > memory.limit(NA)
> [1] 4293918720
> >
> > set.seed(314159)
> > clnt=matrix(runif(650000*38),650000,38)
> > y=round(runif(650000,0,1))
> > clnt=data.frame(y,clnt)
> > attributes(clnt)$names=c("y","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14",
> + "x15","x16","x17","x18","x19","x20","x21","x22","x23","x24","x25","x26","x27",
> + "x28","x29","x30","x31","x32","x33","x34","x35","x36","x37")
> > dim(clnt)
> [1] 650000 39
> > sum(clnt)
> [1] 12674827
> >
> >
> > chk.rsp=glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 +
> + x15 + x16 + x17 + x18 + x19 + x20 + x21 + x22 + x23 + x24 + x25 + x26 + x27 +
> + x28 + x29 + x30 + x31 + x32 + x33 + x34 + x35 + x36 + x37, family = binomial, data = clnt)
> >
> >
> >

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Nov 07 14:53:44 2006
