Re: [Rd] memory issues with new release (PR#9344)

From: Derek Stephen Elmerick <delmeric_at_gmail.com>
Date: Mon 06 Nov 2006 - 22:36:29 GMT

Peter,

I ran the memory limit function you mention below and both versions provide the same result:

>
> memory.limit(size=4095)

NULL
> memory.limit(NA)

[1] 4293918720
>
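
[Editor's note: a quick unit check, in plain Python since only the arithmetic matters here. `memory.limit(size=4095)` takes megabytes, while `memory.limit(NA)` reports the current limit in bytes, so the two values in the transcript above agree exactly:]

```python
# memory.limit(size=4095) sets the limit in megabytes;
# memory.limit(NA) reports it in bytes. Convert 4095 MB to bytes:
requested_mb = 4095
limit_bytes = requested_mb * 1024 * 1024
print(limit_bytes)  # 4293918720, matching the memory.limit(NA) output above
```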

I do have 4GB RAM on my PC. As a more reproducible form of the test, I have attached output that uses a randomly generated dataset after fixing the seed. Same result as last time: it works with 2.3.0 and not with 2.4.0. I guess the one caveat here is that I just increased the dataset size until I got the memory issue with at least one of the R versions. It's okay; no need to spend more time on this. I really don't mind using the previous version. Like you mentioned, it's probably just a function of the new version requiring more memory.

Thanks,
Derek

On 06 Nov 2006 21:42:04 +0100, Peter Dalgaard <p.dalgaard@biostat.ku.dk> wrote:
>
> "Derek Stephen Elmerick" <delmeric@gmail.com> writes:
>
> > Thanks for the replies. Point taken regarding submission protocol. I have
> > included a text file attachment that shows the R output with versions
> > 2.3.0 and 2.4.0. A label distinguishing the version is included in the
> > comments.
> >
> > A quick background on the attached example. The dataset has 650,000
> > records and 32 variables. The response is dichotomous (0/1) and I ran a
> > logistic model (I previously mentioned multinomial, but decided to start
> > simple for the example). Covariates in the model may be continuous or
> > categorical, but all are numeric. You'll notice that the code is the same
> > for both versions; however, there is a memory error with the 2.4.0
> > version. I ran this several times and in different orders to make sure it
> > was not some sort of hardware issue.
> >
> > If there is some sort of additional output that would be helpful, I can
> > provide as well. Or, if there is nothing I can do, that is fine also.
>
> I don't think it was ever possible to request 4GB on XP. The version
> difference might be caused by different response to invalid input in
> memory.limit(). What does memory.limit(NA) tell you after the call to
> memory.limit(4095) in the two versions?
>
> If that is not the reason: What is the *real* restriction of memory on
> your system? Do you actually have 4GB in your system (RAM+swap)?
>
> Your design matrix is on the order of 160 MB, so shouldn't be a
> problem with a GB-sized workspace. However, three copies of it will
> brush against 512 MB, and it's not unlikely to have that many copies
> around.
>
>
>
> > -Derek
> >
> >
> > On 11/6/06, Kasper Daniel Hansen < khansen@stat.berkeley.edu> wrote:
> > >
> > > It would be helpful to produce a script that reproduces the error on
> > > your system. And include details on the size of your data set and
> > > what you are doing with it. It is unclear what function is actually
> > > causing the error and such. Really, in order to do something about it
> > > you need to show how to actually obtain the error.
> > >
> > > To my knowledge nothing _major_ has happened with the memory
> > > consumption, but of course R could use slightly more memory for
> > > specific purposes.
> > >
> > > But chances are that this is not really memory related but more
> > > related to the functions you are using - perhaps a bug or perhaps a
> > > user error.
> > >
> > > Kasper
> > >
> > > On Nov 6, 2006, at 10:20 AM, Derek Stephen Elmerick wrote:
> > >
> > > > thanks for the friendly reply. i think my description was fairly
> > > > clear: i
> > > > import a large dataset and run a model. using the same dataset, the
> > > > process worked previously and it doesn't work now. if the new
> > > > version of R
> > > > requires more memory and this compromises some basic data analyses,
> > > > i would
> > > > label this as a bug. if this memory issue was mentioned in the
> > > > documentation, then i apologize. this email was clearly not well
> > > > received,
> > > > so if there is a more appropriate place to post these sorts of
> > > > questions, that would be helpful.
> > > >
> > > > -derek
> > > >
> > > >
> > > >
> > > >
> > > > On 06 Nov 2006 18:20:33 +0100, Peter Dalgaard
> > > > < p.dalgaard@biostat.ku.dk>
> > > > wrote:
> > > >>
> > > >> delmeric@gmail.com writes:
> > > >>
> > > >>> Full_Name: Derek Elmerick
> > > >>> Version: 2.4.0
> > > >>> OS: Windows XP
> > > >>> Submission from: (NULL) (38.117.162.243)
> > > >>>
> > > >>>
> > > >>>
> > > >>> hello -
> > > >>>
> > > >>> i have some code that i run regularly using R version 2.3.x. the
> > > >>> final step of the code is to build a multinomial logit model. the
> > > >>> dataset is large; however, i have not had issues in the past. i
> > > >>> just installed the 2.4.0 version of R and now have memory
> > > >>> allocation issues. to verify, i ran the code again against the 2.3
> > > >>> version and no problems. since i have set the memory limit to the
> > > >>> max size, i have no alternative but to downgrade to the 2.3
> > > >>> version. thoughts?
> > > >>
> > > >> And what do you expect the maintainers to do about it? (I.e. why
> > > >> are you filing a bug report.)
> > > >>
> > > >> You give absolutely no handle on what the cause of the problem
> > > >> might be, or even how to reproduce it. It may be a bug, or maybe
> > > >> just R requiring more memory to run than previously.
> > > >>
> > > >> --
> > > >> O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> > > >> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> > > >> (*) \(*) -- University of Copenhagen   Denmark    Ph:  (+45) 35327918
> > > >> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)           FAX: (+45) 35327907
> > > >>
> > > >
> > > >
> > > > ______________________________________________
> > > > R-devel@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> > >
> >
> >
> >
> > > ######
> > > ### R 2.4.0
> > > ######
> > >
> > > rm(list=ls(all=TRUE))
> > > memory.limit(size=4095)
> > NULL
> > >
> > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
> > >
> > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
> > + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
> > + x14 + x15 + x16 + x17 + x18 + x19 +x20 +
> > + x21 + x22 +x23 + x24 + x25 + x26 +x27 +
> > + x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
> > + data = clnt)
> > Error: cannot allocate vector of size 167578 Kb
> > >
> > > dim(clnt)
> > [1] 650000 32
> > > sum(clnt)
> > [1] 112671553493
> > >
> >
> > ##################################################
> > ##################################################
> >
> > > ######
> > > ### R 2.3.0
> > > ######
> > >
> > > rm(list=ls(all=TRUE))
> > > memory.limit(size=4095)
> > NULL
> > >
> > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
> > >
> > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
> > + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
> > + x14 + x15 + x16 + x17 + x18 + x19 +x20 +
> > + x21 + x22 +x23 + x24 + x25 + x26 +x27 +
> > + x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
> > + data = clnt)
> > >
> > > dim(clnt)
> > [1] 650000 32
> > > sum(clnt)
> > [1] 112671553493
> > >
>
> --
> O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark    Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)           FAX: (+45) 35327907
>
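
[Editor's note: Peter's estimate and the failed allocation in the attached 2.4.0 output line up exactly. With all 30 covariates numeric, the glm model matrix has 33 double-precision columns (intercept, 30 main effects, plus the x27:x29 and x28:x30 interaction columns; the count is inferred from the formula in the transcript), and a single copy of it is precisely the 167578 Kb vector that could not be allocated. A quick check, in plain Python since only the arithmetic matters:]

```python
# One copy of the 650,000 x 33 glm model matrix, stored as 8-byte doubles
# (intercept + 30 numeric main effects + 2 interaction columns).
rows, cols = 650_000, 33
size_bytes = rows * cols * 8          # 171,600,000 bytes, i.e. "on the order of 160 MB"
print(size_bytes / 1024)              # 167578.125 -- the "cannot allocate vector
                                      # of size 167578 Kb" in the 2.4.0 run
```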



Received on Tue Nov 07 16:01:02 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 07 Nov 2006 - 07:37:12 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.