Re: [Rd] memory issues with new release (PR#9344)

From: Pfaff, Bernhard Dr. <Bernhard_Pfaff_at_fra.invesco.com>
Date: Tue 07 Nov 2006 - 08:56:31 GMT


>> spend more time on this. I really don't mind using the
>> previous version.

Hello Derek,

or upgrade to R 2.5.0dev; the execution of your code snippet is not hampered by memory issues:

> sessionInfo()

R version 2.5.0 Under development (unstable) (2006-10-10 r39600) i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "datasets"  "utils"
[7] "base"

other attached packages:
fortunes
 "1.3-2"
>

My output with respect to memory.limit(NA) is the same as yours.

Best,
Bernhard

>> Like you mentioned, probably just a function of the new
>> version requiring more memory.
>
>
>Hmm, you might want to take a final look at the Windows FAQ 2.9. I am
>still not quite convinced you're really getting more than the default
>1.5 GB.
>
>Also, how much can you increase the problem size on 2.3.0 before it
>breaks? If you can only go to say 39 or 40 variables, then there's
>probably not much we can do. If it is orders of magnitude, then we may
>have a real bug (or not: sometimes we fix bugs resulting from things
>not being duplicated when they should have been; the fixed code then
>uses more memory than the unfixed code.)
>
>
>> Thanks,
>> Derek
>>
>> On 06 Nov 2006 21:42:04 +0100, Peter Dalgaard
>> <p.dalgaard@biostat.ku.dk> wrote:
>> >
>> > "Derek Stephen Elmerick" <delmeric@gmail.com> writes:
>> >
>> > > Thanks for the replies. Point taken regarding submission
>> > > protocol. I have included a text file attachment that shows the
>> > > R output with versions 2.3.0 and 2.4.0. A label distinguishing
>> > > the version is included in the comments.
>> > >
>> > > A quick background on the attached example. The dataset has
>> > > 650,000 records and 32 variables. The response is dichotomous
>> > > (0/1) and I ran a logistic model (I previously mentioned
>> > > multinomial, but decided to start simple for the example).
>> > > Covariates in the model may be continuous or categorical, but
>> > > all are numeric. You'll notice that the code is the same for
>> > > both versions; however, there is a memory error with the 2.4.0
>> > > version. I ran this several times and in different orders to
>> > > make sure it was not some sort of hardware issue.
>> > >
>> > > If there is some sort of additional output that would be
>> > > helpful, I can provide it as well. Or, if there is nothing I can
>> > > do, that is fine also.
>> >
>> > I don't think it was ever possible to request 4GB on XP. The
>> > version difference might be caused by different response to
>> > invalid input in memory.limit(). What does memory.limit(NA) tell
>> > you after the call to memory.limit(4095) in the two versions?
>> >
>> > If that is not the reason: What is the *real* restriction of
>> > memory on your system? Do you actually have 4GB in your system
>> > (RAM+swap)?
>> >
>> > Your design matrix is on the order of 160 MB, so it shouldn't be a
>> > problem with a GB-sized workspace. However, three copies of it
>> > will brush against 512 MB, and it's not unlikely to have that many
>> > copies around.
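Peter's 160 MB estimate is easy to verify with back-of-the-envelope arithmetic. A sketch, under the assumption that the model matrix here has 650,000 rows and 33 columns (an intercept, 30 main effects, and the two interaction columns) of 8-byte doubles:

```r
## Rough size of the model matrix in the original example (assumption:
## intercept + 30 main effects + 2 interaction columns = 33 columns of
## 8-byte doubles over 650,000 rows).
n    <- 650000
cols <- 33
kb   <- n * cols * 8 / 1024
floor(kb)          # 167578 -- the exact figure in the "cannot allocate" error
floor(kb) / 1024   # ~164 MB, in line with "on the order of 160 MB"
```

Three simultaneous copies of such a matrix, as Peter notes, already approach 512 MB.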
>> >
>> >
>> >
>> > > -Derek
>> > >
>> > >
>> > > > On 11/6/06, Kasper Daniel Hansen <khansen@stat.berkeley.edu> wrote:
>> > > >
>> > > > It would be helpful to produce a script that reproduces the
>> > > > error on your system. And include details on the size of your
>> > > > data set and what you are doing with it. It is unclear what
>> > > > function is actually causing the error and such. Really, in
>> > > > order to do something about it you need to show how to
>> > > > actually obtain the error.
>> > > >
>> > > > To my knowledge nothing _major_ has happened with the memory
>> > > > consumption, but of course R could use slightly more memory
>> > > > for specific purposes.
>> > > >
>> > > > But chances are that this is not really memory related but
>> > > > more related to the functions you are using - perhaps a bug or
>> > > > perhaps a user error.
>> > > >
>> > > > Kasper
>> > > >
>> > > > On Nov 6, 2006, at 10:20 AM, Derek Stephen Elmerick wrote:
>> > > >
>> > > > > Thanks for the friendly reply. I think my description was
>> > > > > fairly clear: I import a large dataset and run a model.
>> > > > > Using the same dataset, the process worked previously and it
>> > > > > doesn't work now. If the new version of R requires more
>> > > > > memory and this compromises some basic data analyses, I
>> > > > > would label this as a bug. If this memory issue was
>> > > > > mentioned in the documentation, then I apologize. This email
>> > > > > was clearly not well received, so if there is a more
>> > > > > appropriate place to post these sorts of questions, that
>> > > > > would be helpful.
>> > > > >
>> > > > > -derek
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On 06 Nov 2006 18:20:33 +0100, Peter Dalgaard
>> > > > > <p.dalgaard@biostat.ku.dk> wrote:
>> > > > >>
>> > > > >> delmeric@gmail.com writes:
>> > > > >>
>> > > > >>> Full_Name: Derek Elmerick
>> > > > >>> Version: 2.4.0
>> > > > >>> OS: Windows XP
>> > > > >>> Submission from: (NULL) (38.117.162.243)
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>> Hello -
>> > > > >>>
>> > > > >>> I have some code that I run regularly using R version
>> > > > >>> 2.3.x. The final step of the code is to build a
>> > > > >>> multinomial logit model. The dataset is large; however, I
>> > > > >>> have not had issues in the past. I just installed the
>> > > > >>> 2.4.0 version of R and now have memory allocation issues.
>> > > > >>> To verify, I ran the code again against the 2.3 version
>> > > > >>> and had no problems. Since I have set the memory limit to
>> > > > >>> the max size, I have no alternative but to downgrade to
>> > > > >>> the 2.3 version. Thoughts?
>> > > > >>
>> > > > >> And what do you expect the maintainers to do about it?
>> > > > >> (I.e., why are you filing a bug report?)
>> > > > >>
>> > > > >> You give absolutely no handle on what the cause of the
>> > > > >> problem might be, or even how to reproduce it. It may be a
>> > > > >> bug, or maybe just R requiring more memory to run than
>> > > > >> previously.
>> > > > >>
>> > > > >> --
>> > > > >>    O__  ---- Peter Dalgaard              Øster Farimagsgade 5, Entr.B
>> > > > >>   c/ /'_ --- Dept. of Biostatistics      PO Box 2099, 1014 Cph. K
>> > > > >>  (*) \(*) -- University of Copenhagen    Denmark       Ph:  (+45) 35327918
>> > > > >> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                FAX: (+45) 35327907
>> > > > >>
>> > > > >
>> > > > > [[alternative HTML version deleted]]
>> > > > >
>> > > > > ______________________________________________
>> > > > > R-devel@r-project.org mailing list
>> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > > > ######
>> > > > ### R 2.4.0
>> > > > ######
>> > > >
>> > > > rm(list=ls(all=TRUE))
>> > > > memory.limit(size=4095)
>> > > NULL
>> > > >
>> > > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
>> > > >
>> > > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
>> > > + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
>> > > + x14 + x15 + x16 + x17 + x18 + x19 + x20 +
>> > > + x21 + x22 + x23 + x24 + x25 + x26 + x27 +
>> > > + x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
>> > > + data = clnt)
>> > > Error: cannot allocate vector of size 167578 Kb
>> > > >
>> > > > dim(clnt)
>> > > [1] 650000 32
>> > > > sum(clnt)
>> > > [1] 112671553493
>> > > >
>> > >
>> > > ##################################################
>> > > ##################################################
>> > >
>> > > > ######
>> > > > ### R 2.3.0
>> > > > ######
>> > > >
>> > > > rm(list=ls(all=TRUE))
>> > > > memory.limit(size=4095)
>> > > NULL
>> > > >
>> > > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
>> > > >
>> > > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
>> > > + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
>> > > + x14 + x15 + x16 + x17 + x18 + x19 + x20 +
>> > > + x21 + x22 + x23 + x24 + x25 + x26 + x27 +
>> > > + x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
>> > > + data = clnt)
>> > > >
>> > > > dim(clnt)
>> > > [1] 650000 32
>> > > > sum(clnt)
>> > > [1] 112671553493
>> > > >
>> >
>> > --
>> >    O__  ---- Peter Dalgaard              Øster Farimagsgade 5, Entr.B
>> >   c/ /'_ --- Dept. of Biostatistics      PO Box 2099, 1014 Cph. K
>> >  (*) \(*) -- University of Copenhagen    Denmark       Ph:  (+45) 35327918
>> > ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                FAX: (+45) 35327907
>> >
>> >
>>
>> >
>> > ######
>> > ### R 2.4.0
>> > ######
>> >
>> > rm(list=ls(all=TRUE))
>> > memory.limit(size=4095)
>> NULL
>> > memory.limit(NA)
>> [1] 4293918720
>> >
>> > set.seed(314159)
>> > clnt=matrix(runif(650000*38),650000,38)
>> > y=round(runif(650000,0,1))
>> > clnt=data.frame(y,clnt)
>> > attributes(clnt)$names=c("y","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14",
>> +                          "x15","x16","x17","x18","x19","x20","x21","x22","x23","x24","x25","x26","x27",
>> +                          "x28","x29","x30","x31","x32","x33","x34","x35","x36","x37")
>> > dim(clnt)
>> [1] 650000 39
>> > sum(clnt)
>> [1] 12674827
>> >
>> >
>> > chk.rsp=glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 +
>> +             x15 + x16 + x17 + x18 + x19 + x20 + x21 + x22 + x23 + x24 + x25 + x26 + x27 +
>> +             x28 + x29 + x30 + x31 + x32 + x33 + x34 + x35 + x36 + x37, family = binomial, data = clnt)
>> Error: cannot allocate vector of size 192968 Kb
>> >
>> >
>>
>> ##############################################################
>> ##############################################################
>> ##############################################################
>>
>> >
>> > ######
>> > ### R 2.3.0
>> > ######
>> >
>> > rm(list=ls(all=TRUE))
>> > memory.limit(size=4095)
>> NULL
>> > memory.limit(NA)
>> [1] 4293918720
>> >
>> > set.seed(314159)
>> > clnt=matrix(runif(650000*38),650000,38)
>> > y=round(runif(650000,0,1))
>> > clnt=data.frame(y,clnt)
>> > attributes(clnt)$names=c("y","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14",
>> +                          "x15","x16","x17","x18","x19","x20","x21","x22","x23","x24","x25","x26","x27",
>> +                          "x28","x29","x30","x31","x32","x33","x34","x35","x36","x37")
>> > dim(clnt)
>> [1] 650000 39
>> > sum(clnt)
>> [1] 12674827
>> >
>> >
>> > chk.rsp=glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 +
>> +             x15 + x16 + x17 + x18 + x19 + x20 + x21 + x22 + x23 + x24 + x25 + x26 + x27 +
>> +             x28 + x29 + x30 + x31 + x32 + x33 + x34 + x35 + x36 + x37, family = binomial, data = clnt)
>> >
>> >
>> >
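Two of the figures in the transcripts above can be cross-checked with simple arithmetic (an illustrative sketch only): the limit reported by memory.limit(NA) is exactly the 4095 MB that was requested, and the failing allocation under 2.4.0 is exactly the size of one 650,000 × 38 model matrix (intercept plus 37 covariates):

```r
## Sanity-check the numbers reported in the transcripts (illustrative only).

## memory.limit(NA) returned 4293918720 bytes -- exactly the requested
## 4095 MB, so the limit was at least *reported* as accepted.
stopifnot(4293918720 == 4095 * 1024^2)

## The failing "cannot allocate vector of size 192968 Kb" corresponds to one
## copy of the model matrix: 650,000 rows x 38 columns (intercept + 37
## covariates) of 8-byte doubles, expressed in Kb as in the error message.
kb <- 650000 * 38 * 8 / 1024
stopifnot(floor(kb) == 192968)
```

So the 2.4.0 failure happens while allocating a single additional copy of the model matrix, which supports Peter's point that several concurrent copies, not the workspace limit itself, are the pressure point.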
>
> --
>    O__  ---- Peter Dalgaard              Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics      PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen    Denmark       Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                FAX: (+45) 35327907
>
>




R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Wed Nov 08 01:19:24 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 07 Nov 2006 - 14:30:35 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.