Re: [R] --max-vsize and --max-nsize linux?

From: Christian Schulz <ozric_at_web.de>
Date: Tue 20 Jul 2004 - 23:40:58 EST

Many thanks for clearing up the vectorized approach for me,
which is indeed the advantage of R.

regards, christian

On Tuesday, 20 July 2004 15:24, Marc Schwartz wrote:
> On Tue, 2004-07-20 at 07:55, Christian Schulz wrote:
> > Hi,
> >
> > sometimes I have trivial recodings like this:
> >
> > > dim(tt)
> > [1] 252382 98
> >
> > system.time(for(i in 2:length(tt)){
> >   tt[,i][is.na(tt[,i])] <- 0
> > })
> >
> > ...and a Win2000 (XP2000+, 1 GB) machine finishes it in several minutes,
> > but my Linux notebook (XP2.6GHz, 512 MB) has not finished after some hours.
> >
> > I notice that the CPU load is relatively small most of the time, but the
> > hard disk is doing a lot of work.
> >
> > Is this a problem with --max-vsize and --max-nsize, and should I play with
> > those? I can't believe that the difference in RAM is the reason.
> >
> > Does anybody have experience with what an "optimal" setting is with, e.g.,
> > 512 MB RAM on Linux?
> >
> > Many thanks for help and comments
> > regards,christian
>
> Christian,
>
> I am unclear as to the nature of your loop above.
>
> Note that:
>
> > length(tt)
> [1] 24733436
>
> which is 252382 * 98. Your looping approach is not efficient and
> incorrect.
>
> Note that when trying to run your loop 'as is', I get:
>
> > system.time(for(i in 2:length(tt)){
> + tt[,i][is.na(tt[,i])] <- 0
> + })
> Error: subscript out of bounds
> Timing stopped at: 3.54 1.81 5.5 0 0
>
> This is because 'i' eventually exceeds the number of columns (98) in
> 'tt', since you have 'i' going from 2 to 24733436.
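
The loop-bound point above can be illustrated with a minimal sketch on a
tiny stand-in object. The names 'm' and 'd' are hypothetical and not from
the thread, and the thread does not state whether the original 'tt' was a
matrix or a data frame. For a matrix, length() counts every element, while
for a data frame it counts columns, so bounding the loop with ncol() is
safe in either case.

```r
# Tiny stand-in for 'tt' (the real object is 252382 x 98).
m <- matrix(c(1, NA, 3, 4, NA, 6), ncol = 3)

length(m)   # number of elements, i.e. nrow(m) * ncol(m)
ncol(m)     # number of columns

# For a data frame, length() is the number of columns instead.
d <- as.data.frame(m)
length(d)

# Column loop bounded by ncol(). Starting at 2 mirrors the original
# loop, so the first column is deliberately left untouched.
for (i in 2:ncol(m)) {
  m[is.na(m[, i]), i] <- 0
}
```

Because the loop starts at column 2, the NA in the first column of 'm'
is still there afterwards; only columns 2 and 3 are recoded.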
>
> I am presuming that you simply want to set any 'NA' values in 'tt' to 0?
>
> Take note of using a vectorized approach:
>
> tt <- matrix(sample(c(1:10, NA), 252382 * 98, replace = TRUE),
>              ncol = 98)
>
> > dim(tt)
> [1] 252382 98
>
> > table(is.na(tt))
>
>    FALSE     TRUE
> 22484834  2248602
>
> Now use:
>
> > system.time(tt[is.na(tt)] <- 0)
> [1] 1.56 0.73 2.42 0.00 0.00
>
> > table(is.na(tt))
>
>    FALSE
> 24733436
>
> This is on a 3.2 GHz system with 2 GB of RAM.
>
> However, this is not a memory issue; it is an inefficient use of loops.
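
The vectorized replacement can be tried on a much smaller object before
running it on the full data. This is only a sketch: the names 'tt_small',
'n_na' and 'df_small', the dimensions and the seed are assumptions chosen
for illustration, not values from the thread.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# Same construction as above, but 1000 x 5 instead of 252382 x 98.
tt_small <- matrix(sample(c(1:10, NA), 1000 * 5, replace = TRUE), ncol = 5)

n_na <- sum(is.na(tt_small))   # number of NAs before the replacement

# is.na() returns a logical matrix of the same shape; using it as an
# index replaces every NA in one vectorized assignment, with no loop.
tt_small[is.na(tt_small)] <- 0

any(is.na(tt_small))           # FALSE once every NA has been replaced

# The same idiom also works when the object is a data frame.
df_small <- as.data.frame(matrix(c(1, NA, 3, NA), ncol = 2))
df_small[is.na(df_small)] <- 0
```

Since the sampled values are 1 to 10 or NA, every 0 in 'tt_small' after
the assignment corresponds to a former NA, which gives a quick sanity
check against 'n_na'.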
>
> HTH,
>
> Marc Schwartz


R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Jul 20 23:48:12 2004
