Re: [R] ideas about how to reduce RAM & improve speed in trying to use lapply(strsplit())

From: Ian Gow <iandgow_at_gmail.com>
Date: Sun, 29 May 2011 19:44:27 -0500


Not a new approach, but here are some benchmark data (adding perl=TRUE speeds up Jim's suggestion):

> x <- c('18x.6','12x.9','302x.3')
> y <- rep(x,100000)
> system.time(temp <- unlist(lapply(strsplit(y,".",fixed=TRUE),function(x) x[1])))

   user system elapsed
  1.203 0.018 1.222
> system.time(temp2 <- gsub("^(.*?)\\..*$","\\1",y, perl=TRUE))

   user system elapsed
  0.176 0.001 0.176
> identical(temp2, temp)

[1] TRUE
> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y))

   user system elapsed
  0.292 0.001 0.291
> identical(temp3, temp)

[1] TRUE
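
One caveat on the identical() results: they hold here because every element contains exactly one dot. The greedy pattern "^(.*)\\..*" cuts at the last dot, while strsplit(...)[1] and the lazy pattern "^(.*?)\\..*$" cut at the first. A small sketch of the difference; the fourth element, "1.2.3", is a made-up value added for illustration and is not from Matt's data:

```r
# "1.2.3" is a hypothetical multi-dot value, added only to show the difference
x <- c("18x.6", "12x.9", "302x.3", "1.2.3")

# Reference: the first field from strsplit, as in the original code
first <- unlist(lapply(strsplit(x, ".", fixed = TRUE), function(s) s[1]))

# Lazy quantifier: stops at the first dot
lazy <- gsub("^(.*?)\\..*$", "\\1", x, perl = TRUE)

# Greedy quantifier: backtracks only as far as the last dot
greedy <- gsub("^(.*)\\..*", "\\1", x, perl = TRUE)

identical(lazy, first)    # TRUE
identical(greedy, first)  # FALSE
greedy[4]                 # "1.2"
```

If the real data can contain more than one dot per element, the lazy version is the one that matches the strsplit result.
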
> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y, perl=TRUE))

   user system elapsed
  0.160 0.001 0.161
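
The timings above say nothing about RAM. One more option that avoids both the regex engine and the intermediate list that strsplit builds is to locate the first dot with regexpr(fixed=TRUE) and cut with substr. This is only a sketch and was not timed or memory-profiled in this thread; first_field is a hypothetical helper name, and its handling of elements without a dot (returned unchanged) is an assumption about the desired behaviour:

```r
# Hypothetical helper: text before the first "." in each element.
# Elements with no dot (or NA) are returned unchanged.
first_field <- function(v) {
  pos <- regexpr(".", v, fixed = TRUE)   # position of first dot, -1 if none
  hit <- !is.na(pos) & pos > 0
  out <- v
  out[hit] <- substr(v[hit], 1, pos[hit] - 1)
  out
}

x <- c("18x.6", "12x.9", "302x.3")
first_field(x)
# [1] "18x"  "12x"  "302x"
```

Whether this actually beats gsub(..., perl=TRUE) on a vector of length 132e6 would need to be checked with system.time() and gc() on the real data.
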

On 5/29/11 7:40 PM, "jim holtman" <jholtman_at_gmail.com> wrote:

>Try this approach:
>
>> x <- c('18x.6','12x.9','302x.3')
>> gsub("^(.*)\\..*", '\\1', x)
>[1] "18x" "12x" "302x"
>
>
>On Sun, May 29, 2011 at 8:10 PM, Matthew Keller <mckellercran_at_gmail.com>
>wrote:
>> hi all,
>>
>> I'm full of questions today :). Thanks in advance for your help!
>>
>> Here's the problem:
>> x <- c('18x.6','12x.9','302x.3')
>>
>> I want to get a vector that is c('18x','12x','302x')
>>
>> This is easily done using this code:
>>
>> unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))
>>
>> So far so good. The problem is that x is a vector of length 132e6.
>> When I run the above code, it runs for > 30 minutes, and it takes > 23
>> Gb RAM (no kidding!).
>>
>> Does anyone have ideas about how to speed up the code above and (more
>> importantly) reduce the RAM footprint? I'd prefer not to change the
>> file on disk using, e.g., awk, but I will do that as a last resort.
>>
>> Best
>>
>> Matt
>>
>> --
>> Matthew C Keller
>> Asst. Professor of Psychology
>> University of Colorado at Boulder
>> www.matthewckeller.com
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
>--
>Jim Holtman
>Data Munger Guru
>
>What is the problem that you are trying to solve?
>



Received on Mon 30 May 2011 - 00:46:08 GMT
