Re: [R] Getting many substrings but only loading the original string one time.

From: Jonathan <jonsleepy_at_gmail.com>
Date: Mon, 11 Apr 2011 16:22:16 -0400

Duncan,

    That appears to be exactly what I was looking for! I will follow up if I have trouble after implementing the script this will be used in. I am also wondering whether R is a reasonably fast language for this type of task, given the very long string and the large number of substrings to fetch: is it much slower than C++, or in the same ballpark?
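
One rough way to look at the speed question is to time the substring() and paste() steps on a synthetic string. The following is only a sketch under stated assumptions, not a benchmark from this thread: the helper name extract_and_join, the string length, and the number of substrings are all made up for illustration, and the timing will vary by machine and R version.

```r
# Hypothetical helper (not from the thread): pull many substrings out of one
# long string with the vectorized substring(), then join them with paste().
extract_and_join <- function(x, start, stop) {
  stopifnot(length(x) == 1L, length(start) == length(stop))
  pieces <- substring(x, start, stop)  # vectorized over start and stop
  paste(pieces, collapse = "")         # stitch the pieces into one string
}

# The small example from the original question.
longerString <- "HelloThisIsMyLongerString"
small <- extract_and_join(longerString, c(2, 6, 4), c(4, 10, 5))
print(small)

# Synthetic timing run. The sizes below are arbitrary assumptions;
# scale them up to get closer to the real 250 MB case.
set.seed(1)
n_chars <- 1e6
n_subs <- 1e4
big <- paste(sample(c("A", "C", "G", "T"), n_chars, replace = TRUE),
             collapse = "")
starts <- sample.int(n_chars - 20L, n_subs)
stops <- starts + sample.int(20L, n_subs, replace = TRUE)
timing <- system.time(joined <- extract_and_join(big, starts, stops))
print(timing)
```

Running the same extraction with realistic coordinates and comparing the elapsed time against a C++ prototype would give a more direct answer than any general rule.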

Thanks!
Jonathan

On Mon, Apr 11, 2011 at 4:14 PM, Duncan Murdoch <murdoch.duncan_at_gmail.com> wrote:

> On 11/04/2011 3:48 PM, Jonathan wrote:
>
>> Hi All,
>> I'm looking for a way to get many substrings from a longer string and
>> then stitch them together. But, since the longer string is really, really
>> long (like 250 MB long), I don't want to do this in a loop and load and
>> re-load the longer string many times. Does anybody have an idea?
>>
>> Maybe I could pass in two vectors (the first would have the starting
>> coordinates, and the second would have the stopping coordinates), so it
>> would be like a vectorized version of substr, where start and stop would
>> be vectors instead of single integers.
>>
>> Example (I'm reducing the size of the string for the example) of how this
>> might work:
>>
>> > longerString <- "HelloThisIsMyLongerString"
>> > startVector <- c(2,6,4)
>> > stopVector <- c(4,10,5)
>>
>> > substrings <- vectorized_substr(longerString, startVector, stopVector)
>> > substrings
>> [1] "ell" "ThisI" "lo"
>>
>
> Use substring(), not substr(). It is vectorized:
>
> > substring(longerString, startVector, stopVector)
> [1] "ell" "ThisI" "lo"
>
> It does this by replicating the longerString, but that doesn't mean actual
> copies are made: just multiple pointers to the same big one.
>
> Duncan Murdoch
>
>> Then I'd like to concatenate them (there will be many of them)
>>
>> > result <- paste(substrings, collapse='')
>> > result
>> [1] "ellThisIlo"
>>
>> (perhaps the paste command as I've done it is the best way, but depending
>> on how the substrings are reported there may be different ways). Thanks!
>>
>> Jonathan
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>


Received on Mon 11 Apr 2011 - 20:25:14 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 11 Apr 2011 - 21:00:29 GMT.
