Re: [R] Regular Expression help

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Sat, 08 Nov 2008 17:09:59 -0500

I suspect strapply is only relatively slow on short strings, where it doesn't matter anyway; for long strings, performance would likely be dominated by the underlying regexp operations. I know that users are applying the package to very long strings, because I once had to lift the 25,000 character limit after receiving complaints about it. The expressiveness and brevity of strapply (it would be the shortest if it were not for the length of the word simplify) offset any disadvantage in my view.

On Sat, Nov 8, 2008 at 5:02 PM, Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk_at_idi.ntnu.no> wrote:
> Gabor Grothendieck wrote:
>> For the problem at hand I think I would use your solution
>> which is both easily understood and fastest. On the
>> other hand the tapply based solutions are coordinate
>> free (i.e. no explicit mucking with indices) and readily
>> generalize to more than 2 groups -- just replace [^pq] with
>> [^pqr], say.
>>
>>
>
> for sure, mine was optimized towards the case, not towards generalizability.
> the gsubfn one is a loser, though.
>
> but the first one *is* easily generalizable, e.g.,
>
> letters = "pqrs"
> sapply(sprintf("^[^%s]*%s", letters, unlist(strsplit(letters,
> split=""))), grep, x=x, value=TRUE)
>
> while an order of magnitude faster than the tapply ones.
>
> vQ
>
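
For readers without the earlier messages in this thread, here is a minimal sketch of the generalized grep approach quoted above, using base R only. The sample vector x is an assumption made up for illustration (the original data is not shown in this message), and the variable is named chars rather than letters to avoid masking R's built-in letters constant. lapply is used in place of sapply so the result is always a list, even when every group happens to have the same size.

```r
# Hypothetical sample data; the thread's actual x is not shown here.
x <- c("abp", "xqp", "rrs", "ps", "zzz", "sq")

# The set of characters that define the groups.
chars <- "pqrs"
first <- unlist(strsplit(chars, split = ""))

# One pattern per character: any run of characters outside the set,
# followed by that character. A string matches the pattern for the
# character of the set that occurs first in it.
patterns <- sprintf("^[^%s]*%s", chars, first)

# For each pattern, collect the matching strings; name groups by character.
groups <- setNames(lapply(patterns, grep, x = x, value = TRUE), first)

print(groups)
```

Strings containing none of the characters (such as "zzz" in the sample) fall into no group.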



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat 08 Nov 2008 - 22:14:06 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 08 Nov 2008 - 23:30:22 GMT.

