Re: [R] Regular Expression help

From: Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk_at_idi.ntnu.no>
Date: Sat, 08 Nov 2008 22:21:45 +0100

Gabor Grothendieck wrote:
> Here are a few more solutions. x is the input vector
> of character strings.
>
> The first is a slightly shorter version of one of Wacek's.
> The next three all create an anonymous grouping variable
> (using sub, substr/gsub and strapply respectively)
> whose components are "p" and "q" and then tapply
> is used to separate out the corresponding components
> of x according to the grouping:
>
> sapply(c(p = "^[^pq]*p", q = "^[^pq]*q"), grep, x = x, value = TRUE)
>
> tapply(x, sub("^[^pq]*(.).*", "\\1", x), c)
>
> tapply(x, substr(gsub("[^pq]", "", x), 1, 1), c)
>
> library(gsubfn)
> tapply(x, strapply(x, "^[^pq]*(.)", simplify = c), c)
>

wow! cool stuff. if you're interested in comparing their efficiency, source the attached script.

vQ

generate = function(n, m)

        replicate(n, paste(sample(letters, m, replace=TRUE), collapse=""))

tests = list(

	wacek =
	function(data) {
		# logical index: data[-p] would return nothing if no string matched
		p = grepl("^[^pq]*p", data)
		list(p=data[p], q=data[!p])
	},
	
	gabor1 =
	function(data) 
		sapply(c(p="^[^pq]*p", q="^[^pq]*q"), grep, x=data, value=TRUE),
		
	gabor2 =
	function(data)
		tapply(data, sub("^[^pq]*(.).*", "\\1", data), c),
	
	gabor3 =
	function(data)
		tapply(data, substr(gsub("[^pq]", "", data), 1, 1), c),
	
	gabor4 =
	{ library(gsubfn); function(data)
		tapply(data, strapply(data, "^[^pq]*(.)", simplify=c), c) }
)         

data = generate(1000,10)
lapply(names(tests),

	function(name) {
		cat(name, ":\n", sep="")
		print(system.time(replicate(30,tests[[name]](data)))) } )
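
To see what the grouping key described in the quoted message looks like before timing anything, here is a minimal sketch on a toy vector. The input x is made up for illustration and assumes every string contains p or q; it is not data from the thread.

```r
# Toy input (an assumption for illustration): every string contains p or q.
x <- c("apq", "bqp", "cp", "xyzq", "pp")

# Grouping key: the first character of each string that is p or q.
key <- sub("^[^pq]*(.).*", "\\1", x)
print(key)

# The same key via the gsub/substr route: drop everything except p and q,
# then keep the first remaining character.
key2 <- substr(gsub("[^pq]", "", x), 1, 1)
stopifnot(identical(key, key2))

# tapply splits x by the key; the result is a list-like array
# with components named p and q.
groups <- tapply(x, key, c)
print(groups[["p"]])
print(groups[["q"]])
```

Note that random 10-letter strings from generate() will often contain neither p nor q, and the approaches in the script do not treat such strings the same way, so the timings compare functions whose outputs can differ.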

______________________________________________

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat 08 Nov 2008 - 21:27:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 08 Nov 2008 - 22:30:24 GMT.
