Re: [R] regular expressions, sub

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Sat 28 Jan 2006 - 06:49:49 EST

In this post:

        http://finzi.psych.upenn.edu/R/Rhelp02a/archive/30590.html

Thomas Lumley provided a function to traverse a formula recursively. We can modify it as shown to transform ln(m)^n to ln^n(m) producing proc2. We then bundle everything up into proc3 which uses substitute to translate log to ln and remove (, the calls proc2 to do the aforementioned substitute and finally we use simple character processing to clean up the rest.

Although this is substantially longer in terms of lines of code we did not have to write many of them because proc2 is actually just a modification of the code in the indicated post and the character processing becomes extremely simple. Also its more powerful able to handle expressions like:

        log(D) ~ log(log(N)^2)^3

proc2 <-function(formula){

	process<-function(expr){
	    if (length(expr)==1)
	      return(expr)
	   if(length(expr)==2) {
	      expr[[2]] <- process(expr[[2]])
	      return(expr)
	   }
	   if ( expr[[1]]==as.name("^") && length(expr[[2]])==2 &&
		      expr[[2]][[1]] == as.name("ln") &&
		      class(idx <- expr[[3]]) == "numeric") {
		expr <- as.call(list(as.name(paste("ln",idx,sep = "^")),
		   expr[[2]][[2]]))
		expr[[2]] <- process(expr[[2]])
		return(expr)
	   }
	   expr[[2]]<-process(expr[[2]])
	   expr[[3]]<-process(expr[[3]])
	   return(expr)
	  }

   formula[[3]]<-process(formula[[3]])
   formula
}

proc3 <- function(f) {

	# replace log with ln
	result <- do.call("substitute", list(f, list(log = as.name("ln"))))

	# remove I
	result <- do.call("substitute", list(result, list(I = as.name("("))))

	# transform ln(m)^n to ln^n(m)
	result <- proc2(result)

	# now clean up using simple character substitutions
	result <- deparse(result)

	# ( -> space
	result <- gsub("[(]", " ", result)

	# remove " and )
	gsub("[\")]", "", result)

}

# tests

proc3( log(D) ~ log(N)+I(log(N)^2)+log(t) ) # "ln D ~ ln N + ln^2 N + ln t"

proc3( log(D) ~ log(log(N)^2)^3) # "ln D ~ ln^3 ln^2 N"

On 1/27/06, Christian Hoffmann <christian.hoffmann@wsl.ch> wrote:
> Hi,
>
> I am trying to use sub, regexpr on expressions like
>
> log(D) ~ log(N)+I(log(N)^2)+log(t)
>
> being a model specification.
>
> The aim is to produce:
>
> "ln D ~ ln N + ln^2 N + ln t"
>
> The variable names N, t may change, the number of terms too.
>
> I succeded only partially, help on regular expressions is hard to
> understand for me, examples on my case are rare. The help page on R-help
> for grep etc. and "regular expressions"
>
> What I am doing:
>
> (f <- log(D) ~ log(N)+I(log(N)^2)+log(t))
> (ft <- sub("","",f)) # creates string with parts of formula, how to do
> it simpler?
> (fu <- paste(ft[c(2,1,3)],collapse=" ")) # converts to one string
>
> Then I want to use \1 for backreferences something like
>
> (fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu))
>
> to change "log(g)^7" to "ln^7 g",
>
> and to eliminate I(): sub("I(blabla)","\\1",fv) # I(xxx) -> xxx
>
> The special characters are making trouble, sub acceps "(", ")" only in
> pairs. Code for experimentation:
>
> trysub <- function(s,t,e) {
> ii<-0; for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE)) for (i3 in
> c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE))
> print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," ","
> ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes:
> ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3,
> useBytes=i4)),sep=""));invisible(0) }
>
> trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases
> 5,6,13..16, the rest unsubstituted
>
> trysub("log(","ln ",fu) # B: no substitutions; errors for
> cases 1..4,7.. 12 # typical errors:
> "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement,
> x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n"
>
> trysub("log\(","ln ",fu) # C: same as A
>
> trysub("log\\(","ln ",fu) # D: no substitutions; errors for
> cases 15,16 # typical errors:
> "15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x,
> ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression
> 'log\\('\n"
>
> trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors
> # E: typical errors:
> "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement,
> x, ignore.case, useBytes) : \n\tinvalid regular expression
> 'log\\(([:alpha:]+)\\)'\n"
>
>
>
> Thanks for help
> Christian
>
> PS. The explanations in the documents
> --
> Dr. Christian W. Hoffmann,
> Swiss Federal Research Institute WSL
> Mathematics + Statistical Computing
> Zuercherstrasse 111
> CH-8903 Birmensdorf, Switzerland
>
> Tel +41-44-7392-277 (office) -111(exchange)
> Fax +41-44-7392-215 (fax)
> christian.hoffmann@wsl.ch
> http://www.wsl.ch/staff/christian.hoffmann
>
> International Conference 5.-7.6.2006 Ekaterinburg Russia
> "Climate changes and their impact on boreal and temperate forests"
> http://ecoinf.uran.ru/conference/
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Jan 28 07:09:01 2006

This archive was generated by hypermail 2.1.8 : Sat 28 Jan 2006 - 08:46:29 EST