Re: [R] Editing Strings in R

From: Gabor Grothendieck <ggrothendieck_at_myway.com>
Date: Fri 30 Jul 2004 - 13:25:24 EST

Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:

>
> On Thu, 2004-07-29 at 21:08, Gabor Grothendieck wrote:
> > Bulutoglu Dursun A Civ AFIT/ENC <Dursun.Bulutoglu <at> afit.edu> writes:
> >
> > >
> > > I was wondering if there is a way of editing strings in R. I
> > > have a set of strings and each string is a row of numbers and parentheses.
> > > For example the first row is:
> > > (0 2)(3 4)(7 9)(5 9)(1 5)
> > > and I have a thousand or so such rows. I was wondering how I
> > > could get the corresponding string obtained by adding 1 to all the
> > > numbers in the string above.
> >
> > First do the one-character translations simultaneously using chartr, and
> > then use gsub for the remaining one-to-two-character translation (any 0
> > left after chartr came from a 9, so it becomes 10):
> >
> > gsub("0","10",chartr("0123456789","1234567890","(0 2)(3 4)(7 9)(5 9)(1 5)"))
>
> Gabor,
>
> One problem: Multi-digit numbers in the source string:
>
> > gsub("0","10",chartr("0123456789","1234567890","(10 99)(3 4)(7 9)(5 9)(1 5)"))
> [1] "(21 1010)(4 5)(8 10)(6 10)(2 6)"
>
> Note the first number "10" gets transformed to "21" and the "99" goes to
> "1010".
>
> I made a quick update to NewRow which, while not faster, gets it down to
> two lines instead of three and is a bit cleaner:
>
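> NewRow <- function(x)
> {
>   # Split on "(", ")" and " "; each "(a b)" yields "", "a", "b", so the
>   # first column of the 3-column matrix is NA and columns 2-3 hold the numbers
>   TempMat <- matrix(as.numeric(unlist(strsplit(x, "([\\(\\) ])"))),
>                     ncol = 3, byrow = TRUE) + 1
>
>   paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "",
>         collapse = "")
> }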
>
> Note that with multi-digit numbers, it gives a correct result:
>
> > NewRow("(10 99)(101 4)(7 9)(5 9)(1 5)")
> [1] "(11 100)(102 5)(8 10)(6 10)(2 6)"
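NewRow works on one string at a time, while the original question involves a thousand or so rows. One way to process them all (an editorial sketch, not from the thread) is to apply NewRow element-wise with vapply. The vector `rows` is hypothetical test data, and every row is assumed to follow the "(a b)(c d)..." pattern that NewRow relies on.

```r
# NewRow as posted above: split on "(", ")" and " ", add 1, rebuild the string.
NewRow <- function(x)
{
  TempMat <- matrix(as.numeric(unlist(strsplit(x, "([\\(\\) ])"))),
                    ncol = 3, byrow = TRUE) + 1
  paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "", collapse = "")
}

# Hypothetical input: one string per row, all in the "(a b)(c d)..." form
rows <- c("(0 2)(3 4)(7 9)(5 9)(1 5)",
          "(10 99)(101 4)(7 9)(5 9)(1 5)")

# Apply NewRow to each row; vapply checks that each call returns one string
new_rows <- vapply(rows, NewRow, character(1), USE.NAMES = FALSE)
print(new_rows)
```

The first row becomes "(1 3)(4 5)(8 10)(6 10)(2 6)" and the second matches the result shown above.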

The above assumes a particular pattern of parentheses based on the poster's example, just as mine assumed one-digit numbers based on the same example. Both of our solutions also assume the numbers are non-negative integers.

The poster can advise us on which additional assumptions, if any, are allowable but, just in case, here is a one-line solution that handles multi-digit numbers and does not assume a particular pattern of parentheses and spaces.

For a number, say 99, gsub replaces it with ",99+1," (double quotes included), and the inner paste adds c(" to the front and ") to the end. Applied to the string "(10 99)" alone, this would give c("(",10+1," ",99+1,")"), a valid R expression; we evaluate it and finally paste the pieces back together using the outer paste:

R> line <- "(10 99)(101 4)(7 9)()((5 9)(1 5))" # test data

R> paste(eval(parse(text = paste('c("', gsub("([0-9]+)", '",\\1+1,"', line), '")', sep = ""))), collapse = "")

[1] "(11 100)(102 5)(8 10)()((6 10)(2 6))"
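A possible alternative that avoids eval(parse()) (an editorial sketch, not from the original thread; it assumes a version of R that has regmatches(), added in R 2.14.0) is to locate every run of digits with gregexpr(), add 1, and write the results back with the replacement form `regmatches<-`. It works on a whole character vector, so all rows can be handled in one call. Like the solutions above, it assumes non-negative integers, here also small enough to fit in an R integer. The function name `increment_numbers` and the vector `test_rows` are hypothetical.

```r
# Hypothetical helper: add 1 to every unsigned integer in each string of x.
# Parentheses, spaces and any other non-digit characters are left untouched.
increment_numbers <- function(x) {
  m <- gregexpr("[0-9]+", x)          # positions of all digit runs, per string
  nums <- regmatches(x, m)            # list: matched digit strings per element
  regmatches(x, m) <- lapply(nums, function(s) {
    # integer arithmetic, so as.character() never gives "1e+05"-style output
    as.character(as.integer(s) + 1L)
  })
  x
}

test_rows <- c("(10 99)(101 4)(7 9)()((5 9)(1 5))",  # test data from above
               "(0 2)(3 4)(7 9)(5 9)(1 5)")
print(increment_numbers(test_rows))
```

The first element reproduces the result shown above, "(11 100)(102 5)(8 10)()((6 10)(2 6))". Using integers rather than doubles is a deliberate choice: pasting the double 99999 + 1 would print as "1e+05".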



Received on Fri Jul 30 13:31:57 2004
