Re: [R] Parsing regular expressions differently - feature request

From: Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk_at_idi.ntnu.no>
Date: Sat, 08 Nov 2008 23:43:58 +0100

Duncan Murdoch wrote:
>
>>>>>> I was wondering if that is really necessary for perl=TRUE? wouldn't it be
>>>>>> possible to parse a string differently in a regex context, e.g.
>>>>>> automatically insert \\ for each \ , such that you can use the perl syntax
>>>>>> directly? For example, if you want to input a newline as a character, you
>>>>>> would use \n anyway. At the moment one says \\n to make it clear to R that
>>>>>> you mean \n to make clear that you mean newline... this is pretty annoying.
>>>>>> How likely is it that you want to pass a real newline character to PCRE
>>>>>> directly?
>>>>> No, that's not possible. At the level where the parsing takes place R has
>>>>> no idea of its eventual use, so it can't tell that some strings are going to
>>>>> be interpreted as Perl, and others not.
>> Here's a quick hack to achieve the impossible:
>
> That might solve John's problem, but I doubt it. As far as I can see
> it won't handle \L, for example.
>

well, it was not supposed to. it only addresses the need to double backslashes a second time when a literal backslash character is an element of the regex (i.e., having to write "\\\\" where "\\" already denotes a backslash).
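for readers without the earlier post at hand, here is a minimal sketch of what such a wrapper could look like. the name mygrep is the one used in this thread, but the body below is an assumption (it simply doubles every backslash character in the pattern before passing it on to grep), not necessarily the original hack:

```r
# assumed reconstruction, not the original code from the earlier post:
# escape each backslash character in the already-parsed pattern string,
# so that it reaches the regex engine as a literal backslash
mygrep <- function(pattern, x, ...) {
  escaped <- gsub("\\", "\\\\", pattern, fixed = TRUE)
  grep(escaped, x, ...)
}
```

note that a wrapper of this kind only ever sees the string after R has parsed it, so "\n" arrives as a newline character and is passed through untouched; only real backslash characters get escaped.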

foo = "foo\\n\n"

grep("\n", foo, perl=TRUE, value=TRUE)
mygrep("\n", foo, perl=TRUE, value=TRUE) # both match the newline

grep("\\n", foo, perl=TRUE, value=TRUE)
mygrep("\\n", foo, perl=TRUE, value=TRUE) # both match (guess what)

bar = "bar\n"

grep("\n", bar, perl=TRUE, value=TRUE)
mygrep("\n", bar, perl=TRUE, value=TRUE) # both match the newline

grep("\\n", bar, perl=TRUE, value=TRUE)
mygrep("\\n", bar, perl=TRUE, value=TRUE)
# counterintuitively, grep matches here (and mygrep does not): intuitively,
# "\\n" should match backslash-n, not a newline, but there's just a newline
# in bar. i do know why it matches, but i'm pretty sure that for many of
# those who do it's an inconvenient detail, and for those who don't it's a
# confusing annoyance

zee = "zee\\"

grep("\\", zee, perl=TRUE, value=TRUE)
mygrep("\\", zee, perl=TRUE, value=TRUE) # grep fails, needs "\\\\"
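the failure of plain grep in the last case can be checked without any wrapper. the sketch below uses only base R; the variable names lone, doubled and literal are made up for illustration:

```r
zee <- "zee\\"   # the characters z, e, e and one backslash

# a pattern consisting of a lone backslash is an incomplete escape for the
# regex engine, so grep signals an error; catch it and record that instead
lone <- tryCatch(grep("\\", zee, perl = TRUE, value = TRUE),
                 error = function(e) "error")

# four backslashes in the source give a two-backslash string, which the
# regex engine reads as one literal backslash
doubled <- grep("\\\\", zee, perl = TRUE, value = TRUE)

# with fixed=TRUE the pattern is not treated as a regex at all,
# so a single backslash character is enough
literal <- grep("\\", zee, fixed = TRUE, value = TRUE)
```

fixed=TRUE is an existing way around the double escaping when no regex features are needed, but it obviously does not help with patterns like "\\d+".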

conclusion? i'd opt for mygrep in my own code; i guessed this was what john wanted, therefore the post.

vQ



