Re: [R] Regular expressions: bug or misunderstanding?

From: Ted Harding <>
Date: Sun, 06 Jul 2008 22:37:13 +0100 (BST)

On 06-Jul-08 21:17:04, Duncan Murdoch wrote:
> I'm trying to write a gsub() call that takes a string and escapes all
> the unescaped quote marks in it. So the string
> \"
> would be left unchanged, but
> \\"
> would be changed to
> \\\"
> because the double backslash doesn't act as an escape for the quote,
> the first just escapes the second. I have the usual problems of
> writing regular expressions involving backslashes which make
> everything I write completely unreadable, so I'm going to change
> the problem for this post: I will define E to be the escape
> character, and q to be the quote; the gsub() call would leave
> Eq
> unchanged, but would change
> EEq
> to EEEq, etc.
> The expression I have come up with after this change is
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)
> i.e. "(start of line, or non-escape, followed by an even number of
> escapes), all of which we call expression 1, followed by a quote,
> is replaced by expression 1 followed by an escape and a quote".
> This works sometimes, but not always:
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
> [1] "Eq"
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
> [1] "EEEq"
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
> [1] "EqaEq"
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
> [1] "qEq"
> Notice that in the final example, the first quote doesn't get escaped.
> Why not????

I think (without having done the "experimental diagnostics") that it's because in "qq" the first q matches (^|[^E]): it satisfies [^E] (i.e. it is a "non-escape"). Since it is followed by q, the whole of "qq" is consumed as a single match with the first q as expression 1, so it is the second q which gets the escape, giving "qEq". Possibly you need to include "^q" as an additional alternative match at the start of the line.
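
A minimal sketch of one possible way around this, assuming Perl-compatible regular expressions are acceptable. perl = TRUE is a standard gsub() argument, but the lookbehind pattern and the helper name escape_quotes are illustrative assumptions, not something proposed in the thread. The negative lookbehind (?<!E) checks that the run of escapes is not preceded by another E without consuming the preceding character, so in "qq" the first q is no longer swallowed as the "non-escape".

```r
# Hypothetical helper (not from the thread): escape every unescaped q,
# using E as the escape character as in Duncan's simplified notation.
escape_quotes <- function(x) {
  # (?<!E)      the run of escapes must not be preceded by another E
  # ((?:EE)*)   an even number of escapes (possibly zero), kept as \1
  # q           the quote to be escaped
  gsub("(?<!E)((?:EE)*)q", "\\1Eq", x, perl = TRUE)
}

# The four test strings from the original post
escape_quotes(c("Eq", "EEq", "qaq", "qq"))
```

With this pattern the four strings should come out as "Eq", "EEEq", "EqaEq" and "EqEq", i.e. both quotes in "qq" get escaped. Translating E and q back to backslash and double quote is left as in the original post.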


E-Mail: (Ted Harding) <> Fax-to-email: +44 (0)870 094 0861
Date: 06-Jul-08                                       Time: 22:37:10
------------------------------ XFMail ------------------------------

Received on Sun 06 Jul 2008 - 21:41:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
