Re: [R] regexpr help (match.length=0)

From: Joris Meys <jorismeys_at_gmail.com>
Date: Wed, 02 Jun 2010 02:28:16 +0200

Dear all,

It sounds as if regexpr works according to the same rules as Perl regular expressions, which are very nicely explained in:
http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf

Still, I can't help wondering whether there are also differences in behaviour. I haven't found any yet, but there must be some. Would anybody care to elaborate?
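One way to probe this from R itself is to rerun the thread's two examples with perl = TRUE, which makes regexpr use PCRE instead of its default engine, and compare start positions and match lengths. This is a minimal sketch: it checks only this one pattern, so agreement here does not rule out differences elsewhere.

```r
# Pattern and inputs taken from the thread; perl = TRUE switches regexpr
# from its default engine to PCRE (Perl-compatible regular expressions).
x <- c("abc123", "123abc")
default_m <- regexpr("[[:alpha:]]*", x)
perl_m    <- regexpr("[[:alpha:]]*", x, perl = TRUE)

# Start positions and match lengths under each engine, side by side
cbind(default_start = as.integer(default_m),
      default_len   = attr(default_m, "match.length"),
      perl_start    = as.integer(perl_m),
      perl_len      = attr(perl_m, "match.length"))
```

For this pattern both engines report a 3-character match at position 1 for "abc123" and a zero-length match at position 1 for "123abc".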

Cheers
Joris

On Wed, Jun 2, 2010 at 1:05 AM, Matt Shotwell <shotwelm_at_musc.edu> wrote:

> On Tue, 2010-06-01 at 16:43 -0400, Erik Iverson wrote:
> >
> > McGehee, Robert wrote:
> > > R-help,
> > > Sorry if this is more of a regex question than an R question. However,
> > > help would be appreciated on my use of the regexpr function.
> > >
> > > In the first example below, I ask for all letters ([[:alpha:]]) in 'abc123';
> > > regexpr returns a 3-character match beginning at the first character.
> > >
> > >> regexpr("[[:alpha:]]*", "abc123")
> > > [1] 1
> > > attr(,"match.length")
> > > [1] 3
> > >
> > > However, when the text is flipped and I ask regexpr for a match of all
> > > letters in '123abc', it returns a zero-character match beginning
> > > at the first character. Can someone explain what a zero-length match
> > > means (i.e. why not return -1), and why the result isn't 4,
> > > match.length=3?
> >
> > It means it matches 0 characters, which is fine since you used *, which
> > means "match 0 or more occurrences of the preceding pattern". It sounds
> > like you want + instead of *. Also see gregexpr.
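A minimal sketch of the advice above, using the strings from the thread plus two made-up inputs ("123" and "abc123def") chosen only for illustration: it contrasts * with +, shows what a genuine failure to match looks like, and uses gregexpr to collect every run of letters.

```r
# `*` allows zero repetitions, so an empty match at position 1 succeeds;
# `+` needs at least one letter, so the search moves on to position 4.
star <- regexpr("[[:alpha:]]*", "123abc")
plus <- regexpr("[[:alpha:]]+", "123abc")

# A genuine failure to match is reported as -1 (with match.length -1)
none <- regexpr("[[:alpha:]]+", "123")

# gregexpr returns every non-overlapping run of letters, not just the first;
# it returns a list with one element per input string
all_runs <- gregexpr("[[:alpha:]]+", "abc123def")[[1]]

as.integer(star); attr(star, "match.length")          # start 1, length 0
as.integer(plus); attr(plus, "match.length")          # start 4, length 3
as.integer(none); attr(none, "match.length")          # -1 and -1
as.integer(all_runs); attr(all_runs, "match.length")  # starts 1, 7; lengths 3, 3
```

So a zero-length match is still a successful match; only -1 signals that nothing matched at all.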
>
> Also, regular expressions try to match as early as possible. Because
> [[:alpha:]]* can match zero characters, it already succeeds at position
> one. That's why the match is at position one of length zero, and not at
> position four of length three.
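The "earliest match wins" rule can be seen without any zero-length matches. In this sketch the input "ab123cdef" is made up for illustration: regexpr reports the first run of letters even though a longer run appears later, and base substring() is used to pull out the matched text.

```r
# The leftmost match wins: regexpr reports "ab" at position 1 even though
# a longer run of letters ("cdef") starts later in the string.
x <- "ab123cdef"
m <- regexpr("[[:alpha:]]+", x)
start <- as.integer(m)
len <- attr(m, "match.length")
start; len                             # 1 and 2

# Extract the matched text from the start position and length
matched <- substring(x, start, start + len - 1L)
matched                                # "ab"
```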
>
> Matt Shotwell
> Graduate Student
> Division of Biostatistics and Epidemiology
> Medical University of South Carolina
>
> > >
> > >> regexpr("[[:alpha:]]*", "123abc")
> > > [1] 1
> > > attr(,"match.length")
> > > [1] 0
> > >
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
Joris.Meys_at_Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Received on Wed 02 Jun 2010 - 00:49:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 02 Jun 2010 - 00:50:26 GMT.
