Re: [Rd] invert argument in grep

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 10 Nov 2006 - 11:28:47 GMT

On Fri, 10 Nov 2006, Duncan Murdoch wrote:

> On 11/9/2006 5:14 AM, Romain Francois wrote:
>> Hello,
>>
>> What about an `invert` argument in grep, to return elements that are
>> *not* matching a regular expression :
>>
>> R> grep("pink", colors(), invert = TRUE, value = TRUE)
>>
>> would essentially return the same as :
>>
>> R> colors() [ - grep("pink", colors()) ]

Note that grep("pat", x, value = TRUE) is not the same as x[grep("pat", x)], as the help page carefully points out. (I think it would be better if it were.)
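A minimal sketch of one way the two can differ, assuming a small made-up factor `f` (not from the original post): with value = TRUE the matched elements come back coerced to character, whereas indexing goes through the '[' method and keeps the class.

```r
# Hypothetical data, not from the thread: a factor instead of a character vector.
f <- factor(c("pink", "blue", "hotpink"))

# value = TRUE: the matching elements, coerced to a plain character vector.
via_value <- grep("pink", f, value = TRUE)

# Indexing with the positions: `[` is dispatched, so the result is still a factor.
via_index <- f[grep("pink", f)]

class(via_value)
class(via_index)

# The selected elements agree once the factor is converted to character.
identical(via_value, as.character(via_index))
```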

>>
>> I'm attaching the files that I modified (against today's tarball) for
>> that purpose.

(BTW, sending whole files makes it difficult to see the changes and even harder to merge them; please use diffs. From a quick look the changes were very incomplete, as the internal functions were changed and there were no changed C files.)

> I think a more generally useful change would be to be able to return a
> logical vector with TRUE for a match and FALSE for a non-match, so a
> simple !grep(...) does the inversion. (This is motivated by the recent
> R-help discussion of the fact that x[-selection] doesn't always invert
> the selection when it's a vector of indices.)

I don't think that is pertinent here, as the indices are always a vector of positive integers.

> A way to do that without expanding the argument list would be to allow
>
> value="logical"
>
> as well as value=TRUE and value=FALSE.
>
> This would make boolean operations easy, e.g.
>
> colors()[grep("dark", colors(), value="logical")
> & !grep("blue", colors(), value="logical")]
>
> to select the colors that contain "dark" but not "blue". (In this case
> the RE to select that subset is rather simple because "dark" always
> precedes "blue", but if that wasn't true, it would be a lot messier.)

That might be worthwhile, but it is relatively simple to change positive integer indices to logical ones and vice versa.
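A rough sketch of that conversion, using a short made-up vector `x` and a hypothetical helper `as_logical_index()` (neither is part of R or of the proposed patch):

```r
# Hypothetical data, not from the thread.
x <- c("darkblue", "darkred", "blue", "green")

# Hypothetical helper: positive integer indices -> logical vector of length n.
as_logical_index <- function(idx, n) seq_len(n) %in% idx

idx  <- grep("dark", x)                     # positive integer indices
lgl  <- as_logical_index(idx, length(x))    # indices -> logical
back <- which(lgl)                          # logical -> indices again

# Boolean combination: elements containing "dark" but not "blue".
has_dark <- as_logical_index(grep("dark", x), length(x))
has_blue <- as_logical_index(grep("blue", x), length(x))
dark_not_blue <- x[has_dark & !has_blue]

# With no match grep() returns integer(0); negating the logical form
# then selects every element.
none     <- grep("pink", x)
not_pink <- x[!as_logical_index(none, length(x))]
```

Here `which()` is the inverse step, and the logical vectors can be combined with `&`, `|` and `!` before indexing.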

My personal take is that having 'value=TRUE' was already a complication not worth having, and implementing it at C level was an efficiency tweak not worth the maintenance effort (and also means that '[' methods are not dispatched).

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Nov 10 23:19:51 2006

