[Rd] grep and PCRE fun

From: Jeffrey Horner <jeffrey.horner_at_gmail.com>
Date: Thu, 29 Sep 2011 16:00:42 -0500


I think I've found a bug in the C function do_grep located in src/main/grep.c. It seems to affect the latest revisions of both R-2-13-branch and trunk when R is compiled without optimizations and with its own version of pcre located in src/extra, at least on Ubuntu 10.04.

According to the pcre_exec API (I presume the later versions), the ovecsize argument must be a multiple of 3, and the ovector argument must point to a location that can hold at least ovecsize integers. All the pcre_exec calls made by do_grep, save one, honor this. That one call seems to overwrite areas of the stack it shouldn't. Here's the smallest example I found that tickles the bug:
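To make the sizing rule above concrete, here is a minimal sketch in C. Both functions are hypothetical helpers written for illustration; they are not part of R or of PCRE. The (ncapture + 1) * 3 formula reflects my reading of the PCRE 8.x documentation, where two thirds of the vector return offset pairs and the last third is workspace, so treat it as an assumption rather than a statement about what do_grep does.

```c
#include <stddef.h>

/* Hypothetical helper (not in R or PCRE): number of ints an ovector needs
   to report the whole match plus `ncapture` capturing subpatterns.
   Assumes the PCRE 8.x layout: 2 ints per offset pair, plus 1 int of
   workspace per pair, hence a multiple of 3. */
static int ovector_size(int ncapture)
{
    return (ncapture + 1) * 3;
}

/* Hypothetical check (not in R or PCRE): does a buffer holding `buflen`
   ints satisfy the contract described above for a given `ovecsize`?
   Returns 1 if the pair is safe to pass on, 0 otherwise. */
static int ovector_args_ok(int ovecsize, int buflen)
{
    if (ovecsize < 0)
        return 0;               /* negative sizes are never valid */
    if (ovecsize % 3 != 0)
        return 0;               /* must be a multiple of 3 */
    return buflen >= ovecsize;  /* buffer must hold ovecsize ints */
}
```

For example, a caller that only needs the overall match would declare int ovector[3] and pass 3 as ovecsize; passing 3 while pointing at a single int would fail the check above, and that kind of mismatch could plausibly clobber a neighbouring stack variable.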

> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE)
Error in grep("[^[:blank][:cntrl]]", "\\n", perl = TRUE) :   negative length vectors are not allowed

As described above, this error occurs on Ubuntu 10.04 when R is compiled without optimizations (I typically use CFLAGS="-ggdb" CXXFLAGS="-ggdb" FFLAGS="-ggdb" ./configure --enable-R-shlib). The pcre_exec call executed from do_grep overwrites the integer nmatches and sets it to -1. This makes do_grep try to allocate a results vector of length -1, which of course causes the error message above.

I'd be interested to know if this bug happens on other platforms.

Below is my simple fix for R-2-13-branch (a similar fix works for trunk as well).


$ svn diff main/grep.c
Index: main/grep.c

