Re: [R] parsing - input buffer overflow

From: Prof Brian Ripley
Date: Fri, 13 Jun 2008 09:52:06 +0100 (BST)

On Fri, 13 Jun 2008, Daniel Malter wrote:

> Hi,
> I am trying to parse a large amount of text using gregexpr(). Unfortunately,
> I get an "input buffer overflow" message when I attempt that with too large
> an amount of text. The error message occurs before the parsing. The problem
> is that I cannot assign the text to a variable (an object) if the text is
> too large.

R does have limits on the command-line length (1024 bytes in versions before R-devel, 4096 bytes in R-devel). What happens if you exceed that depends on the interface you are using (and you have not told us). Beyond that, the parser has a limit of MAXELTSIZE (8192 bytes) on string literals.
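These limits apply to text that is typed or sourced as R code, not to character strings as such. A minimal sketch, assuming the parser behaviour described above: a string built at run time can be far longer than MAXELTSIZE, because it never passes through the parser as a literal.

```r
# Build a long string at run time instead of typing it as a literal.
# The parser's per-literal limit (MAXELTSIZE) is not involved here.
long <- paste(rep("Saint Lucia, ", 2000), collapse = "")

# 13 characters per repeat * 2000 repeats = 26000 characters
nchar(long)
```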

I don't see any need for 'improvement' though: why are you entering very long strings as part of the R program? They are data, and e.g. readLines() and scan() have no limits on string length beyond those imposed by R's internals (2^31-1 bytes).
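A minimal sketch of treating the text as data. The temporary file below is a hypothetical stand-in for whatever file actually holds the text; in practice that file would already exist and only the readLines() step is needed. scan(path, what = "", sep = "\n") would work similarly.

```r
# Hypothetical data file: write the text out to stand in for real data.
path <- tempfile(fileext = ".txt")
writeLines(c("Saint Lucia, Saint Kitts and Nevis, Saint Helena,",
             "Clipperton Island, Tristan da Cunha"), path)

# Read the data instead of entering it as part of the program.
lines <- readLines(path)

# Join the lines into a single string if one string is wanted.
x <- paste(lines, collapse = " ")
print(x)

unlink(path)  # remove the temporary file
```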

> This problem has been mentioned before, which I found using the RSiteSearch.
> However, the post is from 2006, and I thought it might have improved by now.
> Is there any way to increase the limit or to get around this problem?
> x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island,
> Tristan da Cunha"

I presume that is not an example? It looks like a character vector which has been collapsed by paste(x, collapse = ", ") and would be better split into its components with strsplit() than parsed with gregexpr().
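A minimal sketch of the strsplit() route, assuming (as suggested above) that the string is a character vector collapsed with ", ". Note that this counts components containing "Saint", which equals the number of occurrences here only because no component contains it twice.

```r
# The string from the question, assumed to be a collapsed vector.
x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

# Split back into components; fixed = TRUE treats ", " literally.
parts <- strsplit(x, ", ", fixed = TRUE)[[1]]
length(parts)  # 5 components

# paste(..., collapse = ", ") reverses the split.
identical(paste(parts, collapse = ", "), x)  # TRUE

# Count the components that contain "Saint".
sum(grepl("Saint", parts, fixed = TRUE))  # 3
```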

> # What I want to achieve is to parse the text for the number of occurrences
> # of a certain character string within the text.
> # This is done using:
> n=100 #choose n large enough
> length(which(is.na(gregexpr("Saint",x)[[1]][1:n])==FALSE))
> But again, if the text is large, I cannot assign it to x. I'd be grateful
> for any suggestions.
> Cheers,
> Daniel
> -------------------------
> cuncta stricte discussurus
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.
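For the counting step in the quoted question, there is no need to choose n in advance: gregexpr() returns one position per match, or -1 when nothing matches. A minimal sketch; count_matches() is a hypothetical helper name, not part of base R.

```r
x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

# Hypothetical helper: count non-overlapping matches of a fixed pattern.
count_matches <- function(pattern, text) {
  m <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  # gregexpr() signals "no match" with a single -1.
  if (m[1] == -1L) 0L else length(m)
}

count_matches("Saint", x)     # 3
count_matches("Atlantis", x)  # 0
```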

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri 13 Jun 2008 - 11:04:45 GMT
