Re: [Rd] readchar() bug or feature? was Re: Clarification for readChar man page

From: Jeffrey Horner <jeff.horner_at_vanderbilt.edu>
Date: Fri, 15 Jun 2007 10:07:28 -0500

Jeffrey Horner wrote:

> Jeffrey Horner wrote:

>> Duncan Murdoch wrote:
>>> On 6/14/2007 10:49 AM, Jeffrey Horner wrote:
>>>> Hi,
>>>>
>>>> Here's a patch to the readChar manual page (R-trunk as of today)
>>>> that better clarifies readChar's return value.
>>> Your update is not right. For example:
>>>
>>> x <- as.raw(32:96)
>>> readChar(x, nchars=rep(2,100))
>>>
>>> This returns a character vector of length 100, of which the first 32
>>> elements have 2 chars, the next one has 1, and the rest are "".
>>>
>>> So the length of nchars really does affect the length of the value.
>>>
>>> Now, I haven't looked at the code, but it's possible we could delete
>>> the "(which might be less than \code{length(nchars)})" remark, and if
>>> not, it would be useful to explain the situations in which the return
>>> value could be shorter than the nchars vector.
>>
>> Well, this is rather a misunderstanding on my part; I completely
>> forgot about vectorization. The manual page makes sense to me now.
>>
>> But the situation about the return value possibly being less than
>> length(nchars) isn't clear. Consider a 101 byte text file in a
>> non-multibyte character locale:
>>
>> f <- tempfile()
>> writeChar(paste(rep(seq(0,9),10),collapse=''),con=f)
>>
>> and calling readChar() to read 100 bytes with length(nchar)=10:
>>
>> > readChar(f,nchar=rep(10,10))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>
>> and readChar() reading the entire file with length(nchar)=11:
>>
>> > readChar(f,nchar=rep(10,11))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0"
>>
>> but the following two outputs are confusing. readChar() with
>> length(nchar)>=12 returns a character vector of length 12:
>>
>> > readChar(f,nchar=rep(10,12))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0" ""
>> > readChar(f,nchar=rep(10,13))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0" ""
>>
>> It seems that the first time EOF is encountered on a read operation,
>> an empty string is returned, but on subsequent reads nothing is
>> returned. Is this intended behavior?
> 
> I believe this is an off-by-one bug in do_readchar(). The following fix to
> R-trunk (revision 41946) makes the readChar() calls above cap the length
> of the returned vector at 11:
> 
> Index: src/main/connections.c
> ===================================================================
> --- src/main/connections.c      (revision 41946)
> +++ src/main/connections.c      (working copy)
> @@ -3286,7 +3286,7 @@
>             if(!con->open(con)) error(_("cannot open the connection"));
>      }
>      PROTECT(ans = allocVector(STRSXP, n));
> -    for(i = 0, m = i+1; i < n; i++) {
> +    for(i = 0, m = 0; i < n; i++) {
>         len = INTEGER(nchars)[i];
>         if(len == NA_INTEGER || len < 0)
>             error(_("invalid value for '%s'"), "nchar");
> 


This does look like an off-by-one bug: the for loops in do_readbin are coded just like the patched loop above, with the counter starting at 0.
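
To make the arithmetic concrete, here is a rough, simplified model of the loop. It is not the actual code in connections.c. It assumes (the quoted hunk does not show this part) that m is incremented once per successful read, that the loop stops at the first read that hits EOF, and that the result is shortened to m elements whenever m < n. The function name model_result_length and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the read loop, NOT the real do_readchar().
 *
 * avail:  bytes remaining in an imaginary connection
 * nchars: requested length of each string
 * n:      number of requests, i.e. length(nchars)
 * m_init: starting value of the counter m
 *         (1 models the original "m = i+1", 0 models the patch)
 *
 * Returns the length the result vector would end up with, under the
 * assumption that it is truncated to m only when m < n.
 */
size_t model_result_length(size_t avail, const int *nchars,
                           size_t n, size_t m_init)
{
    size_t i, m, want, got;

    for (i = 0, m = m_init; i < n; i++) {
        assert(nchars[i] >= 0);     /* stands in for the "invalid value" check */
        if (avail == 0)
            break;                  /* first read at EOF ends the loop */
        want = (size_t) nchars[i];
        got = want < avail ? want : avail;  /* a short read still counts */
        avail -= got;
        m++;                        /* one more string actually read */
    }
    return m < n ? m : n;           /* shorten only if fewer than n were read */
}
```

With the 101-byte file and requests of 10 characters each, this model gives lengths 10, 11, 12 and 12 for n = 10, 11, 12 and 13 when m starts at 1, which matches the output quoted above. When m starts at 0, n = 12 and n = 13 both give 11.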

Jeff

-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 15 Jun 2007 - 15:39:27 GMT
