Re: [Rd] random output with sub(fixed = TRUE)

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 21 Dec 2005 - 22:29:54 GMT

On Wed, 21 Dec 2005, Roger D. Peng wrote:

> Well, who am I to break this long-standing ritual? :)
>
> Interestingly, while the printed output looks wrong, I get
>
> > v <- paste(0:10, "asdf", sep = ".")
> > a <- sub(".asdf", "", v, fixed = TRUE)
> > b <- as.character(0:10)
> > identical(a, b)
> [1] TRUE
> >

identical is wrong! R character strings have a true length and a C-style length: print() prints all the characters, even those after embedded nuls. identical uses

    if(strcmp(CHAR(STRING_ELT(x, i)),
              CHAR(STRING_ELT(y, i))) != 0)

which is C-style.

The issue is at character.c:1015, where nr gets trashed: note that the first answer in the vector is correct. So it is easy to fix.

This code has been as it currently is for years, so I don't think this is at all related to the release of 2.2.1.

> Peter Dalgaard wrote:
>> "Roger D. Peng" <rpeng@jhsph.edu> writes:
>>
>>
>>> I've noticed what I think is curious behavior in using 'sub(fixed = TRUE)' and
>>> was wondering if my expectation is incorrect. Here is one example:
>>>
>>> v <- paste(0:10, "asdf", sep = ".")
>>> sub(".asdf", "", v, fixed = TRUE)
>>>
>>> The results I get are
>>>
>>>> sub(".asdf", "", v, fixed = TRUE)
>>> [1] "0" "1\0st\0\0" "2\0<af>\001\0\0" "3\0<af>\001\0\0"
>>> [5] "4\0mes\0" "5\0<ba>\001\0\0" "6\0\0\0\0\0" "7\0\0\0m\0"
>>> [9] "8\0\0\0t\0" "9\0<fe>\0\0\0" "10\0\0\0\0\0"
>>>>
>>>
>>> I expected "0" in the first entry and everything else would be unchanged. Your
>>> results may vary since every time I run 'sub()' in this way, I get a slightly
>>> different answer in entries 2 through 11.
>>>
>>> As it turns out, 'gsub(fixed = TRUE)' gives me the answer I *actually* wanted,
>>> which was to replace the string in every entry. But I still think the behavior
>>> of 'sub(fixed = TRUE)' is a bit odd.
>>>
>>>> version
>>> _
>>> platform x86_64-unknown-linux-gnu
>>> arch x86_64
>>> os linux-gnu
>>> system x86_64, linux-gnu
>>> status
>>> major 2
>>> minor 2.1
>>> year 2005
>>> month 12
>>> day 20
>>> svn rev 36812
>>> language R
>>>>
>>
>>
>> Argh...
>>
>> year 2005
>> month 12
>> day 21
>>
>> and something like this gets discovered. It's a ritual, I tell ya, a ritual!
>>
>> If you look at the output and terminate all strings at the embedded
>> \0, it looks much more sensible, so it should be fairly easy to spot
>> the cause of this bug...
>>
>
> --
> Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu Dec 22 09:36:30 2005
