[Rd] Correct usage of nchar(): precautionary change for R 2.6.0

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 29 May 2007 10:39:11 +0100 (BST)


Remember that nchar() returns by default the number of *bytes* and not the number of characters. I've recently spotted many cases in which nchar() has been used with substr(), which works in characters; in a multibyte locale this can lead to incorrect results. (This seems to be the commonest use of nchar() in packages.)
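As an illustration of the mismatch (a minimal sketch, not taken from any particular package): the string below is an assumed example with one two-byte UTF-8 character, and 'type' is passed explicitly so the result does not depend on which default is in force.

```r
# Assumed example string: "naive" with a diaeresis, written with a \u escape
# so that it is stored as UTF-8 (5 characters, 6 bytes).
x <- "na\u00efve"

n_bytes <- nchar(x, type = "bytes")   # counts bytes
n_chars <- nchar(x, type = "chars")   # counts characters

# substr() indexes by character, so a byte count overshoots the end
# of the string and the "last character" comes back empty.
last_from_bytes <- substr(x, n_bytes, n_bytes)
last_from_chars <- substr(x, n_chars, n_chars)
```

Here last_from_bytes is the empty string while last_from_chars is "e", which is the kind of silent error described above.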

There were two reasons why nchar() was left defaulting to bytes when we allowed MBCSs in R:

  1. Many of the uses are of the form if(nchar(x)) or if(nchar(x) == 0) or even if(nchar(x) != 0). Computing the length of a string is an inefficient way to find out if it is non-empty, especially if it has to be converted to wchars to do so.
  2. Once you allow multibyte characters, not all character strings are valid and for those nchar(x, "c") is NA. Not much code has been written to take into account the possibility that nchar() might return an NA.
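Both points can be sketched in a few lines. This assumes an R version that provides nzchar() and the 'allowNA' argument of nchar(), neither of which is mentioned above; as far as I am aware, current versions of R signal an error for an invalid string unless allowNA = TRUE is given. The invalid string is constructed artificially for the example.

```r
# Point 1: test for emptiness without counting characters.
# nzchar() is assumed to be available in the R version in use.
inputs <- c("", "abc")
non_empty <- nzchar(inputs)

# Point 2: a string that is not valid in its declared encoding.
# A lone 0xFF byte is never valid UTF-8; marking it as UTF-8 makes the
# example independent of the session locale.
bad <- "\xff"
Encoding(bad) <- "UTF-8"

# With allowNA = TRUE the character count is NA rather than an error,
# so code that uses the result has to check for NA first.
n <- nchar(bad, type = "chars", allowNA = TRUE)
status <- if (is.na(n)) "invalid string" else "ok"
```

The byte count of such a string is still well defined, so nchar(bad, type = "bytes") can be used if only storage size matters.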

Despite these reasons, it seems that the dangers of incorrect use outweigh them. So for 2.6.0 the default is being changed to type="chars".

It seems that nchar() is used quite often to lay out 'printed' or graphical output. For that, normally nchar(type="width") is what is needed.
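A minimal sketch of layout by display width follows. pad_right() is a hypothetical helper written for this example, not an existing R function, and the second label is an assumed example of two CJK characters, which normally occupy more than one column each.

```r
# Hypothetical helper: pad each string with spaces on the right until it
# occupies 'total' columns, measuring by display width rather than by
# bytes or characters.
pad_right <- function(x, total) {
  w <- nchar(x, type = "width")
  pad <- vapply(pmax(total - w, 0L),
                function(k) paste(rep(" ", k), collapse = ""),
                character(1))
  paste0(x, pad)
}

# One ASCII label and one label of two CJK characters (given as \u escapes).
labels <- c("abc", "\u4e2d\u6587")
padded <- pad_right(labels, 8L)
```

Padding by nchar(type="chars") instead would give both labels the same number of characters but, on a terminal that draws CJK characters double-width, columns that do not line up.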

None of this is an issue in single-byte locales or for ASCII text in UTF-8 or the Windows' CJK locales, but please bear in mind that you cannot assume such for a public package. (The assumption that ASCII code is represented in single bytes is pretty widespread, but at some point we may want to support Windows' native UCS-2 encoding for which it is not true.)

The best advice is to use the 'type' argument for all uses of nchar() in public code unless perhaps you are sure only ASCII data will ever be encountered.

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Tue 29 May 2007 - 09:40:50 GMT