Re: [R] Unicode characters (R 2.7.0 on Windows XP SP3 and Hardy Heron)

From: Prof Brian Ripley <>
Date: Fri, 30 May 2008 23:11:51 +0100 (BST)

On Fri, 30 May 2008, Duncan Murdoch wrote:

> On 5/30/2008 4:12 PM, Hans-Joerg Bibiko wrote:
>> Quoting Duncan Murdoch <>:
>>> On 5/30/2008 12:58 PM, Hans-Jörg Bibiko wrote:
>>>> to put it simply. Windows cannot handle utf-8 data. There is no utf-8
>>>> locale available.
>>> Code page 65001 is utf-8. Most text editors (including Notepad)
>>> include an option to save in the UTF-8 encoding.
>>> Some programs don't fully support utf-8 (some don't even support the
>>> native UCS-2), but most don't care. That's the nice thing about utf-8.
>>> So in what sense can Windows not handle utf-8 data?

>> Of course, you're right. I only meant in that context R for Windows, not
>> Windows at all. Sorry for my incorrectness.
> But I think with Brian Ripley's work over the last while, R for Windows
> actually handles utf-8 pretty well. (It might not guess at that encoding,
> but if you tell it that's what you're using...)

UTF-8, please (only the capitalized form is correct).

R passes around, prints and plots UTF-8 character data pretty well, but it translates to the native encoding for almost all character-level manipulations (and not just on Windows). ?Encoding spells out the exceptions (and I think the original poster had not read it). As time goes on we may add more exceptions, but it is really tedious (and somewhat error-prone) to have multiple paths through the code for different encodings. Different OSes also handle these differently: Windows' use of UTF-16 means that one character may not be one wchar_t.
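
The wchar_t remark can be illustrated with a small sketch. It is written in Python rather than R purely for illustration, and the helper names utf8_bytes and utf16_units are hypothetical. It assumes the Windows convention that wchar_t is 16 bits wide, so one wchar_t holds one UTF-16 code unit. Characters outside the Basic Multilingual Plane need a surrogate pair, that is, two code units.

```python
def utf8_bytes(ch: str) -> int:
    """Number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16.

    "utf-16-le" is used so that no byte-order mark is prepended.
    """
    return len(ch.encode("utf-16-le")) // 2


# Illustrative sample characters (chosen for this sketch, not from the thread).
samples = [
    ("LATIN CAPITAL LETTER A", "A"),
    ("LATIN SMALL LETTER O WITH DIAERESIS", "\u00f6"),
    ("EURO SIGN", "\u20ac"),
    ("MUSICAL SYMBOL G CLEF", "\U0001D11E"),  # outside the BMP
]

for name, ch in samples:
    print(f"U+{ord(ch):04X} {name}: "
          f"{utf8_bytes(ch)} UTF-8 byte(s), "
          f"{utf16_units(ch)} UTF-16 code unit(s)")
```

The first three characters each fit in one UTF-16 code unit, while the G clef (U+1D11E) needs two, so code that counts 16-bit wchar_t values would report two "characters" for it.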

A couple of the other points in the original posting were corrected in R-patched just after release.

Brian D. Ripley,
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Mon 02 Jun 2008 - 04:09:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.