Re: [Rd] use of UTF-8 \uxxxx escape sequences in function arguments

From: Thomas Zumbrunn <thomas_at_zumbrunn.name>
Date: Fri, 20 Jan 2012 00:39:18 +0100

On Thursday 19 January 2012, peter dalgaard wrote:
> On Jan 18, 2012, at 23:54 , Thomas Zumbrunn wrote:
> > plain("Zürich") ## works
> > plain("Z\u00BCrich") ## fails
> > escaped("Zürich") ## fails
> > escaped("Z\u00BCrich") ## works
>
> Using the correct UTF-8 code helps quite a bit:
>
> U+00BC ¼ c2 bc VULGAR FRACTION ONE QUARTER
> U+00FC ü c3 bc LATIN SMALL LETTER U WITH DIAERESIS

Thank you for pointing that out. How embarrassing: I systematically used the wrong representations. Even worse, I didn't carefully read "Writing R Extensions", which speaks of "Unicode as \uxxxx escapes" rather than "UTF-8 as \uxxxx escapes". Looking up the Unicode code points (for characters such as ü these equal the UTF-16 code units) would have done the trick.
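The difference peter dalgaard points out can be checked from within R itself. The following is a minimal sketch, assuming a UTF-8 session. `utf8ToInt()` returns the Unicode code point that a `\uxxxx` escape expects, while `charToRaw()` shows the UTF-8 bytes (c3 bc) that led to the wrong escape `\u00BC`. The variable names are illustrative only.

```r
# Code point vs. UTF-8 bytes for "ü" (LATIN SMALL LETTER U WITH DIAERESIS).
u_umlaut <- "\u00FC"                     # the escape takes the code point U+00FC

cp <- utf8ToInt(u_umlaut)                # code point as an integer: 252
sprintf("U+%04X", cp)                    # "U+00FC"

bytes <- charToRaw(enc2utf8(u_umlaut))   # UTF-8 encoding of the same character
print(bytes)                             # [1] c3 bc

# The mistake from the thread: using the UTF-8 byte 0xBC as if it were the
# code point gives U+00BC, VULGAR FRACTION ONE QUARTER, not "ü".
wrong <- "\u00BC"
utf8ToInt(wrong)                         # 188
identical(wrong, u_umlaut)               # FALSE

# The correct escape reproduces the intended string.
identical("Z\u00FCrich", paste0("Z", intToUtf8(0xFC), "rich"))   # TRUE
```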

I didn't find a recommended method of replacing non-ASCII characters with Unicode \uxxxx escape sequences and ended up using the Unix command-line tool "iconv". However, the iconv installed on my GNU/Linux machine (openSUSE 11.4) doesn't support the very useful "--unicode-subst" option, which is an extension of the iconv program from GNU libiconv. I installed "libiconv" from http://www.gnu.org/software/libiconv/, and now I can easily replace all non-ASCII characters in my UTF-8 encoded R files with:

  iconv -f UTF-8 -t ASCII --unicode-subst="\u%04X" my-utf-8-encoded-file.R
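For machines without GNU libiconv, a similar substitution can be approximated in base R. The sketch below is an illustration under stated assumptions, not an official or recommended method: escape_non_ascii() and escape_file() are hypothetical helper names, and the input file is assumed to be UTF-8 encoded. Unlike the "\u%04X" format above, it writes characters beyond U+FFFF with R's \U escape, because R's \u escape accepts at most four hex digits.

```r
# Hypothetical helper (not part of base R): replace every non-ASCII character
# in a UTF-8 string with an R \uxxxx escape (or \Uxxxxxxxx beyond U+FFFF).
escape_non_ascii <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cps <- utf8ToInt(enc2utf8(s))                    # code points of the string
    chars <- ifelse(
      cps < 128L,
      intToUtf8(cps, multiple = TRUE),               # keep ASCII characters as-is
      ifelse(cps <= 0xFFFF,
             sprintf("\\u%04X", cps),                # BMP: \u plus 4 hex digits
             sprintf("\\U%08X", cps))                # astral: R's \U escape
    )
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: rewrite a UTF-8 encoded R source file in place.
escape_file <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  writeLines(escape_non_ascii(lines), path)
}
```

Like the iconv command, this escapes every non-ASCII character in the file. R only interprets these escapes inside string literals, so escaped characters in comments are harmless, but escaped characters in identifiers will no longer parse as intended. tools::showNonASCIIfile() can be used afterwards to list any lines that still contain non-ASCII characters.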

Thomas Zumbrunn



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 19 Jan 2012 - 23:41:19 GMT
