Re: [R] read.spss and umlaut

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Thu 03 Aug 2006 - 23:34:04 EST

On Thu, 3 Aug 2006, Thomas Kuster wrote:

> Hello
>
> On Wednesday, 2 August 2006 at 17:11, Thomas Lumley wrote:
>> This sounds like a conflict between encodings -- eg if R is assuming UTF-8
>> and the file is encoded in Latin-1 then the sequence
>> U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
>> U+0072 : LATIN SMALL LETTER R
>> is coded as FC72 in the file, which is an illegal byte sequence in UTF-8.
>
> Hex: 74 65 20 66 fc 72 20 61 6c 6c 65 53 45 2f 31 36
> Text: t e   f ü r   a l l e S E / 1 6

OK, so that looks like Latin-1 encoding in the file.
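
As a rough check of that reading, here is a minimal sketch that rebuilds the string from the bytes in the hex dump quoted above. It assumes a reasonably recent R (validUTF8() is not available in old versions); the variable names are only for illustration.

```r
# Bytes copied from the hex dump quoted above.
bytes <- as.raw(c(0x74, 0x65, 0x20, 0x66, 0xfc, 0x72, 0x20, 0x61,
                  0x6c, 0x6c, 0x65, 0x53, 0x45, 0x2f, 0x31, 0x36))
s <- rawToChar(bytes)

# 0xfc followed by 0x72 is not a legal UTF-8 sequence, so this is FALSE.
is_utf8 <- validUTF8(s)

# Interpreting the same bytes as Latin-1 and converting to UTF-8
# turns the single byte fc into the two bytes c3 bc (u-umlaut).
fixed <- iconv(s, from = "latin1", to = "UTF-8")
fixed_bytes <- charToRaw(fixed)

print(is_utf8)
print(fixed_bytes)
```

If the conversion gives a readable "für", the bytes in the file are almost certainly Latin-1.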

>> The underlying C code (being written in the US quite a long time ago)
>> doesn't know about encodings, and I don't know what the rules are in SPSS
>> for valid characters. I suspect that in these old portable file formats it
>> probably just reads and writes bytes, leaving it up to the OS to interpret
>> them.
>
> But why does the C code stop reading? Is "/" not the end marker of the
> string? What is the problem if I change that in the source?

You haven't shown anything that indicates that the C code stopped reading. More likely R just stops displaying the string when it reaches an illegal byte sequence. You could use nchar() with type = "bytes" to count the bytes in the string and find out.
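
A small sketch of that check, using the bytes from the hex dump as a stand-in for the value that read.spss() returned (the real string would come from the imported data). The allowNA argument is an assumption about the R version in use; older versions do not have it.

```r
# Stand-in for the string read from the file: the 16 bytes from the hex dump.
s <- rawToChar(as.raw(c(0x74, 0x65, 0x20, 0x66, 0xfc, 0x72, 0x20, 0x61,
                        0x6c, 0x6c, 0x65, 0x53, 0x45, 0x2f, 0x31, 0x36)))

# Counting bytes works even when the string is not valid in the
# current encoding; here all 16 bytes are present.
n_bytes <- nchar(s, type = "bytes")

# Counting characters depends on the locale. With allowNA = TRUE an
# invalid multibyte string gives NA instead of an error.
n_chars <- nchar(s, type = "chars", allowNA = TRUE)

print(n_bytes)
print(n_chars)
```

If the byte count matches the full length of the label, the data were read completely and only the display is truncated.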

>> You could try running R in a non-UTF-8 locale to see if it helps.
>
> I think my locale is non-UTF-8 (de_CH, isolatin). How can I check that, and
> set another one temporarily?

You can use charToRaw() to see what R thinks the byte sequence is for a word with a u-umlaut.

Sys.setlocale() will let you change the locale, but your locale does look non-UTF-8.
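
Something along these lines would show the current locale and the bytes R uses for u-umlaut. It is only a sketch: the locale name "de_CH.ISO-8859-1" is an assumption and differs between systems, and enc2native(), enc2utf8() and l10n_info() may not exist in older R versions.

```r
# Current character-type locale and whether R considers it UTF-8.
old_ctype <- Sys.getlocale("LC_CTYPE")
is_utf8_locale <- l10n_info()[["UTF-8"]]

# Bytes for u-umlaut in the native encoding (locale dependent;
# a single byte fc would be expected in a Latin-1 locale) ...
native_bytes <- charToRaw(enc2native("\u00fc"))
# ... and in UTF-8, which is always c3 bc.
utf8_bytes <- charToRaw(enc2utf8("\u00fc"))

# Try a Latin-1 locale temporarily. The name below is hypothetical for
# your system; Sys.setlocale() returns "" if it is not available.
res <- suppressWarnings(Sys.setlocale("LC_CTYPE", "de_CH.ISO-8859-1"))
if (identical(res, "")) message("requested locale is not available")

# Restore the original setting.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))

print(old_ctype)
print(native_bytes)
print(utf8_bytes)
```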

This is all guesswork since we can't see the file.

         -thomas



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu Aug 03 23:41:22 2006

