Re: [R] read.spss and umlaut

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Thu 03 Aug 2006 - 01:11:13 EST

On Wed, 2 Aug 2006, Thomas Kuster wrote:

> Hello
>
> When I read an SPSS *.por file with read.spss, everything after an umlaut is
> missing:

This sounds like a conflict between encodings: e.g. if R is assuming UTF-8 and the file is encoded in Latin-1, then the sequence U+00FC (LATIN SMALL LETTER U WITH DIAERESIS) followed by U+0072 (LATIN SMALL LETTER R) is coded as the bytes FC 72 in the file. That is an illegal byte sequence in UTF-8, where the byte FC can never occur.
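To make the byte-level argument concrete, here is a minimal sketch in Python (used only because the point is about raw bytes, not about R). It assumes the label is stored in Latin-1, as the quoted grep output suggests. The function `decode_until_error` is a hypothetical model of a reader that stops at the first illegal byte. It reproduces the truncated label, but it is not taken from the foreign package's C code.

```python
# Bytes as they would appear in a Latin-1 (ISO-8859-1) encoded file:
# "ü" is the single byte 0xFC, followed by "r" (0x72).
raw = b"EV Postdienste f\xfcr alle"


def decode_strict_utf8(data: bytes) -> str:
    """Decode as UTF-8; raises UnicodeDecodeError on an illegal sequence."""
    return data.decode("utf-8")


def decode_until_error(data: bytes) -> str:
    """Hypothetical model: keep only the text before the first invalid byte.

    This mimics the truncated value labels; it is NOT the actual
    read.spss implementation.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte UTF-8 rejects (0xFC here)
        return data[:err.start].decode("utf-8")


def decode_latin1(data: bytes) -> str:
    """Interpret every byte as one Latin-1 character (this never fails)."""
    return data.decode("latin-1")


if __name__ == "__main__":
    try:
        decode_strict_utf8(raw)
    except UnicodeDecodeError as err:
        print(err)
    print(decode_until_error(raw))
    print(decode_latin1(raw))
```

Decoding the same bytes as Latin-1 recovers "für alle", which is why reading the file under a matching encoding should help.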

The underlying C code (written in the US quite a long time ago) doesn't know about encodings, and I don't know what the rules are in SPSS for valid characters. I suspect that these old portable file formats just read and write bytes, leaving it up to the OS to interpret them.

You could try running R in a non-UTF-8 locale to see if it helps.

If anyone has definitive information about how SPSS represents strings and decides which characters are valid, that would be useful too.

         -thomas

>> library("foreign")
>> spssdaten <- read.spss("projets.por")
>> attr(spssdaten$PROJETX, "value.labels")[1:20]
> Bg Stammzellenforschung Bb
> 863 862
> Bb Neugestaltung des Finanzausgleichs
> 861 854
> EV Postdienste f Bb
> 853 852
> Bb Bg Steuerpaket
> 851 843
> Bb Anhebung der Mehrwertsteuer s 11. AHV-Revision
> 842 841
> Volkinitiative Lebenslange Verwahrung
> 833 832
> Gegenentwurf zur Avanti EV Lehrstellen-Initiative
> 831 824
> EV Moratorium Plus EV Strom ohne Atom
> 823 822
> EV Ja zu fairen Mieten EV Gleiche Rechte f
> 821 815
> EV Gesundheitsinitiative EV Sonntags-Initiative
> 814 813
>
> The SPSS-File is okay:
>> system("cat projets.por |grep Postdienste")
> echtserwerb 3. GenerationSD/N/EV Postdienste für alleSE/16/Änderrung Bg EOG
> Mut
>
> How can I read the SPSS-File with the Umlaut?
>
> Bye
> Thomas Kuster
>
> R: 2.1.0 (2005-04-18)
> OS: Debian Linux, 2.6.10-isgee-neptun-1
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle

Received on Thu Aug 03 01:24:53 2006
