[R] WG: AW: Another problem with encoding

From: Matthias Wendel <office_at_matthiaswendel.de>
Date: Wed, 2 Jan 2008 16:02:16 +0100


Hello, Peter,

        I tried it out: iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1", sub='byte') yielded

[1] "<c4>rzte Chirurgie"

and c4 corresponds in most encodings to Ä. What can I do next? I wonder whether there is a more comfortable way than replacing every occurrence of <..> with the appropriate character.

Regards,
Matthias
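
A possible next step, sketched under the assumption that the label bytes really are Latin-1 (which the <c4> byte and the German_Germany.1252 locale both suggest): convert from the actual source encoding to UTF-8 instead of declaring the input to be UTF-8. The vector y6 and its two labels below are hypothetical stand-ins for the real spss[,'Y6'] column. Windows-1252 differs from Latin-1 only in the byte range 0x80 to 0x9F, so the choice does not matter for Ä.

```r
# Hypothetical stand-in for one read.spss() column: a numeric vector whose
# "value.labels" names are stored as Latin-1 bytes.
arzte_latin1 <- rawToChar(as.raw(c(0xc4, 0x72, 0x7a, 0x74, 0x65)))  # "Ärzte"
y6 <- c(3, 10)
attr(y6, "value.labels") <- setNames(c(3, 10), c(arzte_latin1, "Pflege"))

# Declare the assumed source encoding and convert all label names to UTF-8.
labs <- attr(y6, "value.labels")
names(labs) <- iconv(names(labs), from = "latin1", to = "UTF-8")
attr(y6, "value.labels") <- labs

# The first label is now valid UTF-8, so utf8ToInt() no longer fails.
first_label <- names(attr(y6, "value.labels"))[1]
code_points <- utf8ToInt(first_label)
print(code_points[1])  # 196, i.e. U+00C4 (Ä)
```

Converting the whole names vector in one call avoids patching the <..> markers by hand.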

-----Original Message-----
From: Peter Dalgaard [mailto:p.dalgaard_at_biostat.ku.dk]
Sent: Tuesday, 1 January 2008 20:21
To: Matthias Wendel
Subject: Re: AW: [R] Another problem with encoding

Matthias Wendel wrote:
> Happy new year and my apologies, Peter. Here are the missing facts:
> I'm reading in an SPSS file, doing some calculations and putting the
> results in an XML file. The XML file is UTF-8 encoded, and so should be
> the results and their labels (e.g. Ärzte Chirurgie).
> Here is part of the R session:
>
>

As a matter of principle: Requests for more information are not offers that I will solve your problems personally. Stay on the list!

The characters seem to travel OK in email, so latin1 is a guess. Have you tried the sub="byte" argument to iconv()?
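
As a rough illustration of what sub="byte" reveals, the sketch below builds a hypothetical stand-in string, "Ärzte" stored as Latin-1 bytes, and runs it through iconv() while (wrongly) declaring it to be UTF-8. The variable names are made up for the example; only iconv() and its sub argument come from the suggestion above.

```r
# Hypothetical stand-in for the label: "Ärzte" stored as Latin-1 bytes.
x <- rawToChar(as.raw(c(0xc4, 0x72, 0x7a, 0x74, 0x65)))

# Without sub, the whole string comes back as NA, because 0xc4 followed
# by "r" is not a valid UTF-8 sequence.
plain <- iconv(x, from = "UTF-8", to = "latin1")

# With sub = "byte", the byte that could not be converted is shown in hex.
marked <- iconv(x, from = "UTF-8", to = "latin1", sub = "byte")

print(plain)
print(marked)
```

A <xx> marker in the result means the input was not valid in the declared "from" encoding, and the hex value is a hint at what the real encoding might be.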

>
>> Sys.getlocale()
> [1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
>
>> spss[,'Y6']
> [1] 6 3 8 11 8 9 6 8 3 5 10 15 NA 9 8 3 8 16 6 6 NA 10 5 2 7 7 6 16 7 15 7 10 12
> [34] 8 7 12 12 16 7 6 8 8 15 6 NA 8 99 7 12 8 9 16 7 16 8 7 7 1 15 12 8 7 10 7 8 7
> [67] 8 9 8 6 6 8 6 16 11 5 11 11 1 11 3 7 7 10 10 10 6 11 16 NA 1 3 2 10 99 10 3 3 9
> [100] 7 16 99 16 1 10 2 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 NA 10 16 16 NA 6 10 5 11
> [133] 11 1 1 1 1 16 1 16 1 1 1 1 6 6 6 16 8 16 16 16 16 5 6 10 99 11 11 10 6 6 1 1 6
> [166] 1 11 11 16 9 11 16 6 8 8 16 16 8 6 16 16 12 12 12 12 12 12 12 16 9 16 15 12 12 15 10 16 15
> [199] 4 1 2 14 4 4 2 5 NA 1 5 5 7 9 5 12 12 NA 16 12 12 12 12 12 12 12 12 12 99 NA 12 12 NA
> [232] 1 16 1 7 11 5 6 7 1 13 6 8 16 2 1 5 16 16 9 8 8 8 7 16 8 8 2 8 5 4 6 14 5
> [265] 14 8 8 14 4 4 8 14 8 14 6 2 3 14 3 16 5 15 15 15 15 15 15 15 15 15 15 15 13 13 13 13 13
> [298] 13 13 13 13 13 13 13 13 15 6 NA 12 3 9 9 NA 10 16
> attr(,"value.labels")
> Verwaltung Servicegesellschaft Waldfriede (SKW)
> 16 15
> Kurzzeitpflege Waldfriede Sozialstation
> 14 13
> Krankenpflegeschule Med. Technischer Dienst
> 12 11
> Pflege OP Funktionsdienst
> 10 9
> Pflege Gynäkologie Pflege Chirurgie
> 8 7
> Pflege Innere Ärzte Anästhesie, Röntgen
> 6 5
> Ärzte Gynäkologie Ärzte Chirurgie
> 4 3
> Ärzte Innere Patientenberatung/-betreuung
> 2 1
>
>> names(attributes(spss[,'Y6'])[[1]][14])
> [1] "Ärzte Chirurgie"
>
>> iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1")
> [1] NA
>
>> utf8ToInt(names(attributes(spss[,'Y6'])[[1]][14]))
> Error in utf8ToInt(names(attributes(spss[, "Y6"])[[1]][14])) :
> invalid UTF-8 string
>
>
> Cheers,
> Matthias
>
>
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard_at_biostat.ku.dk]
> Sent: Monday, 31 December 2007 10:45
> To: Matthias Wendel
> Cc: r-help_at_stat.math.ethz.ch
> Subject: Re: [R] Another problem with encoding
>
> Matthias Wendel wrote:
>
>> Hi
>> I've imported an SPSS file using read.spss. One variable has values
>> like 'Ärzte'. I thought this was UTF-8 encoded, but it is not (as the
>> results of iconv and utf8ToInt suggest). Is there any way to find out
>> how these SPSS values are encoded?
>>
> You are assuming a bit much of your readers.
>
> What exactly are you doing? Is it a value, a value label, or perhaps a
> variable name? How do the results of read.spss look on the R side? How
> did you apply iconv and utf8ToInt? What is your locale?
>
> I mean, we could try and guess all those details, but you are the one with the hard info, and the motivation...
>
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 02 Jan 2008 - 15:07:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 02 Jan 2008 - 16:30:04 GMT.
