Re: [R] Umlaut read from csv-file

From: Heinz Tuechler <tuechler_at_gmx.at>
Date: Sat, 08 Nov 2008 09:31:30 +0100

At 08:01 08.11.2008, Prof Brian Ripley wrote:
>We have no idea what you understood (you didn't tell us), but the help says
>
>encoding: character vector. The encoding(s) to be assumed when 'file'
> is a character string: see 'file'. A possible value is
> '"unknown"': see the 'Details'.
>
>...
> This paragraph applies if 'file' is a filename (rather than a
> connection). If 'encoding = "unknown"', an attempt is made to
> guess the encoding. The result of 'localeToCharset()' is used as
> a guide. If 'encoding' has two or more elements, they are tried
> in turn until the file/URL can be read without error in the trial
> encoding.
>
>So source(encoding="latin1") says the file is
>encoded in Latin-1 and should be re-encoded if
>necessary (e.g. in a UTF-8 locale).
>
>Setting the Encoding of parsed character strings is not mentioned.
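A minimal sketch of this distinction, assuming a UTF-8 or Latin-1 session locale. The temporary file and the variable name `us` are illustrative only. The script writes a one-line R file whose string literal is stored as the single Latin-1 byte 0xE4 ("ä") and then sources it with encoding = "latin1". The file's bytes are re-encoded on input, so the resulting string compares equal to "ä". Which encoding mark it carries depends on the locale and the R version.

```r
# Write a script file whose only non-ASCII byte is Latin-1 0xE4 ("a umlaut").
f <- tempfile(fileext = ".R")
writeBin(c(charToRaw('us <- "'), as.raw(0xe4), charToRaw('"\n')), f)

# encoding = "latin1" describes the *file*: its lines are re-encoded
# to the session encoding before they are parsed.
e <- new.env()
source(f, local = e, encoding = "latin1")
unlink(f)

print(e$us == "\u00e4")   # compares the value, independent of the mark
print(Encoding(e$us))     # the mark itself varies by locale and R version
```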
>
>You could have written out a data frame with
>write.csv() and re-read it with
>read.csv(encoding = "latin1"): that was the
>workaround you were given earlier (not to use source).
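A minimal sketch of that write.csv()/read.csv() workaround, assuming a UTF-8 or Latin-1 session locale. The column name and temporary file are illustrative. Writing with fileEncoding = "latin1" ensures the file holds Latin-1 bytes regardless of the session locale. read.csv(encoding = "latin1") then *marks* the non-ASCII strings it reads; it does not convert them. A CSV file stores only values, so attributes such as labels do not survive the round trip.

```r
# Illustrative data with umlauts (escapes keep this script ASCII-only).
df <- data.frame(x = c("a", "\u00e4", "\u00f6", "\u00fc"),
                 stringsAsFactors = FALSE)

f <- tempfile(fileext = ".csv")
# fileEncoding re-encodes the output, so the file contains Latin-1 bytes.
write.csv(df, f, row.names = FALSE, fileEncoding = "latin1")

# encoding = "latin1" only marks the strings read; it does not convert them.
df2 <- read.csv(f, encoding = "latin1", stringsAsFactors = FALSE)
unlink(f)

print(Encoding(df2$x))      # ASCII strings are never marked
print(all(df2$x == df$x))   # equality is checked after translation
```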

Thank you for this explanation. I felt that I did not understand the help page of source(), and I had hoped that encoding='latin1' would have the same effect as in read.csv(). On reflection, I see that this would conflict with the primary purpose of source(). Earlier I tried writing the data.frame with write.csv() and re-reading it. This works, but additional information such as labels() has to be transferred in a second step. The best solution I can imagine would be a function that marks every character string in the whole structure of a data.frame, including all attributes, as latin1.
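A minimal sketch of such a function follows. `mark_latin1` is a hypothetical name and is not part of base R or any package. It walks a data frame (or any list) recursively and applies `Encoding<-` to every character vector it finds. This includes vectors stored in attributes such as names, factor levels and labels. The function only changes the *declared* encoding. It assumes the bytes are already Latin-1, for example because the data were read in a Latin-1 locale. Strings that are actually UTF-8 would be mislabelled.

```r
# Hypothetical helper (not in base R or any package): recursively declare
# every character string in 'x', including those held in attributes
# (names, factor levels, labels, dimnames, ...), to be Latin-1.
# It does NOT convert bytes; it assumes they are already Latin-1.
mark_latin1 <- function(x) {
  if (is.character(x)) {
    Encoding(x) <- "latin1"              # ASCII strings stay "unknown"
  } else if (is.list(x)) {
    cls <- oldClass(x)
    x <- unclass(x)                      # avoid methods such as [<-.data.frame
    for (i in seq_along(x)) {
      x[i] <- list(mark_latin1(x[[i]]))  # list() also keeps NULL elements
    }
    oldClass(x) <- cls
  }
  # Recurse into every attribute as well (e.g. "levels", "label").
  for (nm in names(attributes(x))) {
    attr(x, nm) <- mark_latin1(attr(x, nm, exact = TRUE))
  }
  x
}

# Usage: a small data frame built from a raw Latin-1 byte.
ae <- rawToChar(as.raw(0xe4))            # Latin-1 "a umlaut", mark "unknown"
df <- data.frame(x = c("a", ae),
                 f = factor(c(ae, "b"), levels = c(ae, "b")),
                 stringsAsFactors = FALSE)
attr(df$x, "label") <- paste0("Gr", ae, "n")

res <- mark_latin1(df)
print(Encoding(res$x))
print(Encoding(levels(res$f)))
print(Encoding(attr(res$x, "label")))
```

Handling attributes generically, rather than special-casing factors, means that levels, names and Hmisc-style labels are all covered by the same loop.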

>On Sat, 8 Nov 2008, Heinz Tuechler wrote:
>
>>At 16:52 07.11.2008, Prof Brian Ripley wrote:
>>>On Fri, 7 Nov 2008, Peter Dalgaard wrote:
>>>
>>>>Heinz Tuechler wrote:
>>>>>Dear Prof. Ripley!
>>>>>Thank you very much for your attention. In the given example, Encoding()
>>>>>or the encoding parameter of read.csv solves the problem. I hope your
>>>>>patch will also solve the problem when I read an SPSS file with
>>>>>spss.get(), since this function has no encoding parameter and my real
>>>>>problem originated there.
>>>>read.spss() (package foreign) does have a reencode argument, though; and
>>>>this is called by spss.get(), so it looks like an easy hack to add it
>>>>there.
>>>Yes, older software like spss.get needs to be
>>>updated for the internationalization
>>>age. Modifying it to have a ... argument
>>>passed to read.spss would be a good idea (and future-proofing).
>>>In cases like this it is likely that the SPSS
>>>file does contain its encoding (although
>>>sometimes it does not and occasionally it is
>>>wrong), so it is helpful to make use of the
>>>info if it is there. However, the default is
>>>read.spss(reencode=NA), because the problems
>>>caused by assuming that the info is correct when it is not are worse.
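A minimal sketch of the '...' pass-through idea. `spss_get_sketch` is a hypothetical name, and this is not the actual Hmisc spss.get() code. Extra arguments such as reencode are forwarded unchanged to foreign::read.spss(), whose reencode argument accepts either a logical or a character string naming the encoding to assume for the file.

```r
# Hypothetical wrapper illustrating the '...' pass-through idea; it is
# not the real Hmisc::spss.get(). Any extra arguments (e.g. reencode)
# are forwarded unchanged to foreign::read.spss().
spss_get_sketch <- function(file, ...) {
  foreign::read.spss(file, to.data.frame = TRUE, ...)
}

# Usage ("survey.sav" is a placeholder file name):
# d <- spss_get_sketch("survey.sav", reencode = "latin1")
```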
>>
>>The reason I tried the example below was to
>>solve the encoding problem by dumping and then
>>re-sourcing a data.frame with the encoding
>>parameter set to latin1. As you can see,
>>source(x, encoding='latin1') does not have the
>>effect I expected. Unfortunately I have no
>>idea what I misunderstood about the meaning of encoding='latin1'.
>>
>>Heinz Tüchler
>>
>>
>>us <- c("a", "b", "c", "ä", "ö", "ü")
>>Encoding(us)
>>[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
>>dump('us', 'us_dump.txt')
>>rm(us)
>>source('us_dump.txt', encoding='latin1')
>>us
>>[1] "a" "b" "c" "ä" "ö" "ü"
>>Encoding(us)
>>[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
>>unlink('us_dump.txt')
>>
>>
>>
>>
>>>--
>>>Brian D. Ripley, ripley_at_stats.ox.ac.uk
>>>Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>>University of Oxford, Tel: +44 1865 272861 (self)
>>>1 South Parks Road, +44 1865 272866 (PA)
>>>Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>>______________________________________________
>>R-help_at_r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.



Received on Sat 08 Nov 2008 - 08:36:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 08 Nov 2008 - 09:30:23 GMT.
