Re: [R] Umlaut read from csv-file

From: Heinz Tuechler <tuechler_at_gmx.at>
Date: Fri, 07 Nov 2008 00:50:51 +0100


Dear Prof. Ripley!

Thank you very much for your attention. In the given example, either Encoding() or the encoding parameter of read.csv solves the problem. I hope your patch will also solve the problem that occurs when I read an SPSS file with spss.get(), since that function has no encoding parameter and my real problem originated there.
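
For reference, a minimal sketch of the two workarounds mentioned above, assuming the bytes in the file really are Latin-1 (which agrees with Windows-1252 for "ä"). The helper mark_encoding() is a hypothetical name introduced here for illustration; it is not part of base R or of the package providing spss.get(), and the small data frame only stands in for an imported file.

```r
# Sketch only: mark_encoding() is a hypothetical helper, not an existing API.
# Assumption: the file bytes are Latin-1 (Windows-1252 agrees for "ä").

x0 <- "div 1-2 Ver\u00e4nderungen"   # reference string, written with an escape

# Write a small csv file whose bytes are Latin-1, whatever the current locale is.
x_latin1 <- iconv(x0, from = "UTF-8", to = "latin1")
csv_file <- tempfile(fileext = ".csv")
writeLines(c('"x"', paste0('"', x_latin1, '"')), csv_file, useBytes = TRUE)

# Workaround 1: declare the encoding while reading.
x <- read.csv(csv_file, as.is = TRUE, encoding = "latin1")$x
Encoding(x)   # reports the declared encoding of the string read in
x == x0       # comparison between strings with different declared encodings

# Workaround 2: for import functions without an encoding argument,
# mark the character columns after reading.
mark_encoding <- function(df, enc = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    Encoding(col) <- enc   # changes only the declared encoding, not the bytes
    col
  })
  df
}

# Stand-in for an imported data frame whose strings carry no encoding mark.
y <- x_latin1
Encoding(y) <- "unknown"
d <- data.frame(x = y, n = 1L, stringsAsFactors = FALSE)
d <- mark_encoding(d)
Encoding(d$x)
d$x == x0

unlink(csv_file)   # delete the temporary file
```

Marking is only appropriate when the bytes are already in the declared encoding; if they are not, the strings would have to be converted with iconv() instead.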

Many thanks,

Heinz Tüchler

At 23:51 06.11.2008, you wrote:
>Look at Encoding() on your two strings. The
>results are different, and this seems to be the
>root of the problem. Adding encoding="latin1"
>to the read.csv call is a workaround.
>
>It looks like there is a problem in the use of
>the CHARSXP cache: if I save the session then x0
>== x becomes true when I reload it, even though the encodings remain different.
>
>I've found the immediate cause and will change this in R-patched shortly.
>
>On Thu, 6 Nov 2008, Heinz Tuechler wrote:
>
>>Dear All!
>>
>>When reading character strings containing an
>>umlaut from a csv file, I find (to me)
>>surprising behaviour in R 2.8.0 that I did not notice in R 2.7.2.
>>A comparison with "==" results in FALSE, while grep does find the match.
>>See the example below.
>>The crucial line is x=="div 1-2 Veränderungen",
>>with the result [1] FALSE in R 2.8.0 but
>>[1] TRUE in R 2.7.2.
>>
>>Thank you in advance for your help
>>
>>Heinz Tüchler
>>
>>##### in R 2.8.0 patched
>>
>>x0 <- "div 1-2 Veränderungen" # define a character string
>>
>>write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
>>rm(x0)
>>
>>x <- read.csv('chr.csv', skip=0, header=TRUE,
>>as.is=TRUE)$x # read in csv-file
>>x
>>x=="div 1-2 Veränderungen"
>>>[1] FALSE
>>grep("div 1-2 Veränderungen", x)
>>>[1] 1
>>grep("div 1-2 Veränderungen", x, value=TRUE)
>>>[1] "div 1-2 Veränderungen"
>>
>>unlink('chr.csv') # delete file
>>
>>Version:
>>platform = i386-pc-mingw32
>>arch = i386
>>os = mingw32
>>system = i386, mingw32
>>status = Patched
>>major = 2
>>minor = 8.0
>>year = 2008
>>month = 11
>>day = 04
>>svn rev = 46830
>>language = R
>>version.string = R version 2.8.0 Patched (2008-11-04 r46830)
>>
>>Windows XP (build 2600) Service Pack 2
>>
>>Locale:
>>LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>>
>>Search Path:
>>.GlobalEnv, package:stats, package:graphics,
>>package:grDevices, package:utils,
>>package:datasets, package:methods, Autoloads, package:base
>>
>>
>>##### in R 2.7.2 patched
>>
>>
>>x0 <- "div 1-2 Veränderungen" # define a character string
>>
>>write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
>>rm(x0)
>>
>>x <- read.csv('chr.csv', skip=0, header=TRUE,
>>as.is=TRUE)$x # read in csv-file
>>x
>>x=="div 1-2 Veränderungen"
>>>[1] TRUE
>>grep("div 1-2 Veränderungen", x)
>>>[1] 1
>>grep("div 1-2 Veränderungen", x, value=TRUE)
>>>[1] "div 1-2 Veränderungen"
>>
>>unlink('chr.csv') # delete file
>>
>>Version:
>>platform = i386-pc-mingw32
>>arch = i386
>>os = mingw32
>>system = i386, mingw32
>>status = Patched
>>major = 2
>>minor = 7.2
>>year = 2008
>>month = 09
>>day = 02
>>svn rev = 46486
>>language = R
>>version.string = R version 2.7.2 Patched (2008-09-02 r46486)
>>
>>Windows XP (build 2600) Service Pack 2
>>
>>Locale:
>>LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>>
>>Search Path:
>>.GlobalEnv, package:stats, package:graphics,
>>package:grDevices, package:utils,
>>package:datasets, package:methods, Autoloads, package:base
>>
>>______________________________________________
>>R-help_at_r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>
>--
>Brian D. Ripley, ripley_at_stats.ox.ac.uk
>Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>University of Oxford, Tel: +44 1865 272861 (self)
>1 South Parks Road, +44 1865 272866 (PA)
>Oxford OX1 3TG, UK Fax: +44 1865 272595



Received on Thu 06 Nov 2008 - 23:56:11 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Nov 2008 - 13:30:22 GMT.
