Re: [R] How to replace German umlauts in strings?

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Thu, 10 Apr 2008 20:20:11 +0200

Dieter Menne wrote:
> Hans-Jörg Bibiko <bibiko <at> eva.mpg.de> writes:
>
>
>> On 10.04.2008, at 18:03, Hofert Marius wrote:
>>
>>> I have a file containing names of German students. These names
>>> contain the characters "ä", "ö" or "ü" (German umlauts). I use
>>> read.table() to read the file and let's assume the table is then
>>> stored in a variable called "data". The names are then contained in
>>> the first column, i.e. data[,1]. Now if I simply display the variable
>>> "data", I see, that "ä" is replaced by \x8a, "ö" is replaced by \x9a
>>> and so forth.
>>>
>
> This is strange. When I have a file umlaut.txt
>
> Name
> Äserich
> Ömadel
> Übermunsch
>
> and read it in with
>
> umlaut = read.table("umlaut.txt", header = TRUE)
> umlautasis = read.table("umlaut.txt", header = TRUE,as.is = TRUE)
>
> I get the following in both cases:
>
> umlautasis
> Name
> 1 Äserich
> 2 Ömadel
> 3 Übermunsch
>
> This is on Windows Vista. I use it every day without ever having seen nasty
> codings, typically with the following in latex
>
> \usepackage[T1]{fontenc}
> \usepackage{textcomp}
> \usepackage{babel}
> \usepackage[latin1]{inputenc} % For ü,ä
>
>
> Dieter
>
The thing is that \x9a for o-umlaut is an unusual encoding. Asking iconv
which encodings turn it into "ö" gives mostly Mac ones:

 > names(which(sapply(iconvlist(),iconv, x="S\x9aren")=="Sören"))

[1] "CP1282"            "CSMACINTOSH"       "MAC"             
[4] "MAC-CENTRALEUROPE" "MACINTOSH"         "MACIS"           
[7] "MAC-IS"            "MAC-SAMI"        
 > iconv("öäüÖÄÜ", to="MAC")
[1] "\x9a\x8a\x9f\x85\x80\x86"

and accordingly,

 > data$names <- iconv(data$names,from="MAC")
 > data
  names points
1 Björn     10
2 Sören     20

or, if you need to do it for many variables, this should work:

ix <- sapply(data, is.character)
data[ix] <- lapply(data[ix], iconv, from="MAC")
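
One caveat with the is.character() test: before R 4.0.0, read.table()
turned strings into factors by default, and then the text sits in the
factor levels rather than in a character column. A minimal sketch that
covers both cases might look like the following. Note that
recode_columns() is a hypothetical helper (not part of base R), the data
are made up, and whether your iconv accepts the name "MAC" depends on
the platform (check iconvlist()):

```r
# Sketch: re-encode the text columns of a data frame from Mac Roman to UTF-8.
# recode_columns() is a hypothetical helper, not a base R function.
recode_columns <- function(df, from = "MAC", to = "UTF-8") {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      df[[nm]] <- iconv(col, from = from, to = to)
    } else if (is.factor(col)) {
      # Factors keep their text in the levels, so convert those instead.
      levels(df[[nm]]) <- iconv(levels(col), from = from, to = to)
    }
  }
  df
}

# Made-up data. The names are built from raw Mac Roman bytes (0x9a is "ö")
# so the example does not depend on how this message itself is encoded.
mac_names <- c(
  rawToChar(as.raw(c(0x42, 0x6a, 0x9a, 0x72, 0x6e))),  # "Bj\x9arn"
  rawToChar(as.raw(c(0x53, 0x9a, 0x72, 0x65, 0x6e)))   # "S\x9aren"
)
data <- data.frame(names = mac_names, points = c(10, 20),
                   stringsAsFactors = FALSE)
data$group <- factor(mac_names, levels = mac_names)

fixed <- recode_columns(data)
```

Numeric columns such as "points" are left alone. If your version of
read.table() has the fileEncoding argument, declaring the encoding when
reading the file may save you the conversion afterwards.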

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 10 Apr 2008 - 18:28:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Apr 2008 - 20:31:26 GMT.
