Re: [R] Reading in a table with ISO-latin1 encoding in MacOS-X (Intel)

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 09 Jun 2006 - 00:17:23 EST

You are using this as intended, although your email message came in latin9 not latin1, which does not affect your examples. Have you actually checked (e.g. via a hex dump) that the file is in latin1?
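One way to do such a check from within R itself is sketched below. It is only an illustration, not something taken from your session: the temporary file and its contents are made up, and file.size() assumes a reasonably recent R. The idea is that in latin1 the letter "ä" is the single byte e4, whereas in UTF-8 it is the two-byte sequence c3 a4.

```r
# Hypothetical sample: write the word "miettiä" followed by a newline as
# latin1 bytes (0xe4 is a-umlaut in ISO-8859-1) to a temporary file.
path <- tempfile(fileext = ".tbl")
writeBin(as.raw(c(0x6d, 0x69, 0x65, 0x74, 0x74, 0x69, 0xe4, 0x0a)), path)

# Read the file back as raw bytes; printing a raw vector shows it in hex.
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)

# A lone e4 byte suggests latin1; the pair c3 a4 suggests UTF-8.
has_latin1_a_umlaut <- any(bytes == as.raw(0xe4))
idx <- which(bytes == as.raw(0xc3))
has_utf8_a_umlaut <- any(bytes[idx + 1] == as.raw(0xa4))
print(c(latin1 = has_latin1_a_umlaut, utf8 = has_utf8_a_umlaut))
```

For a real file, replace the temporary path with the path to the table and look at the bytes of the header line.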

I assume that if you converted the file to UTF-8 you then used

read.table("R_data/hs+sfnet.T.060505.tbl4", header=TRUE)

If so, you need to investigate the locale in use, as which letters are valid depends on the locale: on Linux UTF-8 locales all letters in all languages are valid in R names, but that is not necessarily the MacOS interpretation. (Invalid characters in names will be converted to ., and if the locale is wrong so may be the interpretation of bytes as characters.)
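A minimal sketch of how one might look at the locale and at its effect on names follows. It assumes a current R release (I have not checked which of these arguments exist in older versions), and the sample table is hypothetical. No output is shown for make.names() because the result depends on the locale in use, which is exactly the point above.

```r
# Report the character-type locale of this session.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info())  # flags such as UTF-8 / Latin-1 for the current locale

# How the current locale would sanitise the header word (result not shown,
# because it depends on the locale). "\u00e4" is a-umlaut.
print(make.names("mietti\u00e4"))

# Hypothetical sample table written as UTF-8 bytes, independent of locale.
path <- tempfile(fileext = ".tbl")
writeLines(enc2utf8(c("ajatella mietti\u00e4 pohtia",
                      "1 FALSE FALSE TRUE",
                      "2 FALSE TRUE FALSE")),
           path, useBytes = TRUE)

# check.names = FALSE skips the make.names() step; encoding = "UTF-8" only
# marks the strings read as UTF-8, it does not re-encode the input.
d <- read.table(path, header = TRUE, check.names = FALSE, encoding = "UTF-8")
print(names(d))
```

Skipping the name check keeps the header as it is in the file, but such names are then not guaranteed to be syntactically valid in the current locale and may need backquotes when used in expressions.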

You might find more informed help on the r-sig-mac list.

On Thu, 8 Jun 2006, Antti Arppe wrote:

> Dear colleagues in R,
>
> I have earlier been working with R in Linux, where reading in a table
> containing Scandinavian letters ("ä", "ö", and "å") in the header as part of
> variable names has not caused any problem whatsoever.
>
> However, doing the same in R running on a new MacOS-X machine (with an
> Intel processor) with the same original text table does not seem to work
> whichever way I try. Following the recommendations on the R site and using
> the 'file' function to set the encoding breaks down at the first encounter
> with a Scandinavian character:
>
> THINK <- read.table(file("R_data/hs+sfnet.T.060505.tbl4",
> encoding="latin1"),header=TRUE)
> Warning messages:
> 1: invalid input found on input connection 'R_data/hs+sfnet.T.060505.tbl4'
> 2: incomplete final line found by readTableHeader on
> 'R_data/hs+sfnet.T.060505.tbl4'
>
> A sample exemplifying such characters as variable labels is below (for which
> the behavior of R in Mac is the same as for the larger file referred to
> above):
>
> ajatella miettiä pohtia
> 1 FALSE FALSE TRUE
> 2 FALSE FALSE FALSE
> 3 FALSE TRUE FALSE
> 4 FALSE TRUE FALSE
> 5 TRUE FALSE FALSE
> 6 TRUE FALSE FALSE
> 7 FALSE FALSE FALSE
> 8 FALSE TRUE FALSE
> 9 FALSE TRUE FALSE
> 10 FALSE FALSE FALSE
>
> Converting the file from ISO-latin-1 to UTF-8 (with Mac's TextEdit
> application) allows the file to be read in its entirety, but the
> Scandinavian character in the heading is still coerced to a period '.', or
> two, in fact (i.e. 'miettiä' -> 'miett..')
>
> Have I possibly misunderstood how the 'file' function should be used in
> conjunction with 'read.table', or might the problem with latin1-to-utf
> conversion be somewhere else?
>
> Appreciating any help on this matter,
>
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Jun 09 00:30:02 2006

