Re: [R] Encoding() and strsplit()

From: Heinz Tuechler <tuechler_at_gmx.at>
Date: Fri, 07 Nov 2008 10:00:12 +0100

At 09:15 07.11.2008, Prof Brian Ripley wrote:
>See the 'R Internals' manual.

Thank you, now I understand a little more. My real problem, however, is a data frame produced by spss.get(). Is there a simple way to mark all character strings in that data frame (except pure ASCII ones), including the levels of factors, as latin1?
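
One possible approach, as a minimal sketch only: loop over the columns and set Encoding() on character columns and on the levels of factors. The helper name mark_latin1 is hypothetical (it is not part of base R or Hmisc), and the sketch assumes the non-ASCII strings really hold Latin-1 bytes. Pure ASCII strings stay "unknown" whatever is assigned.

```r
# Hypothetical helper (not from base R or Hmisc): declare every character
# string in a data frame, including factor levels, as latin1.
# Assumption: the non-ASCII strings really are Latin-1 encoded.
mark_latin1 <- function(df) {
  mark <- function(x) {
    Encoding(x) <- "latin1"   # pure-ASCII strings silently stay "unknown"
    x
  }
  for (i in seq_along(df)) {
    if (is.character(df[[i]])) {
      df[[i]] <- mark(df[[i]])
    } else if (is.factor(df[[i]])) {
      levels(df[[i]]) <- mark(levels(df[[i]]))
    }
  }
  df
}

# Small made-up example; "\xfc" is the Latin-1 byte for u-umlaut.
lev <- c("gr\xfcn", "rot")
df <- data.frame(name = c("M\xfcller", "Smith"), stringsAsFactors = FALSE)
df$colour <- factor(lev, levels = lev)

marked <- mark_latin1(df)
Encoding(marked$name)            # "latin1" "unknown"
Encoding(levels(marked$colour))  # "latin1" "unknown"
```

The declaration only labels the existing bytes; it does not convert them, so it is only correct if the assumption about the source encoding holds.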

Heinz Tüchler

>ASCII characters are not marked as Latin-1 nor UTF-8.
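
A short illustration of this point, offered as a sketch rather than an authoritative account: the encoding is a per-string mark, it does not appear among the attributes of the vector, and it is never set on pure ASCII strings. The variable name s and the byte escapes are made up for the example.

```r
# "\xe4\xf6\xfc" are the Latin-1 bytes for a-umlaut, o-umlaut, u-umlaut.
s <- c("abc", "\xe4\xf6\xfc")

Encoding(s) <- "latin1"   # declare both elements as latin1
Encoding(s)               # "unknown" "latin1": the ASCII element is not marked

# The mark lives on each individual string, not on the vector,
# so attributes() shows nothing.
is.null(attributes(s))    # TRUE

# Converting to UTF-8 likewise leaves the ASCII element unmarked.
Encoding(enc2utf8(s))     # "unknown" "UTF-8"
```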
>
>On Fri, 7 Nov 2008, Heinz Tuechler wrote:
>
>>Dear All,
>>
>>Encoding() goes beyond my understanding. See
>>the example. I would expect from reading the
>>help for Encoding() that strsplit preserves the
>>encoding for each resulting element, but for simple letters it gets lost.
>>Also it seems that an Encoding() cannot be
>>declared for simple letters. They remain in any
>>case "unknown". In paste() "latin1" seems to dominate "unknown".
>>What kind of characteristic of an object is the
>>encoding? It does not show up as attribute and
>>also str() does not give me any hint.
>>Where can I find some explanation regarding encoding?
>>
>>Thanks
>>
>>Heinz
>>
>>### Encoding() and strsplit
>>u <- 'abcäöü'
>>Encoding(u)
>>[1] "latin1"
>>Encoding(u) <- 'latin1' # to be sure about encoding
>>us <- strsplit(u, '')[[1]] # split in single strings
>>Encoding(us)
>>[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
>>Encoding(us) <- rep('latin1', length(us))
>>Encoding(us)
>>[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
>>pus <- paste(us[1], us[5], sep='')
>>Encoding(pus)
>>[1] "latin1"
>>
>>Version:
>>platform = i386-pc-mingw32
>>arch = i386
>>os = mingw32
>>system = i386, mingw32
>>status = Patched
>>major = 2
>>minor = 8.0
>>year = 2008
>>month = 11
>>day = 04
>>svn rev = 46830
>>language = R
>>version.string = R version 2.8.0 Patched (2008-11-04 r46830)
>>
>>Windows XP (build 2600) Service Pack 2
>>
>>Locale:
>>LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>>
>>Search Path:
>>.GlobalEnv, package:stats, package:graphics,
>>package:grDevices, package:utils,
>>package:datasets, package:methods, Autoloads, package:base
>>
>>______________________________________________
>>R-help_at_r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>
>--
>Brian D. Ripley, ripley_at_stats.ox.ac.uk
>Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>University of Oxford, Tel: +44 1865 272861 (self)
>1 South Parks Road, +44 1865 272866 (PA)
>Oxford OX1 3TG, UK Fax: +44 1865 272595



Received on Fri 07 Nov 2008 - 09:06:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Nov 2008 - 09:30:22 GMT.
