Re: [R] Encoding() and strsplit()

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri, 07 Nov 2008 08:15:55 +0000 (GMT)

See the 'R Internals' manual.

Strings consisting only of ASCII characters are never marked as either Latin-1 or UTF-8.
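
A minimal sketch of the point above, not taken from the original post: the string "fa\xE7ile" and the variable names x, a and parts are illustrative assumptions. It shows that a declared encoding sticks only to strings containing non-ASCII bytes, and that the mark is not an R-level attribute (it is kept internally with the string itself, which is why attributes() and str() do not show it).

```r
# A string with one non-ASCII byte, written as an escape so that this
# snippet itself stays pure ASCII (\xE7 is c-cedilla in Latin-1).
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
Encoding(x)        # "latin1": the declaration is kept

# A pure-ASCII string never carries a mark, even if one is requested.
a <- "abc"
Encoding(a) <- "latin1"
Encoding(a)        # "unknown"

# The mark is not an attribute of the character vector.
attributes(x)      # NULL

# After splitting into single characters, the ASCII pieces report
# "unknown". How the non-ASCII piece is marked can depend on the
# locale and R version, so no output is claimed for it here.
parts <- strsplit(x, "")[[1]]
Encoding(parts)
```

So "unknown" for the single ASCII letters is expected behaviour rather than a lost encoding: there is nothing to declare for them, since ASCII bytes read the same in Latin-1 and UTF-8.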

On Fri, 7 Nov 2008, Heinz Tuechler wrote:

> Dear All,
>
> Encoding() goes beyond my understanding. See the example. I would expect from
> reading the help for Encoding() that strsplit preserves the encoding for each
> resulting element, but for simple letters it gets lost.
> Also it seems that an Encoding() cannot be declared for simple letters. They
> remain in any case "unknown". In paste() "latin1" seems to dominate
> "unknown".
> What kind of characteristic of an object is the encoding? It does not show up
> as attribute and also str() does not give me any hint.
> Where can I find some explanation regarding encoding?
>
> Thanks
>
> Heinz
>
> ### Encoding() and strsplit
> u <- 'abc'
> Encoding(u)
> [1] "latin1"
> Encoding(u) <- 'latin1' # to be sure about encoding
> us <- strsplit(u, '')[[1]] # split in single strings
> Encoding(us)
> [1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
> Encoding(us) <- rep('latin1', length(us))
> Encoding(us)
> [1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
> pus <- paste(us[1], us[5], sep='')
> Encoding(pus)
> [1] "latin1"
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status = Patched
> major = 2
> minor = 8.0
> year = 2008
> month = 11
> day = 04
> svn rev = 46830
> language = R
> version.string = R version 2.8.0 Patched (2008-11-04 r46830)
>
> Windows XP (build 2600) Service Pack 2
>
> Locale:
> LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices,
> package:utils, package:datasets, package:methods, Autoloads, package:base
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Received on Fri 07 Nov 2008 - 08:19:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Nov 2008 - 09:30:22 GMT.

