Re: [Rd] Encoding errors in Rd files

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 24 Jul 2012 21:46:54 +0100

On 24/07/2012 21:08, steven mosher wrote:
> Well, I'm working on a project to bring an old package, last
> published for R 1.9, back to life.
> I'm almost there, but I am getting killed by an encoding error in the Rd
> files.
>
> After reading the manual, I decided to try UTF-8. Mostly because I could
> spell it. ha.
>
> That got me a bit closer but I still have these warnings
>
> * checking data for non-ASCII characters ... WARNING
> Warning: found non-ASCII string(s)
> 'Tourbihre de la Rivihre-aux-Feu' in object 'modpoll'
> 'Lac ` la Fourche' in object 'modpoll'
> 'Lac ` la Loutre' in object 'modpoll'
> 'Lac Kinogami' in object 'modpoll'

How to handle those is in 'Writing R Extensions': basically convert to UTF-8 and mark them as UTF-8.
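
A minimal sketch of that conversion, under stated assumptions. The mangled strings in the warning are consistent with Latin-1 text whose high bit was stripped on printing ('h' for e-grave, '`' for a-grave, 'i' for e-acute), so the sketch assumes the data are Latin-1. The data frame modpoll_demo and the helper to_utf8() are hypothetical names for illustration, not part of any package; iconv() and Encoding() are base R.

```r
# Hypothetical stand-in for the package data: Latin-1 bytes, encoding unmarked.
modpoll_demo <- data.frame(
  site = c("Lac \xe0 la Fourche", "Lac K\xe9nogami", "Lac Noir"),
  stringsAsFactors = FALSE
)

# Convert every character column (and factor level set) to UTF-8.
# 'from' is an assumption about the source encoding; adjust if it is wrong.
to_utf8 <- function(df, from = "latin1") {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      # With to = "UTF-8", iconv() marks non-ASCII results as UTF-8.
      df[[nm]] <- iconv(col, from = from, to = "UTF-8")
    } else if (is.factor(col)) {
      levels(df[[nm]]) <- iconv(levels(col), from = from, to = "UTF-8")
    }
  }
  df
}

fixed <- to_utf8(modpoll_demo)

# Non-ASCII entries should now be marked; pure ASCII strings are never marked.
print(Encoding(fixed$site))
```

After checking the result, the object would be re-saved into the package's data directory (for example with save()) so that the shipped data carry the UTF-8 marks.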

> * checking data for ASCII and uncompressed saves ... OK
> * checking examples ... OK
> * checking PDF version of manual ... WARNING
> LaTeX errors when creating PDF version.
> This typically indicates Rd problems.
> LaTeX errors found:
> ! Package inputenc Error: Keyboard character used is undefined
> (inputenc) in inputencoding `utf8'.
>
> I'll keep searching the help list archives for a clue, but if somebody
> could point me at educational material, it's really time
> that I learned this aspect.

Without the actual file we can do little. The message means that something in the manual inputs (and it could be the DESCRIPTION file or an Rd file) contains a character not known to LaTeX. Most likely the bytes are simply not valid UTF-8 (for example, text in another encoding in a file declared as UTF-8), but it could also be a valid character outside LaTeX's gamut.

Normally the LaTeX log (which is in the check output) is more revealing. You can also try this part alone with R CMD Rd2pdf, and R CMD Rd2pdf --no-description often points the finger at the DESCRIPTION file.
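
Before rerunning LaTeX, it can help to narrow down which input holds the bad bytes. The following is a rough sketch, not a definitive tool: the helper find_invalid_utf8() is a hypothetical name, and it only detects lines that are not valid UTF-8 (it will not flag valid characters that LaTeX does not support). It relies on validUTF8(), which is in base R from version 3.3.0.

```r
# Report, for each file, the line numbers that are not valid UTF-8.
find_invalid_utf8 <- function(files) {
  hits <- list()
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    bad <- which(!validUTF8(lines))
    if (length(bad)) hits[[f]] <- bad
  }
  hits
}

# Usage, assuming the working directory is the top of the package source:
if (file.exists("DESCRIPTION")) {
  inputs <- c("DESCRIPTION",
              list.files("man", pattern = "\\.Rd$", full.names = TRUE))
  print(find_invalid_utf8(inputs))
}
```

An empty list suggests every input is valid UTF-8, in which case the culprit is more likely a character that LaTeX cannot typeset.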

>
> I've read http://developer.r-project.org/Encodings_and_R.html
>
> How do I figure out which encoding to use with the error seen above?

Assuming this is not something esoteric, UTF-8 is the most comprehensive choice, but LaTeX's UTF-8 coverage (and that of the fonts used) is heavily biased to Western European scripts. So for example for Lithuanian you may want to choose something else (Latin-7?).

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 24 Jul 2012 - 20:51:33 GMT
