Re: [ESS] Re: [R] Strange characters in 2.1.0?

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Wed 08 Jun 2005 - 19:20:21 EST

>>>>> "PaCo" == Patrick Connolly <p.connolly@hortresearch.co.nz>
>>>>> on Wed, 8 Jun 2005 11:31:44 +1200 writes:

    PaCo> On Tue, 07-Jun-2005 at 04:10PM +0200, Martin Maechler wrote:
    PaCo> |> >>>>> "Dan" == Dan Bolser <dmb@mrc-dunn.cam.ac.uk>

    PaCo> |>       ..........
    PaCo> |> 
    PaCo> |>     Dan> I have gone back to 2.0.0 :)
    PaCo> |> 
    PaCo> |> Don't do that!
    PaCo> |> You've lost tons of nice new features and gained quite an amount
    PaCo> |> of old bugs by downgrading .. 

    PaCo> I get the non-generic quotes to show on the screen, but they won't
    PaCo> print with enscript. I end up with a lot of wrapped lines and
    PaCo> nonsense where an unknown character should be.

Why is this diverted from R- to ESS-help?
Printing with enscript is also a topic for printing a transcript
'output.Rout' resulting e.g. from
      R CMD BATCH input.R output.Rout
I'm committing a cross-posting felony now, by posting back to R-help
{and please drop ESS-help from "cc" when further replying}....

    PaCo> What do I need to do to get enscript to know about such characters?
    PaCo> There is an encoding parameter which defaults to latin1.  Should I
    PaCo> change that to something?

Yes, in principle.  "latin1" aka ISO-latin-1 aka ISO-8859-1 is
(for western European languages) the predecessor of the newer
Unicode standard, for which we use the UTF-8 encoding
{the above is (too) much simplified; "locale" settings and
 standards also enter the picture}.
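To illustrate why a latin1-only tool prints nonsense for such characters, here is a minimal Python sketch. It assumes the "non-generic quotes" are the Unicode directional quotes (U+2018 / U+2019); the helper name misread_as_latin1 is made up for this illustration.

```python
# Illustration only: how UTF-8 bytes look to a tool that assumes latin1.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")

left_quote = "\u2018"  # LEFT SINGLE QUOTATION MARK (assumed culprit), not in latin1
e_acute = "\u00e9"     # e with acute accent, which does exist in latin1

# One character becomes several bytes in UTF-8 ...
print(len(left_quote.encode("utf-8")))  # number of UTF-8 bytes for the quote
print(len(e_acute.encode("utf-8")))     # number of UTF-8 bytes for e-acute
print(len(e_acute.encode("latin-1")))   # number of latin1 bytes for e-acute

# ... and a latin1-only reader shows each of those bytes as its own character.
print(repr(misread_as_latin1(e_acute)))
print(len(misread_as_latin1(left_quote)))
```

Each extra byte is rendered as a separate (often unprintable) character, which would also explain lines becoming longer and wrapping.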

However, my version of enscript does not seem to support UTF-8 (yet).
Nor does 'a2ps', an alternative to enscript which does pretty-print
R source files.

So there are basically two options :

1) Get rid of Unicode / UTF-8 by setting the locale of your computer /
   login to use the "old" locales, e.g. en_US instead of en_US.utf-8.
   This will be more or less fine for Emacs and R --- though in
   our {Redhat Enterprise} setup, the X11 fonts for non-utf-8 locales
   are quite crippled compared to those for utf-8 ones.

   However, as more and more other utilities are based on utf-8
   encoded files, you will see funny characters there
   if you are using locales like "de_*" or "fr_*", at least,
   e.g. for man pages which are only in utf-8 for our Redhat OS setup.

2) Improve the printing tools by

  1. filtering *.utf-8 to latin-*
  2. printing the resulting latin-*

   For filtering, there are programs like 'recode' (was "GNU
   recode", now "Free recode"), which are extremely flexible, and
   'iconv' (less flexible but more widespread), that can translate
   utf-8 to and from all kinds of encodings / character sets.
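   As a rough sketch of the filtering step (not a replacement for recode or iconv), the same idea in Python might look as follows. The function name utf8_to_latin1 and the quote mapping are assumptions made for this illustration. The directional quotes have no latin1 equivalent, so the sketch maps them to plain ASCII quotes first and replaces anything else that does not fit with "?".

```python
# Hypothetical filter: UTF-8 bytes in, latin1 bytes out.
# The mapping is an assumption for this sketch; extend it as needed.
QUOTE_MAP = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

def utf8_to_latin1(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to latin1, degrading what latin1 cannot hold."""
    text = data.decode("utf-8")
    # Directional quotes do not exist in latin1: swap in ASCII quotes.
    text = text.translate(str.maketrans(QUOTE_MAP))
    # Any other character outside latin1 becomes "?" instead of raising.
    return text.encode("latin-1", errors="replace")

def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 file and write a latin1 copy that can then be printed."""
    with open(src_path, "rb") as src:
        converted = utf8_to_latin1(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

   One would call convert_file on a transcript such as 'output.Rout', writing to a second file (any name will do), and then print that copy with enscript and its default latin1 encoding.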

In the future, of course, everything will work out of the box, when
all the utilities on your computer are aware of utf encodings
and automatically send correct stuff to the printer and display
it correctly in all kinds of viewers/editors... :-)

Given my experiences during the last several months (where I,
e.g., also found that our oldish LaTeX setup didn't yet accept
\usepackage[utf8]{inputenc}), if I were in New Zealand and did not
need accents or umlauts, I'd probably stick with latin1 (and would
make sure my X server got proper non-utf8 fonts) for another year or so.

Martin



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jun 08 20:24:54 2005
