Re: [R] Formatting numbers with a limited amount of digits consistently

From: Marc Schwartz <MSchwartz_at_mn.rr.com>
Date: Wed 01 Jun 2005 - 00:30:05 EST

On Mon, 2005-05-30 at 23:53 -0400, Gabor Grothendieck wrote:
> On 5/30/05, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
> > Gabor Grothendieck wrote:
> > > On 5/30/05, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
> > >
> > >>Henrik Andersson wrote:
> > >>
> > >>>I have tried to get signif, round and format to display numbers like
> > >>>these consistently in a table, using e.g. signif(x,digits=3)
> > >>>
> > >>>17.01
> > >>>18.15
> > >>>
> > >>>I want
> > >>>
> > >>>17.0
> > >>>18.2
> > >>>
> > >>>Not
> > >>>
> > >>>17
> > >>>18.2
> > >>>
> > >>>
> > >>>Why is the last digit stripped off in the case when it is zero!
> > >>
> > >>signif() changes the value; you don't want that, you want to affect how
> > >>a number is displayed. Use format() or formatC() instead, for example
> > >>
> > >> > x <- c(17.01, 18.15)
> > >> > format(x, digits=3)
> > >>[1] "17.0" "18.1"
> > >> > noquote(format(x, digits=3))
> > >>[1] 17.0 18.1
> > >>
> > >
> > >
> > > That works in the above context but I don't think it works generally:
> > >
> > > R> f <- head(faithful)
> > > R> f
> > >   eruptions waiting
> > > 1     3.600      79
> > > 2     1.800      54
> > > 3     3.333      74
> > > 4     2.283      62
> > > 5     4.533      85
> > > 6     2.883      55
> > >
> > > R> format(f, digits = 3)
> > >   eruptions waiting
> > > 1      3.60      79
> > > 2      1.80      54
> > > 3      3.33      74
> > > 4      2.28      62
> > > 5      4.53      85
> > > 6      2.88      55
> > >
> > > R> # this works in this case
> > > R> noquote(prettyNum(round(f,1), nsmall = 1))
> > >      eruptions waiting
> > > [1,] 3.6       79.0
> > > [2,] 1.8       54.0
> > > [3,] 3.3       74.0
> > > [4,] 2.3       62.0
> > > [5,] 4.5       85.0
> > > [6,] 2.9       55.0
> > >
> > > and even that does not work in the desired way (which presumably
> > > is not to use exponent format) if you have some
> > > large enough numbers like 1e6 which it will display using
> > > the e notation rather than using ordinary notation.
> >
> > formatC with format="f" seems to work for me, though it assumes you're
> > specifying decimal places rather than significant digits. It also wants
> > a vector of numbers as input, not a dataframe. So the following gives
> > pretty flexible control over what a table will look like:
> >
> > > data.frame(eruptions = formatC(f$eruptions, digits=2, format='f'),
> > + waiting = formatC(f$waiting, digits=1, format='f'))
> >    eruptions waiting
> > 1 1000000.11    79.0
> > 2       1.80    54.0
> > 3       3.33    74.0
> > 4       2.28    62.0
> > 5       4.53    85.0
> > 6       2.88    55.0
> >
> > >
> > > I have struggled with this myself and have generally been able
> > > to come up with something for specific instances but I have generally
> > > found it a pain to do a simple thing like format a table exactly as I want
> > > without undue effort. Maybe someone else has figured this out.
> >
> > I think that formatting tables properly requires some thought, and R is
> > no good at thinking. You can easily recognize a badly formatted table,
> > but it's very hard to write down rules that work in general
> > circumstances. It's also a matter of taste, so if I managed to write a
> > function that matched my taste, you would find you wanted to make changes.
> >
> > It's sort of like expecting plot(x, y) to always come up with the best
> > possible plot of y versus x. It's just not a reasonable expectation.
> > It's better to provide tools (like abline() for plots or formatC() for
> > tables) that allow you to tailor a plot or table to your particular needs.
> >
>
> Thanks. That seems to be the idiom I was missing. One thing that would
> be nice would be if formatC could handle data frames.

Guys, perhaps I am missing something here, but there seems to be some confusion between how numbers are stored internally and how they are displayed, as well as about the meaning of "significant digits", which I believe is what Henrik's original query was about.

By default, R's printed output is controlled by options("digits"), which sets the number of significant digits, and options("scipen"), which sets the penalty used when choosing between fixed and scientific notation. Significant digits are of course not the same as decimal places. Hence the variation in the output that Henrik gets, and why the trailing zero is dropped.
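To see both options at work, here is a short sketch, assuming a fresh session with the defaults digits = 7 and scipen = 0. The values 17.01 and 18.26 are just illustrative inputs chosen to avoid rounding ties:

```r
# Printing is driven by significant digits, not decimal places.
x <- c(17.01, 18.26)

print(17.01, digits = 3)   # [1] 17        (17.0 to 3 significant digits; trailing zero dropped)
print(x, digits = 3)       # [1] 17.0 18.3 (18.3 needs one decimal, so both elements get one)

# scipen is a penalty applied to scientific notation when choosing a format
print(1e6)                 # [1] 1e+06     ("1e+06" is narrower than "1000000")
op <- options(scipen = 100)
print(1e6)                 # [1] 1000000
options(op)                # restore the previous setting
```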

Using signif() does not help here: it rounds the stored value itself to a number of significant digits, and when that value is printed, the trailing zero still gets dropped.
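A minimal sketch of the difference between changing a value and formatting it (17.01 and 18.26 are illustrative inputs, not Henrik's data):

```r
x <- c(17.01, 18.26)
s <- signif(x, digits = 3)   # rounds the stored values themselves
s[1] == 17                   # TRUE: 17.01 has become exactly 17
as.character(s[1])           # "17": there is no trailing zero left to show

# A fixed number of decimal places has to be requested at formatting time
formatC(s, format = "f", digits = 1)   # "17.0" "18.3"
sprintf("%.1f", s)                     # "17.0" "18.3"
```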

Both of the above are "inexact" tools when it comes to creating formatted output for a table with a consistent number of decimal places, so that columns of numbers align.

format() is still problematic here because it too uses the number of significant digits, defaulting to options("digits").
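For illustration, a small sketch of how format() chooses decimal places from significant digits, so that the same value can come out differently depending on the other elements of the vector. The nsmall argument only sets a minimum number of decimal places:

```r
format(17.01, digits = 3)               # "17"
format(c(17.01, 18.26), digits = 3)     # "17.0" "18.3"
format(17.01, digits = 3, nsmall = 1)   # "17.0"
format(c(3.6, 79), nsmall = 1)          # " 3.6" "79.0" (padded to a common width)
```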

Using formatC() or sprintf() in conjunction with cat() is usually the best way to gain control over how numeric output is formatted, especially in a nicely aligned table. This is what I use in CrossTable(), where I want decimal-aligned columns for numbers in the tabular output, along with fixed-width columns for textual output (i.e. labels, etc.).
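As a sketch of that kind of control: both functions take an explicit field width and a fixed number of decimal places, and format = "f" also avoids the switch to e notation for large values that Gabor mentioned. The vector x below is an arbitrary example:

```r
x <- c(3.6, 1.8, 3.333, 1000000.11)

# Right-justified in a 12-character field, always 2 decimal places
a <- formatC(x, format = "f", digits = 2, width = 12)
# The same field specification in C printf syntax
b <- sprintf("%12.2f", x)
cat(a, sep = "\n")

format(1e6)                               # "1e+06": significance-based, may go scientific
formatC(1e6, format = "f", digits = 0)    # "1000000": fixed notation as requested
```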

Briefly, following Gabor's example using the faithful dataset above, one could use something like:

> f <- head(faithful)
> noquote(apply(f, 2, function(x) formatC(x, format = "f", digits = 1)))
  eruptions waiting
1 3.6       79.0
2 1.8       54.0
3 3.3       74.0
4 2.3       62.0
5 4.5       85.0
6 2.9       55.0

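This affects only how the data are printed, not the data themselves. It works fine for a 2-D object whose columns are all numeric; apply() first converts a data frame to a matrix, so a mix of numeric and text columns would all be coerced to character.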

Note, however, that the numeric columns are left-aligned rather than right-aligned as in the default print method, since the result of the above is a character matrix rather than a data.frame with numeric columns. Compare:

> f
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
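One possible way to keep the default right alignment, sketched here under the assumption that character columns in a data frame are acceptable, is to format each column separately and keep the result as a data.frame, since print.data.frame() right-justifies by default (right = TRUE):

```r
f <- head(faithful)
f2 <- f
# Replace every column by its fixed-decimal text; f2 stays a data.frame
f2[] <- lapply(f, formatC, format = "f", digits = 1)
f2    # printed right-justified, without quotes, with the original row names
```

As with the apply() version, only the printed representation changes; f itself is untouched.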


Thus, for greater control, one should use sprintf() and cat():

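# Header: the two column labels, right-justified in 15-character fields
out.lines <- sprintf("%15s %15s\n", colnames(f)[1], colnames(f)[2])

# Append one line per row, each value to 1 decimal place. The fields are
# 14 wide with a 2-space gap (rather than 15 and 1) because cat() inserts
# a space between elements, shifting every line after the first by one.
for (i in 1:nrow(f))
{
  out.lines <- c(out.lines,
                 sprintf("%14.1f  %14.1f\n", f[i, 1], f[i, 2]))
}

> cat(out.lines)
      eruptions         waiting
            3.6            79.0
            1.8            54.0
            3.3            74.0
            2.3            62.0
            4.5            85.0
            2.9            55.0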
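Since sprintf() is vectorised, the loop can also be dropped. A minimal variant, assuming the same two-column layout, formats all rows in one call and uses writeLines(), which does not add cat()'s separating space, so header and rows can share the same 15-character widths:

```r
f <- head(faithful)
header <- sprintf("%15s %15s", names(f)[1], names(f)[2])
rows   <- sprintf("%15.1f %15.1f", f[[1]], f[[2]])   # one string per row
writeLines(c(header, rows))                          # one string per output line
```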



In the above case, one can specify the column widths for the column labels and the row values. Of course, this could be extended into a generic function for data frames with multiple data types, with arguments for column widths, numbers of decimal places, etc. One might even want a different number of decimal places for different columns, depending upon the nature of each column in the object to be printed, so vectors could be used for these arguments.

I'll leave that as a further exercise.
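One possible starting point for that exercise, offered only as a sketch: format_table() is a hypothetical name, not an existing function in R or any package. Numeric columns get a fixed number of decimals and are right-justified; other columns are converted to character and left-justified via a negative formatC() width. The width and digits arguments are recycled, so a single value or one value per column can be given:

```r
# Hypothetical helper, not part of base R or any package.
format_table <- function(df, width = 10, digits = 2) {
  n <- ncol(df)
  width  <- rep_len(width, n)
  digits <- rep_len(digits, n)
  num <- vapply(df, is.numeric, logical(1))
  # Negative widths make formatC() left-justify (used for text columns)
  w <- unname(ifelse(num, width, -width))
  cols <- lapply(seq_len(n), function(j) {
    if (num[j]) {
      formatC(as.double(df[[j]]), format = "f", digits = digits[j], width = w[j])
    } else {
      formatC(as.character(df[[j]]), width = w[j])
    }
  })
  header <- vapply(seq_len(n), function(j) formatC(names(df)[j], width = w[j]),
                   character(1))
  # First element is the header line, then one string per row
  c(paste(header, collapse = " "), do.call(paste, c(cols, sep = " ")))
}

d <- data.frame(name = c("a", "bb"), x = c(1.5, 10.26), stringsAsFactors = FALSE)
writeLines(format_table(d, width = c(4, 8), digits = c(0, 1)))
```

Returning a character vector rather than printing directly leaves the choice of writeLines(), cat() or writing to a file to the caller.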

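Final note to Henrik: 18.15 cannot be represented exactly as an IEEE 754 double. The nearest representable value is slightly less than 18.15 (about 18.1499999999999986), so in R:

> round(18.15, 1)
[1] 18.1
> formatC(18.15, format = "f", digits = 1)
[1] "18.1"
> sprintf("%5.1f", 18.15)
[1] " 18.1"

Thus, you don't get 18.2. Note that this is not caused by the "go to the even digit" rule for rounding off a 5: that rule applies only to exact ties such as 2.5, and applied to an exact 18.15 it would give 18.2.

See ?round for more information.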
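A small sketch separating the two effects: exact ties (values like 2.5 that are stored exactly) follow round-half-to-even, while 18.15 is not a tie at all once stored, so it simply rounds down:

```r
round(2.5)     # 2: an exact tie goes to the even digit
round(3.5)     # 4
round(-1.5)    # -2

sprintf("%.17g", 18.15)   # shows the stored value, just under 18.15
(18.15 - 18) < 0.15       # TRUE: the fractional part is below 0.15
sprintf("%.1f", 18.15)    # "18.1"
```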

HTH, Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jun 01 00:36:28 2005
