From: Petr Savicky <savicky_at_cs.cas.cz>

Date: Tue, 26 Jun 2007 17:03:24 +0200

y <- as.character(x);

mant.y <- sub("e.*","",y);

z <- formatC(x,digits=21,width=-1,format="e"); mant.z <- sub("e.*","",z);

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 26 Jun 2007 - 15:30:59 GMT

Date: Tue, 26 Jun 2007 17:03:24 +0200

I would like to reply to the following email from May in the thread [Rd] R 2.5.0 refuses to print enough digits to recover exact floating point values

On Wed, May 23, 2007 at 06:32:36PM +0100, Prof Brian Ripley wrote:

> I think this is a bug in the MacOS X runtime. I've checked the C99

*> standard, and can see no limits on the precision that you should be able
**> to specify to printf.
**>
**> Not that some protection against such OSes would come amiss.
**>
**> However, the C99 standard does make clear that [sf]printf is not required
**> (even as 'recommended practice') to be accurate to more than *_DIG places,
**> which as ?as.character has pointed out is 15 on the machines R runs on.
*

Let me use

http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1124.pdf
In Section 7.19.6, Recommended practice, paragraph 13, it is specified
that rounding to DECIMAL_DIG should be done correctly.

In Annex F.5 Binary-decimal conversion, it is explicitly stated that conversion from the widest supported IEC 60559 format to decimal with DECIMAL_DIG digits and back is the identity function. In footnote 305, it is specified that if double (53 bit precision) is the widest format supported, then DECIMAL_DIG >= 17.

R uses double precision as default, so I think DECIMAL_DIG = 17 is a reasonable value. This is the minimum number of decimal digits, for which the conversion

double -> string -> double

is value preserving, if rounding is correct.

DBL_DIG = 15 is another constant, which is the maximum number of digits, for which the conversion

string -> double -> string

may be done value preserving.

> It really is the case that writing out a number to > 15 significant digits

*> and reading it back in again is not required to give exactly the same
**> IEC60559 (sic) number, and furthermore there are R platforms for which
**> this does not happen. What Mr Weinberg claimed is 'now impossible' never
**> was possible in general (and he seems to be berating the R developers for
**> not striving to do better than the C standard requires of OSes). In fact,
**> I believe this to be impossible unless you have access to extended
**> precsion arithmetic, as otherwise printf/scanf have to use the same
**> arithmetic as the computations.
**>
**> This is why R supports transfer of floating-point numbers via readBin and
**> friends, and uses a binary format itself for save() (by default).
**>
**> I should also say that any algorithm that depends on that level of details
**> in the numbers will depend on the compiler used and optimization level and
**> so on. Don't expect repeatability to that level even with binary data
**> unless you (are allowed to) never apply bugfixes to your system.
*

Repeatability can hardly be achieved among different R installations, due to the reasons you explain. However, repeatability of a computation within the same installation may be achieved and may be sometimes useful. For example it may be useful for debugging, similarly as set.seed. Saving the data in binary is a possible approach for this, however, decimal numbers in a text may serve this purpose equally well, if they are precise enough.

I would like to ask a question concerning the function scientific in src/main/format.c, which is called from formatReal in the same file. Function scientific, besides other things, determines, whether a smaller number of significant digits than R_print.digits is sufficient to get a decimal number with relative error at most max(10^-R_print.digits,2*DBL_EPSILON), where DBL_EPSILON=2^-52.

For simplicity, let me consider only the case, when R_print.digits = DBL_DIG (which is 15). Then the bound for the relative error is 1e-15, since 1e-15 > 2*2^-52.

There are two error bounds in consideration: (1) the absolute error of the rounded mantissa (significand) is at most 5e-15, (2) the relative error of the rounded number is at most 1e-15.

It is natural to consider also (1), since R_print.digits = 15 and 5e-15 is the maximum absolute error of the mantissa for correct rounding to 15 significant digits.

If the mantissa is in [1,5], then (2) => (1). Hence, for these mantissas, function scientific should suggest a smaller number of digits than 15 only if the result is as accurate as rounding to 15 digits.

On the other hand, if the mantissa is in (5,10), then (2) does not imply (1). Hence, here we sometimes do not get the full precision 15 digit numbers. Is this behaviour the intention?

This has, for example, the following consequence: if a 15-digit number is read into R using read.table and then written back to a text using write.table, we need not get the same result.

For testing, I use the following platforms:

FLAGS means CFLAGS,FFLAGS,CXXFLAGS,FCFLAGS on Linuxes, R version is 2.5.1 RC (2007-06-23 r42041), on Windows, R version is binary distribution of R-2.5.1 (RC). [1] AMD Athlon(tm) 64 Processor 3500+, SUSE 10.1, gcc 4.1.0, FLAGS = -g -O2 -march=pentium4 -mfpmath=sse [2] AMD Athlon(tm) 64 Processor 3500+, SUSE 10.1, gcc 4.1.0, FLAGS = default [3] Intel Xeon processor, Fedora Core 2, gcc 3.3.3, FLAGS -g -O2 -march=pentium4 -mfpmath=sse [4] Intel Xeon processor, Fedora Core 2, gcc 3.3.3, FLAGS = default [5] Pentium III, SuSE 10.0, gcc 4.0.2, FLAGS = -g -O2 -ffloat-store [6] Pentium III, SuSE 10.0, gcc 4.0.2, FLAGS = default

[7] Pentium III, Windows XP.

On all 7 platforms, the problem with reproducing 15-digit numbers appears for example for the following numbers

a <- read.table(stdin());

9.14363081376609 9.15664585780591 9.21585022237569 9.23994602039041 9.32750736234049 9.35423342635191 9.44884219569079 9.47261017210051

write.table(a,file="");

# "V1" # "1" 9.1436308137661 # "2" 9.1566458578059 # "3" 9.2158502223757 # "4" 9.2399460203904 # "5" 9.3275073623405 # "6" 9.3542334263519 # "7" 9.4488421956908 # "8" 9.4726101721005

On platforms [1,3,5,7], the problem appears also for smaller mantissas

a <- read.table(stdin());

7.57660233100411 7.61904762387919 8.05515628983101 8.07459617374249

write.table(a,file="");

# "V1" # "1" 7.5766023310041 # "2" 7.6190476238792 # "3" 8.055156289831 # "4" 8.0745961737425

An alternative approach would be to print the value with R_print.digits (at most 17) precision and eliminate trailing zeros from the mantissa. This would guarantee (1) for 15 digits.

More generally, the absolute error of the mantissa would be at most 5*10^-R_print.digits and if the number may be represented with smaller number of digits with error of the mantissa less than 5*10^-R_print.digits, such a shorter representation would be generated.

For functions, which format each value in a vector separately (as.character, cat, write.table), removing trailing zeros may be done already in sprintf. For R function print, which uses the same number of decimal positions for all numbers in a vector, the string representation of all values in a vector has to be seen before a decision on the number of removed trailing zeros is made.

Also, I would like to point out that due to rounding errors in function scientific, the actual errors sometimes exceed both bounds (1) and (2). To demonstrate this, use the following:

vect <- function(z) { v <- strsplit(z,"")[[1]]; as.integer(v[-2]); } diff <- function(x,y) { x <- vect(x); y <- vect(y); y <- c(y,rep(0,times=length(x)-length(y))); # assuming length(x) >= length(y) abs(sum((x - y)*10^-(1:length(x) - 1))); }

x <- c(

5.1767789744225927e+21, 8.8010507503190114e+22, 8.0551457236914104e+23, 6.1185028214373079e+24, 4.1491936503345055e+25, 9.4921247662789880e+25);

y <- as.character(x);

mant.y <- sub("e.*","",y);

z <- formatC(x,digits=21,width=-1,format="e"); mant.z <- sub("e.*","",z);

err <- matrix(nrow=length(x),ncol=2);

for (i in 1:length(x)) {

err[i,1] <- diff(mant.z[i],mant.y[i]); err[i,2] <- err[i,1]/as.double(mant.z[i]);}

data.frame(formatC=z,as.character=y,abs.err=err[,1],rel.err=err[,2]);

On platforms [1,3,5], the output is (abs.err is absolute error of the mantissa):

formatC as.character abs.err rel.err 1 5.176778974422592651264e+21 5.1767789744226e+21 7.348736e-15 1.419558e-15 2 8.801050750319011351757e+22 8.801050750319e+22 1.135176e-14 1.289818e-15 3 8.055145723691410401526e+23 8.0551457236914e+23 1.040153e-14 1.291290e-15 4 6.118502821437307892007e+24 6.1185028214373e+24 7.892007e-15 1.289859e-155 4.149193650334505529822e+25 4.1491936503345e+25 5.529822e-15 1.332746e-15 6 9.492124766278988034841e+25 9.4921247662790e+25 1.196516e-14 1.260535e-15

On platforms [2,4,6], as.character works correctly on these examples.

On [7] (windows), the output is

formatC as.character abs.err rel.err 1 5.176778974422592651157e+21 5.1767789744226e+21 7.348843e-15 1.419578e-15 2 8.801050750319011351831e+22 8.801050750319e+22 1.135183e-14 1.289827e-15 3 8.055145723691410400945e+23 8.0551457236914e+23 1.040095e-14 1.291217e-15 4 6.118502821437307892093e+24 6.1185028214373e+24 7.892093e-15 1.289873e-155 4.149193650334505529750e+25 4.14919365033451e+25 4.470250e-15 1.077378e-15 6 9.492124766278988034877e+25 9.4921247662790e+25 1.196512e-14 1.260532e-15

formatC has smaller precision here, so the errors in the table are calculated with smaller precision. However, besides x[5], the actual errors of the conversion are the same as on platforms [1,3,5], since the numbers in x are the same (verified using save/load in binary format) and besides x[5], also the result of as.character is the same. The number x[5] is converted correctly.

All the best, Petr.

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 26 Jun 2007 - 15:30:59 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Tue 26 Jun 2007 - 15:35:26 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*