Re: [Rd] Randomness not due to seed

From: Dirk Eddelbuettel <edd_at_debian.org>
Date: Wed, 20 Jul 2011 08:38:20 -0500

On 20 July 2011 at 14:03, Jeroen Ooms wrote:
| >> I think Bill Dunlap's answer addressed it:  the claim appears to be false.
|
| Here is another example where there is randomness that is not due to
| the seed. On the same machine, the same R binary, but through another
| interface. First directly in the shell:
|
| > sessionInfo()
| R version 2.13.1 (2011-07-08)
| Platform: i686-pc-linux-gnu (32-bit)
|
| locale:
| [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
| [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
| [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
| [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
| [9] LC_ADDRESS=C LC_TELEPHONE=C
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
|
| attached base packages:
| [1] stats graphics grDevices utils datasets methods base
|
| > set.seed(123)
| > print(coef(lm(dist~speed, data=cars)),digits=22)
| (Intercept) speed
| -17.579094890510951643137 3.932408759124087715975

That's PEBKAC --- even double precision does NOT get you 22 digits of precision.

You may want to read up on 'What Every Computer Scientist Should Know About Floating-Point Arithmetic' by Goldberg (which is a true internet classic) and ponder why a common choice for the various 'epsilon' settings of general convergence is one of the constants supplied by the OS and/or its C library. R has

  #define SINGLE_EPS FLT_EPSILON
  [...]
  #define DOUBLE_EPS DBL_EPSILON

in Constants.h. You can then chase the definition of FLT_EPSILON and DBL_EPSILON through your system headers (which is a good exercise).

One place you may end up is the manual -- the following is from the GNU libc documentation on "Floating Point Parameters":

FLT_EPSILON

     This is the minimum positive floating point number of type float such that
     1.0 + FLT_EPSILON != 1.0 is true. It's supposed to be no greater than 1E-5. 

DBL_EPSILON
LDBL_EPSILON

     These are similar to FLT_EPSILON, but for the data types double and long
     double, respectively. The type of the macro's value is the same as the type
     it describes. The values are not supposed to be greater than 1E-9.

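So there -- nine digits is all the C standard guarantees for a double (IEEE 754 doubles deliver about 15 to 17), and either way nowhere near 22.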

Dirk    

| # And this is through eclipse (java)
|
| > sessionInfo()
| R version 2.13.1 (2011-07-08)
| Platform: i686-pc-linux-gnu (32-bit)
|
| locale:
| [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
| [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
| [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
| [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
| [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
|
| attached base packages:
| [1] stats graphics grDevices utils datasets methods base
|
| other attached packages:
| [1] rj_0.5.2-1
|
| loaded via a namespace (and not attached):
| [1] rJava_0.9-1 tools_2.13.1
|
| > set.seed(123)
| > print(coef(lm(dist~speed, data=cars)),digits=22)
| (Intercept) speed
| -17.57909489051087703615 3.93240875912408460735
|
| ______________________________________________
| R-devel_at_r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Gauss once played himself in a zero-sum game and won $50.
                      -- #11 at http://www.gaussfacts.com

Received on Wed 20 Jul 2011 - 13:40:53 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 20 Jul 2011 - 16:10:10 GMT.
