Re: [Rd] RFC: lchoose() vs lfactorial() etc

From: Petr Savicky <savicky_at_cs.cas.cz>
Date: Fri, 18 Dec 2009 00:14:49 +0100

On Thu, Dec 17, 2009 at 03:10:49PM +0100, Martin Maechler wrote: [...]

>     MM> This, of course, is an even more compelling reason to implement
>     MM> the change of return  log(abs(choose(.,.)),
>     MM> and at the moment, I'd even plan to "backport" that to R "2.10.1
>     MM> patched", as the current behavior is clearly bugous.
> 
> This has now happened;
> svn revisions  50775 and 50776  {R-devel and R-patched}.

Thank you.

When preparing a test, whether lchoose(n, k) \approx log(abs(choose(n, k))), it appeared that there is a minor problem also in choose(n, k), when n is very close to k, but not equal.

  options(digits=15)
  n <- 5 + (-15:15)*1e-8
  cbind(n, choose(n, 5))

  #                n                  
  #  [1,] 4.99999985 0.999999657500042
  #  [2,] 4.99999986 0.999999680333370
  #  [3,] 4.99999987 0.999999703166698
  #  [4,] 4.99999988 0.999999726000027
  #  [5,] 4.99999989 0.999999748833356
  #  [6,] 4.99999990 0.999999771666685
  #  [7,] 4.99999991 0.000000000000000
  #  [8,] 4.99999992 0.000000000000000
  #  [9,] 4.99999993 0.000000000000000

# [10,] 4.99999994 0.000000000000000
# [11,] 4.99999995 0.000000000000000
# [12,] 4.99999996 0.000000000000000
# [13,] 4.99999997 0.000000000000000
# [14,] 4.99999998 0.000000000000000
# [15,] 4.99999999 0.000000000000000
# [16,] 5.00000000 1.000000000000000
# [17,] 5.00000001 5.000000000000000
# [18,] 5.00000002 5.000000000000000
# [19,] 5.00000003 5.000000000000000
# [20,] 5.00000004 5.000000000000000
# [21,] 5.00000005 5.000000000000000
# [22,] 5.00000006 5.000000000000000
# [23,] 5.00000007 5.000000000000000
# [24,] 5.00000008 5.000000000000000
# [25,] 5.00000009 5.000000000000000
# [26,] 5.00000010 1.000000228333353
# [27,] 5.00000011 1.000000251166690
# [28,] 5.00000012 1.000000274000027
# [29,] 5.00000013 1.000000296833365
# [30,] 5.00000014 1.000000319666704
# [31,] 5.00000015 1.000000342500042

All the values in the second column should be close to 1, but some are 0 and some are 5. The reason seems to be in the line 112 of src/nmath/choose.c, which is
  if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ If n is not exactly an integer, then k becomes also non-integer. Since the code relies on k being an exact integer, we get an error
as follows.

If n = k - eps for a small positive eps, then the next line 113   if (k < 0) return 0.;
determines the output since k is now - eps.

If n = k + eps for a small positive eps, then the line 116   r = n;
determines the output, since now k = eps, so the condition j <= k is not
satisfied already at the beginning of the next cycle.

I suggest two solutions, a more conservative one and a less conservative one.

A more conservative solution is to replace the line 112 by   if(n-k < k && n >= 0 && R_IS_INT(n)) k = floor(n - k + 0.5); /* <- Symmetry */ This yields the following in the same test as above.

  #                n                  
  #  [1,] 4.99999985 0.999999657500042
  #  [2,] 4.99999986 0.999999680333370
  #  [3,] 4.99999987 0.999999703166698
  #  [4,] 4.99999988 0.999999726000027
  #  [5,] 4.99999989 0.999999748833356
  #  [6,] 4.99999990 0.999999771666685
  #  [7,] 4.99999991 1.000000000000000
  #  [8,] 4.99999992 1.000000000000000
  #  [9,] 4.99999993 1.000000000000000

# [10,] 4.99999994 1.000000000000000
# [11,] 4.99999995 1.000000000000000
# [12,] 4.99999996 1.000000000000000
# [13,] 4.99999997 1.000000000000000
# [14,] 4.99999998 1.000000000000000
# [15,] 4.99999999 1.000000000000000
# [16,] 5.00000000 1.000000000000000
# [17,] 5.00000001 1.000000000000000
# [18,] 5.00000002 1.000000000000000
# [19,] 5.00000003 1.000000000000000
# [20,] 5.00000004 1.000000000000000
# [21,] 5.00000005 1.000000000000000
# [22,] 5.00000006 1.000000000000000
# [23,] 5.00000007 1.000000000000000
# [24,] 5.00000008 1.000000000000000
# [25,] 5.00000009 1.000000000000000
# [26,] 5.00000010 1.000000228333353
# [27,] 5.00000011 1.000000251166690
# [28,] 5.00000012 1.000000274000027
# [29,] 5.00000013 1.000000296833365
# [30,] 5.00000014 1.000000319666704
# [31,] 5.00000015 1.000000342500042

So, the extreme values 0 and 5 do not occur, but there is an interval of constant values and on both ends of this interval, there is a jump much larger than machine accuracy.

A less conservative solution (a patch is in an attachment) replaces lines 110 - 121 by

    if (k < k_small_max) {

        int j;
        if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */
        if (k <  0) return 0.;
        if (k == 0) return 1.;
        /* else: k >= 1 */
        r = n;
        for(j = 2; j <= k; j++) {
            r *= (n-j+1);
            r /= j;
        }
        return r;

    }

Here, the symetry is used only for exact integers. The symetry is valid only for integers, since choose(n, k) is a polynomial of degree k. So, for different values of k, we get a different functions on any interval of positive length.

The division at the line

            r /= j;
always produces an exact integer, since during the whole cycle, r is always either an integer binomial coefficient or its integer multiple. So, the division has an integer result and there is no rounding error unless the
intermediate result does not fit into 53 bit precison.

Using the above code, we get

  options(digits=15)
  n <- 5 + (-15:15)*1e-8
  cbind(n, choose(n, 5))

  #                n                  
  #  [1,] 4.99999985 0.999999657500042
  #  [2,] 4.99999986 0.999999680333370
  #  [3,] 4.99999987 0.999999703166698
  #  [4,] 4.99999988 0.999999726000027
  #  [5,] 4.99999989 0.999999748833356
  #  [6,] 4.99999990 0.999999771666685
  #  [7,] 4.99999991 0.999999794500014
  #  [8,] 4.99999992 0.999999817333345
  #  [9,] 4.99999993 0.999999840166677

# [10,] 4.99999994 0.999999863000008
# [11,] 4.99999995 0.999999885833339
# [12,] 4.99999996 0.999999908666670
# [13,] 4.99999997 0.999999931500002
# [14,] 4.99999998 0.999999954333335
# [15,] 4.99999999 0.999999977166667
# [16,] 5.00000000 1.000000000000000
# [17,] 5.00000001 1.000000022833334
# [18,] 5.00000002 1.000000045666667
# [19,] 5.00000003 1.000000068500001
# [20,] 5.00000004 1.000000091333336
# [21,] 5.00000005 1.000000114166671
# [22,] 5.00000006 1.000000137000006
# [23,] 5.00000007 1.000000159833342
# [24,] 5.00000008 1.000000182666680
# [25,] 5.00000009 1.000000205500016
# [26,] 5.00000010 1.000000228333353
# [27,] 5.00000011 1.000000251166690
# [28,] 5.00000012 1.000000274000027
# [29,] 5.00000013 1.000000296833365
# [30,] 5.00000014 1.000000319666704
# [31,] 5.00000015 1.000000342500042

Within the chosen accuracy 1e-8, there is no visible jump. With 1e-15, we get

  options(digits=15)
  n <- 5 + (-15:15)*1e-15
  cbind(n, choose(n, 5))

  #                      n                  
  #  [1,] 4.99999999999998 0.999999999999966
  #  [2,] 4.99999999999999 0.999999999999967
  #  [3,] 4.99999999999999 0.999999999999970
  #  [4,] 4.99999999999999 0.999999999999972
  #  [5,] 4.99999999999999 0.999999999999976
  #  [6,] 4.99999999999999 0.999999999999978
  #  [7,] 4.99999999999999 0.999999999999980
  #  [8,] 4.99999999999999 0.999999999999982
  #  [9,] 4.99999999999999 0.999999999999984

# [10,] 4.99999999999999 0.999999999999986
# [11,] 4.99999999999999 0.999999999999988
# [12,] 5.00000000000000 0.999999999999990
# [13,] 5.00000000000000 0.999999999999994
# [14,] 5.00000000000000 0.999999999999996
# [15,] 5.00000000000000 0.999999999999998
# [16,] 5.00000000000000 1.000000000000000
# [17,] 5.00000000000000 1.000000000000002
# [18,] 5.00000000000000 1.000000000000004
# [19,] 5.00000000000000 1.000000000000006
# [20,] 5.00000000000000 1.000000000000010
# [21,] 5.00000000000001 1.000000000000012
# [22,] 5.00000000000001 1.000000000000014
# [23,] 5.00000000000001 1.000000000000016
# [24,] 5.00000000000001 1.000000000000018
# [25,] 5.00000000000001 1.000000000000020
# [26,] 5.00000000000001 1.000000000000022
# [27,] 5.00000000000001 1.000000000000024
# [28,] 5.00000000000001 1.000000000000028
# [29,] 5.00000000000001 1.000000000000031
# [30,] 5.00000000000001 1.000000000000032
# [31,] 5.00000000000002 1.000000000000035

So, the jump in the exactly integer value n[16] == 5 is comparable to machine accuracy.

I would be pleased to provide more tests, if some of the above solutions or their modifications can be accepted.

Petr Savicky.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 17 Dec 2009 - 23:19:33 GMT

This archive was generated by hypermail 2.2.0 : Mon 21 Dec 2009 - 12:41:11 GMT