Re: [Rd] choose(n,k) when n is almost integer

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Sat, 19 Dec 2009 22:02:49 +0100

{I've changed the subject; it's really no longer about  lchoose()'s definition}

>>>>> Petr Savicky <savicky_at_cs.cas.cz>
>>>>> on Fri, 18 Dec 2009 00:14:49 +0100 writes:

> On Thu, Dec 17, 2009 at 03:10:49PM +0100, Martin Maechler wrote:
> [...]

    MM> This, of course, is an even more compelling reason to implement
    MM> the change of return  log(abs(choose(.,.)),
    MM> and at the moment, I'd even plan to "backport" that to R "2.10.1
    MM> patched", as the current behavior is clearly bugous.
    >> 

    >> This has now happened;
    >> svn revisions 50775 and 50776 {R-devel and R-patched}.

> Thank you.

> When preparing a test, whether lchoose(n, k) \approx log(abs(choose(n, k))), it
> appeared that there is a minor problem also in choose(n, k), when n is very close
> to k, but not equal.

> options(digits=15)
> n <- 5 + (-15:15)*1e-8
> cbind(n, choose(n, 5))
> # n
> # [1,] 4.99999985 0.999999657500042
> # [2,] 4.99999986 0.999999680333370
> # [3,] 4.99999987 0.999999703166698
> # [4,] 4.99999988 0.999999726000027
> # [5,] 4.99999989 0.999999748833356
> # [6,] 4.99999990 0.999999771666685
> # [7,] 4.99999991 0.000000000000000
> # [8,] 4.99999992 0.000000000000000
> # [9,] 4.99999993 0.000000000000000
> # [10,] 4.99999994 0.000000000000000
> # [11,] 4.99999995 0.000000000000000
> # [12,] 4.99999996 0.000000000000000
> # [13,] 4.99999997 0.000000000000000
> # [14,] 4.99999998 0.000000000000000
> # [15,] 4.99999999 0.000000000000000
> # [16,] 5.00000000 1.000000000000000
> # [17,] 5.00000001 5.000000000000000
> # [18,] 5.00000002 5.000000000000000
> # [19,] 5.00000003 5.000000000000000
> # [20,] 5.00000004 5.000000000000000
> # [21,] 5.00000005 5.000000000000000
> # [22,] 5.00000006 5.000000000000000
> # [23,] 5.00000007 5.000000000000000
> # [24,] 5.00000008 5.000000000000000
> # [25,] 5.00000009 5.000000000000000
> # [26,] 5.00000010 1.000000228333353
> # [27,] 5.00000011 1.000000251166690
> # [28,] 5.00000012 1.000000274000027
> # [29,] 5.00000013 1.000000296833365
> # [30,] 5.00000014 1.000000319666704
> # [31,] 5.00000015 1.000000342500042

> All the values in the second column should be close to 1, but some are 0 and
> some are 5. The reason seems to be in the line 112 of src/nmath/choose.c,
> which is
> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
> If n is not exactly an integer, then k becomes also non-integer.
> Since the code relies on k being an exact integer, we get an error
> as follows.

> If n = k - eps for a small positive eps, then the next line 113
> if (k < 0) return 0.;
> determines the output since k is now - eps.

> If n = k + eps for a small positive eps, then the line 116
> r = n;
> determines the output, since now k = eps, so the condition j <= k is not
> satisfied already at the beginning of the next cycle.

> I suggest two solutions, a more conservative one and a less conservative one.

> A more conservative solution is to replace the line 112 by
> if(n-k < k && n >= 0 && R_IS_INT(n)) k = floor(n - k + 0.5); /* <- Symmetry */
> This yields the following in the same test as above.

> # n
> # [1,] 4.99999985 0.999999657500042
> # [2,] 4.99999986 0.999999680333370
> # [3,] 4.99999987 0.999999703166698
> # [4,] 4.99999988 0.999999726000027
> # [5,] 4.99999989 0.999999748833356
> # [6,] 4.99999990 0.999999771666685
> # [7,] 4.99999991 1.000000000000000
> # [8,] 4.99999992 1.000000000000000
> # [9,] 4.99999993 1.000000000000000
> # [10,] 4.99999994 1.000000000000000
> # [11,] 4.99999995 1.000000000000000
> # [12,] 4.99999996 1.000000000000000
> # [13,] 4.99999997 1.000000000000000
> # [14,] 4.99999998 1.000000000000000
> # [15,] 4.99999999 1.000000000000000
> # [16,] 5.00000000 1.000000000000000
> # [17,] 5.00000001 1.000000000000000
> # [18,] 5.00000002 1.000000000000000
> # [19,] 5.00000003 1.000000000000000
> # [20,] 5.00000004 1.000000000000000
> # [21,] 5.00000005 1.000000000000000
> # [22,] 5.00000006 1.000000000000000
> # [23,] 5.00000007 1.000000000000000
> # [24,] 5.00000008 1.000000000000000
> # [25,] 5.00000009 1.000000000000000
> # [26,] 5.00000010 1.000000228333353
> # [27,] 5.00000011 1.000000251166690
> # [28,] 5.00000012 1.000000274000027
> # [29,] 5.00000013 1.000000296833365
> # [30,] 5.00000014 1.000000319666704
> # [31,] 5.00000015 1.000000342500042

> So, the extreme values 0 and 5 do not occur, but there is an interval
> of constant values and on both ends of this interval, there is a jump
> much larger than machine accuracy.

> A less conservative solution (a patch is in an attachment) replaces
> lines 110 - 121 by

> if (k < k_small_max) {
> int j;
> if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */
> if (k < 0) return 0.;
> if (k == 0) return 1.;
> /* else: k >= 1 */
> r = n;
> for(j = 2; j <= k; j++) {
> r *= (n-j+1);
> r /= j;
> }
> return r;
> }

> Here, the symetry is used only for exact integers. The symetry is
> valid only for integers, since choose(n, k) is a polynomial of
> degree k. So, for different values of k, we get a different functions
> on any interval of positive length.

> The division at the line
> r /= j;
> always produces an exact integer, since during the whole cycle, r is always
> either an integer binomial coefficient or its integer multiple. So, the
> division has an integer result and there is no rounding error unless the
> intermediate result does not fit into 53 bit precison.

> Using the above code, we get

> options(digits=15)
> n <- 5 + (-15:15)*1e-8
> cbind(n, choose(n, 5))
> # n
> # [1,] 4.99999985 0.999999657500042
> # [2,] 4.99999986 0.999999680333370
> # [3,] 4.99999987 0.999999703166698
> # [4,] 4.99999988 0.999999726000027
> # [5,] 4.99999989 0.999999748833356
> # [6,] 4.99999990 0.999999771666685
> # [7,] 4.99999991 0.999999794500014
> # [8,] 4.99999992 0.999999817333345
> # [9,] 4.99999993 0.999999840166677
> # [10,] 4.99999994 0.999999863000008
> # [11,] 4.99999995 0.999999885833339
> # [12,] 4.99999996 0.999999908666670
> # [13,] 4.99999997 0.999999931500002
> # [14,] 4.99999998 0.999999954333335
> # [15,] 4.99999999 0.999999977166667
> # [16,] 5.00000000 1.000000000000000
> # [17,] 5.00000001 1.000000022833334
> # [18,] 5.00000002 1.000000045666667
> # [19,] 5.00000003 1.000000068500001
> # [20,] 5.00000004 1.000000091333336
> # [21,] 5.00000005 1.000000114166671
> # [22,] 5.00000006 1.000000137000006
> # [23,] 5.00000007 1.000000159833342
> # [24,] 5.00000008 1.000000182666680
> # [25,] 5.00000009 1.000000205500016
> # [26,] 5.00000010 1.000000228333353
> # [27,] 5.00000011 1.000000251166690
> # [28,] 5.00000012 1.000000274000027
> # [29,] 5.00000013 1.000000296833365
> # [30,] 5.00000014 1.000000319666704
> # [31,] 5.00000015 1.000000342500042

> Within the chosen accuracy 1e-8, there is no visible jump.
> With 1e-15, we get

> options(digits=15)
> n <- 5 + (-15:15)*1e-15
> cbind(n, choose(n, 5))
> # n
> # [1,] 4.99999999999998 0.999999999999966
> # [2,] 4.99999999999999 0.999999999999967
> # [3,] 4.99999999999999 0.999999999999970
> # [4,] 4.99999999999999 0.999999999999972
> # [5,] 4.99999999999999 0.999999999999976
> # [6,] 4.99999999999999 0.999999999999978
> # [7,] 4.99999999999999 0.999999999999980
> # [8,] 4.99999999999999 0.999999999999982
> # [9,] 4.99999999999999 0.999999999999984
> # [10,] 4.99999999999999 0.999999999999986
> # [11,] 4.99999999999999 0.999999999999988
> # [12,] 5.00000000000000 0.999999999999990
> # [13,] 5.00000000000000 0.999999999999994
> # [14,] 5.00000000000000 0.999999999999996
> # [15,] 5.00000000000000 0.999999999999998
> # [16,] 5.00000000000000 1.000000000000000
> # [17,] 5.00000000000000 1.000000000000002
> # [18,] 5.00000000000000 1.000000000000004
> # [19,] 5.00000000000000 1.000000000000006
> # [20,] 5.00000000000000 1.000000000000010
> # [21,] 5.00000000000001 1.000000000000012
> # [22,] 5.00000000000001 1.000000000000014
> # [23,] 5.00000000000001 1.000000000000016
> # [24,] 5.00000000000001 1.000000000000018
> # [25,] 5.00000000000001 1.000000000000020
> # [26,] 5.00000000000001 1.000000000000022
> # [27,] 5.00000000000001 1.000000000000024
> # [28,] 5.00000000000001 1.000000000000028
> # [29,] 5.00000000000001 1.000000000000031
> # [30,] 5.00000000000001 1.000000000000032
> # [31,] 5.00000000000002 1.000000000000035

> So, the jump in the exactly integer value n[16] == 5 is comparable
> to machine accuracy.

> I would be pleased to provide more tests, if some of the above solutions
> or their modifications can be accepted.

> Petr Savicky.

    [.......................]

Thanks a lot, Petr, for your explorations. I agree that at least something like your conservative change should happen.

Personally, I'd agree with you and prefer the "progressive" (non-conservative) change,
basically getting rid of all R_IS_INT(n) tests, which seem kludgey.

OTOH, the R core members who did start using R_IS_INT(n) must have had good reasons with another set examples.

Have you tried 'make check-devel' (or 'check-all') also with the progressive change?

I'll hope to get back to this on Monday. Martin



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sat 19 Dec 2009 - 21:05:33 GMT

This archive was generated by hypermail 2.2.0 : Wed 23 Dec 2009 - 04:21:12 GMT