Re: [R] Comparison: glm() vs. bigglm()

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Fri, 29 Jun 2007 17:05:29 +0200

Benilton Carvalho wrote:
> Hi,
>
> Until now, I thought that the results of glm() and bigglm() would
> coincide. Probably a naive assumption?
>
> Anyway, I've been using bigglm() on some of the datasets I have available.
> One of the sets has >15M observations.
>
> I have 3 continuous predictors (A, B, C) and a binary outcome (Y),
> and tried the following:
>
> m1 <- bigglm(Y~A+B+C, family=binomial(), data=dataset1, chunksize=10e6)
> m2 <- bigglm(Y~A*B+C, family=binomial(), data=dataset1, chunksize=10e6)
> imp <- m1$deviance-m2$deviance
>
> To my surprise, "imp" was negative.
>
> I then tried the same models, using glm() instead... and as I
> expected, "imp" was positive.
>
> I also noticed differences in the coefficients estimated by glm() and
> bigglm(). The differences are small, though, and the CIs for a given
> coefficient overlap across the two methods.
>
> Are such incongruences expected? What can I use to check for
> convergence with bigglm(), as non-convergence might be one plausible
> cause for a negative difference in the deviances?
>
It doesn't sound right, but I cannot reproduce your problem on a similarly sized problem (it pretty much killed my machine...). Some observations:

  1. You do realize that you are only using 1.5 chunks? (15M observations vs. chunksize=10e6, i.e. 10M)
  2. Deviance changes are O(1) under the null hypothesis but the deviances themselves are O(N). In a smaller variant (N=1e5), I got

     > m1$deviance
     [1] 138626.4
     > m2$deviance
     [1] 138626.4
     > m2$deviance - m1$deviance
     [1] -0.05865785

This does leave some scope for roundoff to creep in: a change of about 0.06 in a deviance of about 1.4e5 is a relative difference of roughly 4e-7. You may want to play with a lower setting of tol=...
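To illustrate the O(1) vs. O(N) point, here is a minimal sketch using only base R's glm(), not bigglm(), so it does not depend on the biglm package's interface. The data are simulated under the null (Y unrelated to A, B, C); N, the seed and the two tolerance values are hypothetical choices. glm.control(epsilon = ...) stands in for bigglm()'s own tolerance. Since glm()'s stopping rule is relative to the deviance, the absolute slack it allows grows with N. Whether bigglm()'s criterion behaves the same way is an assumption here, not something checked.

```r
# Sketch only: glm() used as a stand-in for bigglm(); N, seed and
# tolerances are hypothetical illustration values.
set.seed(1)
N <- 1e5
dat <- data.frame(A = rnorm(N), B = rnorm(N), C = rnorm(N))
# Outcome generated independently of A, B, C (null hypothesis is true)
dat$Y <- rbinom(N, size = 1, prob = 0.5)

# Fit the two nested models with a given relative convergence tolerance
fit_pair <- function(epsilon) {
  ctrl <- glm.control(epsilon = epsilon, maxit = 50)
  m1 <- glm(Y ~ A + B + C, family = binomial(), data = dat, control = ctrl)
  m2 <- glm(Y ~ A * B + C, family = binomial(), data = dat, control = ctrl)
  c(dev1 = m1$deviance,
    dev2 = m2$deviance,
    diff = m2$deviance - m1$deviance,           # O(1) under the null
    converged = m1$converged && m2$converged)   # TRUE is stored as 1
}

loose <- fit_pair(1e-3)
tight <- fit_pair(1e-10)
print(rbind(loose, tight))

# Size of the deviance change relative to the deviance itself (O(1)/O(N))
rel_change <- unname(abs(tight["diff"]) / tight["dev1"])
print(rel_change)

# Point 1 above: 15M rows with chunksize = 10e6 means two passes per
# iteration, one of 10M rows and one of 5M rows
n_chunks <- ceiling(15e6 / 10e6)
print(n_chunks)
```

With a tight tolerance the larger model's deviance should not exceed the smaller one's by more than roundoff. As a rough scale check, a relative tolerance near 1e-7 on a deviance of about 15e6 * 2 * log(2), roughly 2e7, leaves slack of about 2 deviance units, which is the same order as a genuine O(1) deviance change.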

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 29 Jun 2007 - 15:42:35 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 29 Jun 2007 - 16:32:35 GMT.
